I'm really not sure, but the first 15 pages of Toole's book have already helped me, I believe. If I understand it correctly, recording an orchestra was more doable with the microphones placed not at the listener's position in the hall, nor as a pair at the ears, but near the instruments. Thinking about it, I have to agree. So that's the (insufficient) way a lot of recordings have been made, because it had to be done that way. (?)
On the other hand (I guess), artistic production would include using effects that sound pleasant on presentation systems that add no reverberation of their own, e.g. the HD 600 or speakers in a treated room (?).
So the imperfect orchestra recordings are too dry, while the artistic production may be (too) wet.
And it could be the other way around: a live recording may include enough spatial information, while an artistic production might be made so dry that it works almost exclusively on speakers. "The Magic Key" - the dry, loud drums fatigue my ears very quickly; they're so dry and 'impulsive'. Better to listen to that song on speakers.
I mean, somehow I think I understand why it's hard to agree on anything, because it depends. But I need to keep reading that book. In a video he talks about different frequencies taking their inherent place around the head, from inside to outside. That probably makes it even more complicated. It's probably covered in the book.
Sound localization is another mind-bending topic you might enjoy looking into. Here is the wiki as an appetizer:
https://en.wikipedia.org/wiki/Sound_localization
Not-so-short summary: a great deal of how we locate a sound source has to do with interaural time difference (ITD) and interaural level difference (ILD). Each ear receives the sound at a different moment and in a different way, depending on where we're looking and on our own head, ears (and shoulders...), which obstruct or deflect the sound differently depending on position and frequency. So each ear ends up with a different FR along with the different timing. With experience, the brain gets pretty good at locating sound thanks to those cues.
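To put a rough number on the timing cue: a simple spherical-head approximation (Woodworth's formula) gives the ITD from the head radius and the speed of sound. This is just a sketch of the textbook model, with a typical average head radius assumed; real heads and real HRTFs vary.

```python
import math

def woodworth_itd(azimuth_deg, head_radius_m=0.0875, c=343.0):
    """Approximate interaural time difference (seconds) for a source at
    azimuth_deg (0 = straight ahead, 90 = directly to one side),
    using Woodworth's spherical-head model: (r/c) * (sin(theta) + theta)."""
    theta = math.radians(azimuth_deg)
    return (head_radius_m / c) * (math.sin(theta) + theta)

# A source dead ahead reaches both ears at the same time (ITD = 0);
# at 90 degrees the delay peaks at roughly 0.65 ms.
print(round(woodworth_itd(0) * 1e6))    # 0 microseconds
print(round(woodworth_itd(90) * 1e6))   # ~656 microseconds
```

That sub-millisecond delay, together with the level differences, is the raw material the brain learns to turn into a direction.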
-With a real sound coming from almost a single spot, this is mostly how it goes.
-With speakers... well, a mono sound comes equally from 2 spots, which is not natural at all. Relying on the ILD and ITD system to place instruments on the album would make a mess. So instead, most of the job is done with simple panning (making one speaker louder with an otherwise identical signal, to position a given instrument). It's ILD simplified to the max. An easy cue for the brain, and it fools us pretty well within the 60° angle between the speakers. We can count ourselves lucky on that one, because something pretty wrong turns out to work well. More importantly, most people will place the sounds pretty much the same way, which is good.
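The usual way that level-only trick is implemented is a constant-power pan law: the two speaker gains are a cosine/sine pair so the summed power stays constant as the source sweeps across. A minimal sketch (the function name and the [-1, 1] pan convention are mine, not from any particular mixer):

```python
import math

def constant_power_pan(pan):
    """pan in [-1, 1]: -1 = hard left, 0 = center, +1 = hard right.
    Returns (left_gain, right_gain) with L^2 + R^2 = 1, so the perceived
    loudness stays roughly constant while only the level ratio changes."""
    angle = (pan + 1) * math.pi / 4   # map [-1, 1] onto [0, pi/2]
    return math.cos(angle), math.sin(angle)

left, right = constant_power_pan(0.0)   # center: both speakers at ~0.707
hard_l, _ = constant_power_pan(-1.0)    # hard left: (1.0, 0.0)
```

Identical signal, different level in each speaker — exactly the "ILD simplified to the max" described above.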
Other tools are used to make us feel stuff, like boosting a certain frequency to push the center further away. But I'm no mixing or mastering engineer, so I won't pretend to know or understand all that. It's not just psychoacoustics, it's psychoacoustics applied to speaker playback.
-With headphones, the "speakers" sit on the ears at about 90° on each side. Obviously, basic panning gets spread over 180° instead of 60°. And because of that head sandwich, panning will likely place instruments not around us, but on a line directly between the drivers (the so-called lateralization effect when listening to the usual stereo albums with headphones).
That part varies greatly from listener to listener, and what makes predictions difficult is that we mostly only know about normal hearing. Here the brain has to invent an interpretation from cues that make little sense. One person's brain might notice that the sound turns perfectly with the head and conclude that the source has to be inside the head. Another brain might get that sounds are coming from the drivers and "hear" most sounds located around them.
Another brain might just have decided from experience that instruments are at some distance in front, and place them there no matter how many contradicting audio cues come its way.
A lot is beyond the control of the headphones, or of the sound engineers making a typical stereo track, when it comes to headphone playback.
But wait there's more!
Our experience of locating a sound in general involves how the sound is altered by our own body (ILD, ITD... how the sound changes depending on direction is defined by the HRTF, head-related transfer function, and it is pretty unique to you, as is your body). Headphones bypass some of that by emitting sound right at the ears, so sounds don't get altered by your head and torso or by where you're looking. The ears do alter the sound, but in the way they would for a transducer stuck right on them, so that's not great.
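For what it's worth, this is also why binaural processing works the way it does: to put those body cues back, you convolve a mono source with a measured left-ear and right-ear impulse response (HRIR) for the desired direction. The HRIRs below are made-up placeholder taps purely for illustration (real ones come from measurement databases and are hundreds of taps long); the sketch only shows the mechanism.

```python
import numpy as np

# Placeholder HRIRs (invented for illustration, NOT real measurements).
# The right-ear response is quieter and delayed by a couple of samples,
# mimicking a source off to the listener's left.
hrir_left = np.array([0.0, 1.0, 0.3])
hrir_right = np.array([0.0, 0.0, 0.0, 0.5, 0.15])

rng = np.random.default_rng(0)
mono = rng.standard_normal(1000)   # stand-in mono source signal

# One convolution per ear bakes that direction's ILD, ITD and spectral
# cues into the two output channels.
left_ch = np.convolve(mono, hrir_left)
right_ch = np.convolve(mono, hrir_right)
```

Done with your own measured HRIRs this can be very convincing; done with someone else's, you run into exactly the listener-A/listener-B problem described below.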
The brain, missing some cues, will still try to find them within the sound. But of course they're not there. So we can rapidly start to mistake something else for a localization cue. Like taking the frequency response for elevation cues, and finding ourselves thinking that this instrument, EQed that way on the track, altered by the headphone on our head, plus a certain amount of panning, feels like a piano about 40° up and 80° to the right, when no elevation cue was ever contained in the stereo track. Another headphone with a different FR might let you place that same piano somewhere else. Which is in part where
@Hifiearspeakers is right to disagree with
@bigshot on headphones having no control to change imaging. They have very little control, but they will alter the perceived presentation in some ways.
Maybe not seeing a sound source stops you from imagining that there can be one in front of you (relatively rare, but documented).
Many things become quite uncertain at this point. And that is the reality of unprocessed stereo audio on headphones. It's not so much that we don't understand how to do it well (but for that we need custom measurements and processing). Until then, once many things go wrong, we can't always predict what plan B will be imagined by the brain as it tries to make sense of something that doesn't make any.
Different headphone listeners might get anything from an almost identical experience to vastly different ones. And back to some idea of space, distance, or even elevation: how can we tell when one presentation is better? Beyond a completely personal opinion, it can be tricky. Should we assume that hearing a sound further away is always a sign of better imaging? Was that sound intended to be very far away? If we reference speaker playback (we should at least do that for old albums), then anything going up or past 30° to the side is an aberration.
I hope I conveyed the notion of complexity without losing you entirely. Not sure I would read such a long and disorganized post...
The idea of recording at the position of the listener is so that we can get the sound the way it is from that position. In principle it seems like a very good idea. An even better one is to record the sound at the ears of the listener. If that was done well and the track was more or less tuned for the headphone that would be used, we'd come pretty close to the sound as we heard it that day. That's ideal binaural recording, including your HRTF. The problem is, well, everything I just explained. If you do that super well for one listener, it probably won't be that good for the next listener, because the recorded sound will have been altered by listener A's head and ears. Listener B's brain knows nothing of how to interpret those changes, because it just spent its entire life using listener B's head and ears^_^.
As a result, binaural albums make some people delighted, while others like me tend to go "meh!" when listening to them.
Recording near the instruments allows capturing them cleanly with a better SNR, so it's usually preferred. Very few sound engineers try to capture a position for the instruments anyway. They usually make it themselves later on, artificially, even if it's just to end up recreating something like the original placement. All the Atmos stuff was supposed to change that and motivate people to capture "3D sound". But so far it has mostly brought cleverer ways to create whatever positioning as a post-process from good old mono tracks.