This might not be the correct use of these terms. It would be nice to get things right and in agreement. Please fill me in if I am wrong.
I don't know what the correct terminology is, but from my experience, the following makes sense to me. (I start off by talking about speaker sound.)
Sound stage
Some systems seem to disappear. There is no sound "coming from the speaker", and if I close my eyes, it is actually hard to pinpoint where the speakers are. For such systems, the far right is about the straight line going from my head to the speaker, just continuing behind the speaker. Some of these systems are not as great at drawing the depth.
By depth, I mean the ability to place the instruments along the lines that run from my head through the speakers, or in the area behind the speakers. Some systems are great for orchestral music, as instruments appear both closer and further back in depth.
A combination of reproduction in both depth and width is rare. It is also strongly dependent on the room in which the speakers are placed, and on how everything is positioned, in particular within the extended triangle made up by my head and the two speakers.
Some speaker makers actively try to draw the sound to the front of the stage, like Monitor Audio. In that case, the soundstage could be described as close and personal, meaning nothing more than that the speaker is tuned to draw the music closer to the foreground. The foreground is the line running between the speakers.
Other speakers, like the Snells I am using, do it the old-fashioned way. They like to draw things more in depth. That is just awesome for big stage music, like big choirs, but not so great for a single artist with a guitar.
I do not know if I hit this correctly, but as I see it, width and depth, as to where instruments might possibly be positioned, are important for sound stage. As would be the way the sound stage is drawn, like front or rear heavy. That sort of makes sense to me. If I got this wrong, please let me know.
Imaging
This term is quite new to me, as I would not use it in my native tongue. The way I see it, how well instruments and voices are placed in the stereo perspective, the positioning, is independent of the sound stage itself. Well, to some degree. By that, I mean that even if reproduction is wide and deep, instruments might not be placed with spectacular accuracy. Where they are placed, both by depth and width, might be a bit blurry. In my native tongue, I am accustomed to calling this perspective. My understanding is that this would be imaging.
I guess that body should fall into this as well? Let me explain. If an instrument is well reproduced, all its tones are placed correctly. In some recordings, the movement of the fingers pressing the strings of a guitar is placed to the right, while the fingers hitting the strings are slightly to the left. The sound of the wooden guitar box extends a bit more, as it should. That is a hyper excellent recording, and superb imaging. Also, the expected sounds are reproduced as expected, and in tune. If so, the guitar seems to be given body, and virtually exists.
If I were to use a term like imaging, that would make sense to me. I might be wrong though. Would be nice to nail this one.
Articulation
Articulation is related to imaging. In my experience, this is not about placement in the stereo perspective. It is more about how well articulated the reproduction of voices and instruments is.
By that I mean tone and details. Not just that things are reproduced, but that they are reproduced with precision. This is best described by the horrific reproduction of percussion under mp3 compression, in particular cymbals. They are reproduced by mp3, but that is it. The finer details are completely lost in the conversion. It is washed out into a high-pitched something, lacking almost any articulation.
Separation
To me, this is the ability to separate instruments and vocals. If I focus, I am able to track an instrument, and the easier it is, the better the separation. I can hear what it plays.
Then there is, again, false separation. If I lossy-compress music, some instruments actually get easier to follow. But that often comes at a cost, as other instruments simply disappear. Filtering out other instruments is not improved separation.
Musicality
I do not think this has been mentioned, but musicality is, to me, the ability to draw me into the music, to engage me. Meaning what, exactly?
To me, first and foremost, that everything is reproduced in harmony. By that I mean that harmonies are heard. Like listening to a choir, the voices harmonize. Or even for pop, the instruments blend as they should. They are reproduced correctly by tone. To me, when that is the case, I sort of forget everything else, as I am sensitive to that.
Also, nothing should be off. If you have great speed and attack in the highs, slow and dull bass sounds off against that. The infamous hiss of the HD800 is another example.
Details
Details is simpler to quantify. It is simply whether sounds are audibly reproduced. Accuracy has little to do with this. Some gear lifts all the lower-level sounds, which is a safe bet to improve details. But not articulation. Not musicality. Not imaging. Just lifting the low-level treble usually brings out a lot of details, like the breath of the woman singing.
Great reproduction of details is oftentimes accompanied by great dynamic range. By that I mean the ability to be articulate across the entire dynamic range. If you listen to metal, the details remain. If you play classical music, articulation will be great for all instruments, both those playing loud and those playing soft. But sounds will be reproduced at their correct sound level. There is a huge difference in that.
This is why, when people just throw out there that something is more detailed, that is not necessarily a good thing.
Clarity
This is a tricky term. A silly way to describe it would be that the less haze in the reproduction, the better the clarity. Like removing foam covering the speaker. Only it is bloody tricky to pinpoint in the reproduction, and when a lossy mp3 is clearer, things suddenly turn messy on me.
Clarity is the one thing that tricks me. My Note3 mobile phone sounds clear as a bell, but why?
Sound drawn closer to me, front-heavy that is, typically sounds cleaner.
Also, a simpler reproduction, highlighting the main traits of the instrument, oftentimes sounds clearer to me. It's like USM (unsharp mask) in photography, in which less sharp looks sharper.
It is unclear to me what makes up clarity, as judged by my senses.
In my experience, real clarity is best described as the combination of the other aspects, and the synergy of them: sound stage, imaging, separation, articulation, and details. When they mix, the perceived clarity is of a completely different nature. That is the best I can do.
Might sound silly, but I have learned not to trust my ears on "clarity". If I tune anything by my experience of that, I mess everything up.
To add to my confusion, digital noise masked my rig at one point. The sound stage was tilted far back. Something was clearly off. Removing a greater part of the digital noise moved everything way more up front. I had mistaken digital noise for sound stage and imaging. Embarrassing, to say the least.
Headphones
This is really easy. Headphones give you a sound stage as if you had the speakers right up to your ears, as you do. Left to right passes through your head.
Depth is relatively small, as the distance between the speakers is small, and the distance between you and the line running between the speakers is practically nil.
The HD800 has the elements at a distance from the ear, resulting in a slightly longer path between them and your ears. By default, that offers a wider sound stage. The axis between the cans no longer passes through the middle of your head, rather the front part of it. (So, I guess that means that I hear voices in my head then? Not sure if I like the sound of that.)
As for the rest of the terms, as I have described it here, headphones excel.
In fact, if anyone made user-specific vector correction to the sound, and placed the sounds individually on the fly, headphones are the only current device that may reproduce 3D in any direction. If the listener had a directional sensor on their head, sounds would stay fixed in place, like one meter in front, slightly to the left, even as the listener turns their head.
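As a toy illustration of that head-tracking idea (the flat 2D setup and all numbers here are my own assumptions, not how any actual product works): keep the source at a fixed spot in the room, and re-express its position in the listener's head frame every time the head turns.

```python
import math

def source_in_head_frame(source_xy, head_yaw_deg):
    """Rotate a world-fixed source position into the listener's head
    frame (x = right of nose, y = straight ahead), so the sound stays
    put in the room while the head turns. Positive yaw = turning right."""
    x, y = source_xy
    yaw = math.radians(head_yaw_deg)
    hx = x * math.cos(yaw) - y * math.sin(yaw)
    hy = x * math.sin(yaw) + y * math.cos(yaw)
    return hx, hy

# A source one meter ahead, slightly to the left (x = -0.2, y = 1.0).
src = (-0.2, 1.0)
print(source_in_head_frame(src, 0))   # head straight: source unchanged
print(source_in_head_frame(src, 90))  # head turned 90 deg right: source
                                      # is now a meter to your left
```

A renderer would then turn those head-frame coordinates into per-ear delays and levels, which is the part current stereo mixes never provide.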
The limited sound stage of current tech is due to how the music is recorded and reproduced. Left to right is oftentimes just a difference in sound level. For real acoustic recordings, you also get the time delay between the microphones, and environmental reflections. But that is not matched to the listener's ears.
The ear picks up at least the following:
- Level differences (as used in stereo recordings)
- Time differences between the sound hitting each ear (acoustic recordings)
- Directional movement of the source (as with an airplane moving toward or away from you: it sounds different)
Example: Most people can pinpoint a plane in the sky (well, they would point to where it was about three seconds ago, if the plane is 1000 meters away). At that distance, the sound level difference between the ears is tiny. For a point source in free field, doubling the distance halves the sound pressure, a drop of about 6 dB, so going from 500 m to 1000 m only loses about 6 dB, resulting in hardly any level difference between the ears. But the sound hits the ears at a slight time delta. That difference in time is instrumental in the human ability to pinpoint sounds.
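That time delta is tiny. A common textbook approximation for it is Woodworth's spherical-head formula; the head radius below is an assumed average, not a measurement:

```python
import math

# Woodworth's interaural time difference (ITD) approximation for a
# spherical head: ITD = (r / c) * (theta + sin(theta)),
# where theta is the azimuth of the source in radians.
HEAD_RADIUS = 0.0875    # meters, assumed average adult head
SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C

def itd_seconds(azimuth_deg):
    """Time delta between the ears for a source at the given angle
    off straight ahead (0 = dead center, 90 = directly to one side)."""
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (theta + math.sin(theta))

for az in (0, 30, 90):
    print(f"{az:3d} deg -> {itd_seconds(az) * 1e6:.0f} microseconds")
```

At 90 degrees to the side this lands somewhere around 0.6 to 0.7 milliseconds, and the brain resolves differences far smaller than that.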
Even more impressive is that the speed of sound changes quite a bit with temperature, but that does not seem to affect our hearing much.
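For reference, the standard linear approximation for the speed of sound in dry air shows how big that swing actually is (this is the usual textbook fit, good enough for everyday temperatures):

```python
def speed_of_sound(temp_c):
    """Approximate speed of sound in dry air, in m/s.
    Linear fit: 331.3 m/s at 0 degrees C, plus 0.606 m/s per degree."""
    return 331.3 + 0.606 * temp_c

# From a freezing winter day to a hot summer one:
for t in (-20, 0, 20, 35):
    print(f"{t:+3d} C -> {speed_of_sound(t):.1f} m/s")
```

That is roughly a 10 percent change from -20 C to +35 C, yet localization still works, presumably because both ears are affected equally.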
A plane moving toward you has a higher pitch, as its movement compresses the sound waves. You hear this in particular as the plane passes above you, as the pitch drops at that point. I lived near an airport, so this became second nature to me. The same applies to cars.
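That pitch change is the classic Doppler shift. A minimal sketch for a stationary listener and a source moving straight toward or away (the engine tone below is made up for illustration):

```python
def doppler_pitch(f_source, v_source, c=343.0):
    """Perceived frequency for a stationary listener.
    v_source > 0 means the source moves toward you, < 0 away from you.
    Formula: f' = f * c / (c - v_source)."""
    return f_source * c / (c - v_source)

engine = 200.0  # Hz, a hypothetical engine tone
print(doppler_pitch(engine, +70))  # approaching at ~250 km/h: pitch rises
print(doppler_pitch(engine, -70))  # receding: pitch drops
```

The moment of flyover is where the sign flips, which is exactly the sudden change you hear as the plane passes overhead.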
The headphone is the only current device that can possibly reproduce all of this, as it by design could reproduce the time delta. But the tech is not there yet.