I thought my first blog post should be non-technical, and frankly the only non-technical audio subject I could think of that people may find interesting was listening tests - but I guess this is a pretty fundamental subject for audio. After all, it's the main issue that separates the extreme objectivists (you don't need to listen, it's all nonsense) from the extreme subjectivists (you don't need to measure, it's all nonsense). It's at the heart of much of the discourse on Head-Fi - a poster says product xyz sounds great, another politely states you're talking twaddle - and of course they are (hopefully) arguing about sound quality based upon their own listening tests, biases and preferences. Indeed, I often read a post about a device I know well, and can't relate the poster's comments to what I think about the device. Sometimes this is simply different tastes, and I can't and won't argue with that - whatever lets you as an individual enjoy music is perfect for you, and if it's different for me then that's fine - vive la différence. But sometimes the poster simply gets it wrong, because they do not have the mental tools to accurately assess sound quality. Over many years I have developed listening tests that try to assess sound quality objectively and accurately. These tests are by no means perfect, and I admit that listening tests are very hard to do well and extremely easy to get wrong - that's why it's important to make the results as accurate as possible, as it's very easy to go down the wrong path.
Another problem is sensitivity - some people hear massive changes, some can barely discriminate anything at all. Fortunately, I consider myself in the former camp, but I don't know how much of that is innate and how much comes from training (I have done lots of tests in my time...). Certainly, having an objective methodology does help, even if it's only about being able to describe things more accurately.
I would also like to talk about how listening tests are actually used in my design methodology, and what it is I am trying to accomplish when designing.
We all make assumptions; otherwise we couldn't get on with life, let alone get anything done. But the key to designing is to think about where the assumptions lie, then test and evaluate whether each assumption is valid. For example, I am assuming you are an audiophile or somebody who is interested in audio. Valid? Yes, otherwise why would you be on Head-Fi? I am also assuming you know who Rob Watts is. Valid? Not really; you may be new to Head-Fi or know nothing about Chord. So, a quick summary: I am an audiophile and a designer of analogue electronics (I started designing my own gear in 1980), and in 1989 I started designing DACs. Frustrated by the performance of DAC chips, in 1994 I acquired digital design skills and started designing my own DACs, creating pulse array DAC technology using a then-new kind of device, the FPGA, which lets you implement your own digital design by programming it. I then became an independent design consultant and started working with Chord, and the first product was the DAC 64. This was unique in that it used the first long tap length FIR filter (the WTA filter). In 2001 I started working with silicon companies, and designed a number of silicon chips for audio. Most of my activity was in creating IP and patenting it, then selling the patents. Today I only work on high end audio, having stopped working with silicon last year.
From my beginnings as an audiophile, I was intrigued by the physiology of hearing and spent a lot of time reading up on how hearing works. In particular, I was intrigued by how hearing as a sensory perception is constructed - we take our hearing for granted, but there is some amazing processing going on.
The invention of the WTA filter with the DAC 64 nicely exposes a conventional engineering assumption: that the length of an FIR filter (an FIR filter is used in DACs to convert the sampled data back into a continuous signal) does not matter, and that all we need is a suitable frequency response. But if one looks into the maths of sampling theory, it is clear that perfectly recovering the bandwidth-limited signal that was sampled in the ADC requires an infinite tap length FIR filter. It is also obvious to me that with a short tap length filter the errors would present themselves as errors in the timing of transients. A transient is when the signal suddenly changes, and from my studies of the physiology of hearing, transients are a vital perceptual cue, involved in lateral sound-stage positioning, timbre, pitch perception and, clearly, the perception of the starting and stopping of notes. So how are we to evaluate the changes in perception with tap length? Only by carefully structured listening tests.
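A minimal sketch can make the tap-length point concrete. It is my own illustration under stated assumptions, not the WTA filter (whose design is not described here): it uses a plain truncated sinc, i.e. the ideal interpolator with a rectangular window, samples a sine at 0.45 of the sample rate (about 19.8 kHz at 44.1 kHz), rebuilds values between samples from only 2 × half_taps neighbouring samples, and reports the worst error against the true waveform. The function names and the test frequency are hypothetical choices. It says nothing about audibility or perception; it only shows that a finite-length filter leaves a reconstruction error that shrinks as taps are added, consistent with the infinite-tap requirement of sampling theory.

```python
import numpy as np

# Hypothetical illustration: a truncated (rectangular-window) sinc interpolator,
# NOT the WTA filter or any commercial DAC filter.

def truncated_sinc_interpolate(samples, n0, frac, half_taps):
    """Estimate x(n0 + frac) from the 2 * half_taps samples around n0.

    Ideal (Whittaker-Shannon) reconstruction sums over *all* samples;
    here the sinc kernel is simply cut off after half_taps on each side.
    """
    n = np.arange(n0 - half_taps + 1, n0 + half_taps + 1)
    # np.sinc(x) is the normalised sinc, sin(pi x) / (pi x)
    return float(np.sum(samples[n] * np.sinc(n0 + frac - n)))

def worst_case_error(half_taps, freq=0.45, n_phases=8, n_fracs=16):
    """Largest interpolation error for a unit sine at `freq` cycles per sample."""
    length = 2 * half_taps + 4
    n0 = length // 2
    n = np.arange(length)
    worst = 0.0
    for phase in np.linspace(0.0, 2 * np.pi, n_phases, endpoint=False):
        samples = np.sin(2 * np.pi * freq * n + phase)
        # Evaluate only at points strictly between samples
        for frac in np.arange(1, n_fracs) / n_fracs:
            exact = np.sin(2 * np.pi * freq * (n0 + frac) + phase)
            approx = truncated_sinc_interpolate(samples, n0, frac, half_taps)
            worst = max(worst, abs(approx - exact))
    return worst

for half_taps in (8, 64, 1024):
    print(f"{2 * half_taps:5d} taps: worst-case error {worst_case_error(half_taps):.1e}")
```

With an infinitely long sinc the error would be zero. Plain truncation is the crudest way to shorten the filter, and practical designs use windows or optimised coefficients, so the absolute numbers here should not be read as those of any real DAC.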
Another area where assumptions are made is designing to psycho-acoustic thresholds. The rationale for this is simple. From studies using sine waves, we know the limits of what the human ear can detect. So if we make any distortion or error smaller than the ear's ability to resolve it (hear it), there is no point in making it any better, as the ear can't detect it. On the face of it, this seems perfectly reasonable and sensible, and it is the way most products are designed. Do you see the assumption behind this?
The assumption is that the brain is working at the same resolution as our ears - but science has no understanding of how the brain decodes and processes the data from our ears. Hearing is not so much about how well our ears work as about how the brain processes the data. What the brain manages to achieve is remarkable, and we take it for granted. My son is learning to play the guitar, and every so often the school orchestra gives a concert. He was playing the guitar, along with some violins, piano and a glockenspiel. We were in a small hall; the piano was 30 feet away, the violins and guitar 35 feet, the glockenspiel 40 feet. Shut your eyes and you perceive the instruments as separate entities, with extremely accurate placement - I guess the depth resolution is of the order of a foot. How does the brain separate individual sounds out? How does it calculate placement to such levels of accuracy? Psycho-acoustics does not have a depth of image test; it does not have a separation of instruments test; and science has no understanding of how this processing is done. So we are working with enormous levels of ignorance, and it is dangerous to assume that the brain merely works at the same resolution as the ears.
I like to think of the resolution problem in terms of the 16 bit 44.1 kHz standard - the ear's performance is pretty much the same as CD's: 96 dB dynamic range and a similar bandwidth. But with CD you can encode information that is much smaller than the 16 bit quantised level. Take a look at this FFT, where we have a -144 dB signal encoded with 16 bits:
So here we have a -144 dB signal in 16 bit data - the signal is 256 times (48 dB) smaller than the 16 bit resolution. So even though each quantised level is only at -96 dB, an FFT makes it possible to see the -144 dB signal. Now the brain probably uses correlation routines to separate sounds out - and the thing about correlation routines is that they can resolve signals well below the resolution of the system. So it is possible that small errors, which the ears can't resolve on their own, become much more important when they interfere with the brain's processing of the ear data. This is my explanation for why I have often reliably heard errors that are well below the threshold of hearing but nonetheless become audibly significant: these errors interfere with the brain's processing of ear data, a process of which science is ignorant.
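The effect in that FFT can be reproduced with a short simulation. This is a hedged sketch, not how the original plot was made: it assumes the tiny signal is encoded with TPDF dither (the encoding method is not stated above), sets the tone amplitude to 1/256 of one 16-bit step (the 48 dB gap between -96 dB and -144 dB), rounds to int16 and takes a single long FFT. The function names and parameters are hypothetical. Every output sample is just -1, 0 or +1, yet the tone stands well clear of the noise bins, because each FFT bin is a correlation against a sinusoid over millions of samples. Without dither, the same tone rounds to silence and disappears entirely.

```python
import numpy as np

def quantise_16bit(x_lsb, rng, dither=True):
    """Round a signal expressed in 16-bit LSB units to int16 samples.

    With dither=True, TPDF dither (difference of two uniform random
    variables, spanning +/-1 LSB) is added before rounding.
    """
    if dither:
        x_lsb = x_lsb + (rng.random(x_lsb.size) - rng.random(x_lsb.size))
    return np.round(x_lsb).astype(np.int16)

def tiny_tone_spectrum(n=2**21, tone_bin=21841, dither=True, seed=1):
    """Encode a sine of 1/256 LSB amplitude; return (samples, tone power, median noise power)."""
    rng = np.random.default_rng(seed)
    amplitude_lsb = 1.0 / 256.0  # about 48 dB below one 16-bit step
    t = np.arange(n)
    # An integer number of cycles per block, so the tone lands in exactly one FFT bin
    x = amplitude_lsb * np.sin(2 * np.pi * tone_bin * t / n)
    q = quantise_16bit(x, rng, dither)
    power = np.abs(np.fft.rfft(q)) ** 2
    # Noise reference: every bin except DC, Nyquist and the tone itself
    noise = np.delete(power[1:-1], tone_bin - 1)
    return q, power[tone_bin], np.median(noise)

q, tone_power, noise_median = tiny_tone_spectrum()
print(np.unique(q))                              # only -1, 0 and 1 occur
print(10 * np.log10(tone_power / noise_median))  # tone level above the median noise bin, in dB
```

Doubling the FFT length roughly doubles the tone-to-noise-bin power ratio (about 3 dB per doubling), since the tone's bin power grows with the square of the length while the per-bin noise grows only linearly.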
Of course, the idea that immeasurably small things can make a difference to sound quality won't be news to the majority of Head-Fiers - you only need to listen to the big changes that interconnect cables can make to realize that. But the fact that listening tests are needed does not mean that the objectivists are wrong about the problems with listening tests.
Difficulties in listening
Placebo. Convince yourself that your system sounds a lot better and invariably it will. So your frame of mind is very important, and it's essential that when doing listening tests you are a neutral observer, with no expectations. This is not as easy as it sounds, but practice, and forcing your mental and emotional state to be neutral, helps.
Minimize variables. When lots of things change, it becomes more difficult to make accurate assessments. So when I do a specific listening test, I try to make sure only one variable is being changed.
Don't listen to your wallet. Many people expect a more expensive product to be naturally better - ignore that, as the correlation between price and performance is tenuous.
Don't listen to the brand. A brand having a cult following means nothing. Ignore what the product looks like too.
Do blind listening tests. If you are unsure about your assessment, or want confirmation, do a single blind listening test in which another listener is simply told to listen to A or B. Don't leak expectations or ask for value judgements - just ask them to describe the sound, without them knowing which is A and which is B.
Remember your abilities change. Being tired makes a big difference to accuracy and sensitivity - after more than 4 hours of structured AB listening tests I lose the desire to live. Listening in unusual circumstances reduces sensitivity by an order of magnitude, and double blind testing, which puts listeners under stress, can degrade sensitivity by two orders of magnitude. Be careful about making judgements at shows, for example - you may get very different results listening alone in your own home. Having a cold can make a surprising difference, and a migraine a few days earlier can radically change your perception of sound.
Be aware that evaluating sound quality is not easy, and it's easy to fall into the trap of tunnel vision: maximizing performance in one area while ignoring degradations in others. It's also easy to get confused by distortion. Noise floor modulation can give a false impression of more detail resolution, a bright sound can easily be confused with more detail, and distortion can add artificial bloom and weight to the sound. It's easy to think you are hearing better sound because it sounds more "impressive", when it actually degrades your ability to enjoy music. Remember that your lizard brain - the part that performs the subconscious processing of sound, the part that enjoys music emotionally - can't be fooled by an "impressive" sound. Listen to your lizard brain - I will explain how shortly.
Don't be afraid of hearing no reliable difference at all. Indeed, my listening tests are at their best when I can hear no change when adjusting a variable - it means I have hit the bottom of the barrel in terms of perception of a particular distortion or error, and this is actually what I want to accomplish.
Don't listen with gross errors. This is perhaps only relevant to the design process, but it is pointless doing listening tests when there are measurable problems. My rule of thumb is that if I can measure it, and it is a signal dependent error, then it's audible. You must get the design functioning correctly and fully tested before doing listening tests.
Although I have emphasised the downside of listening, I find it remarkably easy to hear big changes from very small things - the ear-brain is an amazingly sensitive system. I once had an issue with a company accepting that these things made a difference, so I conducted a listening test with two "perfect" noise shapers - one at 180 dB performance, one at 200 dB performance. A non-audiophile engineer was in the listening test, and afterwards he said that what really surprised him was not that he could hear a difference between two "perfect" noise shapers, but how easy it was to hear the difference.
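The single blind A/B procedure in the list above can be organised with a few lines of code, so that the listener never learns which device is A or B until their descriptions have been written down. This is a hypothetical helper of my own, assuming exactly two devices and free-text descriptions per trial; no particular tooling is prescribed here, and whoever does the switching still has to keep the key out of the listener's sight.

```python
import random

def make_blind_key(device_1, device_2, trials, seed=None):
    """Randomly decide, per trial, which device is presented as 'A' and which as 'B'."""
    rng = random.Random(seed)
    key = []
    for _ in range(trials):
        pair = [device_1, device_2]
        rng.shuffle(pair)
        key.append({"A": pair[0], "B": pair[1]})
    return key

def reveal(key, descriptions):
    """After listening, attach each 'A'/'B' description to the device it actually was."""
    results = []
    for trial, notes in zip(key, descriptions):
        results.append({trial[label]: notes[label] for label in ("A", "B")})
    return results

# Example: the listener only ever sees the labels 'A' and 'B'
key = make_blind_key("device 1", "device 2", trials=2, seed=7)
notes = [
    {"A": "listener's notes on A", "B": "listener's notes on B"},
    {"A": "listener's notes on A", "B": "listener's notes on B"},
]
for row in reveal(key, notes):
    print(row)
```

Shuffling per trial, rather than fixing A and B once, stops the listener from settling into an expectation about which label is which.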
AB listening tests
Now to the meat of this blog, the actual listening tests. Here you pick a set piece of music, listen for 20 to 30 seconds, then go back and forth until you can assess the actual performance. The factors to observe or measure are:
1. Instrument separation and focus.
With instrument separation you are observing how separate the instruments sound. When this is poor, you get a loudest instrument phenomenon: the loudest instrument constantly draws your attention away from quieter instruments. As instrument separation gets better, you can easily follow very quiet instruments in the presence of a number of much louder ones. When instrument separation gets to be first rate, you start to notice individual instruments sounding much more powerful, tangible and real. Very few recordings and systems actually have a natural sense of instrument power - only very rarely do you get the illusion of a powerful instrument completely separate from the rest of the recording, in the way that un-amplified music can. Poor instrument separation is often caused by inter-modulation distortion, particularly at low frequencies.
2. Detail resolution.
Detail resolution is fairly obvious - you hear small details that you have not heard before, such as tiny background sounds or the ambient decay - and this is one measure of transparency. But it's asymmetric: you make an improvement, hear details you have not heard before, then go back, and yes, you can just about make them out - once heard, they are easy to spot again even with poorer detail resolution. Additionally, it's possible to get the impression of more detail resolution through noise floor modulation - a brighter, etched sound can falsely give the appearance of more detail. Indeed, it is a question of balance too, as details should be naturally there, neither suppressed nor enhanced. This illustrates how difficult it is to get right. Small signal non-linearity is normally the culprit for poor detail resolution.
3. Inner detail.
Inner detail is the detail closely associated with an instrument - it's the sound of the bow itself on a violin, for example, or the subtle details of how a key on a piano is played. Technically, poor inner detail is caused by small signal non-linearity and poor time domain performance - improving the accuracy of transient timing improves inner detail.
4. Sound-stage depth.
A favourite of mine, as I am a depth freak. Last autumn we were on holiday in Northern Spain and visited Montserrat monastery. At 1 pm the choir sings in the basilica, and we were fortunate enough to hear them. Sitting about 150 feet away with my eyes shut, the impression of depth was amazing - and vastly better than any audio system. So why is the impression of depth from audio systems so poor? Technically, small signal non-linearity upsets the impression of depth - but the amazing thing is that ridiculously small errors can destroy the brain's ability to perceive depth. Indeed, I am of the opinion that any small signal inaccuracy, no matter how small, degrades the impression of depth.
5. Sound-stage placement focus.
Fairly obvious - the more sharply focused the image, the better. But when sound-stage placement becomes more accurately focused, the perception of width will shrink, as a blurred image creates an artificial impression of more width. Small signal non-linearity, transient timing and phase linearity all contribute to this.
6. Timbre.
This is where bright instruments sound bright at the same time as rich and dark instruments sound rich and dark - the rich, smooth tone of a sax should be present alongside the bright, sharp sound of a trumpet. Normally, variation in timbre is suppressed, so everything tends to sound the same. Noise floor modulation is a factor, adding hardness, grain or brightness, and the accuracy of transient timing makes a big difference.
7. Starting and stopping of notes.
This is the ability to hear the starting of a note, and it's about the accuracy of transient timing. Any uncertainty in timing will soften edges, making it difficult to perceive the initial crack of a woodblock, or all the keys being played on a piano. Unfortunately, it's possible to get confused here, as a non-linear timing error manifests itself as a softness to transients - the brain can't make sense of the transient, so it does not perceive it - but in hard sounding systems, a softness to transients makes the sound more balanced overall, even though one is then preferring a distortion. Of course, real progress means solving both the hardness problem and the timing problem, so that one ends up with a smooth sound but with sharp and fast transients - when the music needs it.
8. Pitch and rhythm.
Being able to follow the tune and the rhythm easily - in particular, listen to the bass, say a double bass. How easy is it to follow the tune? With rhythm, it's about how easily you can hear it - but again, be careful, as it is possible to "enhance" rhythms; slew rate related noise modulation can do this. In that case, things sound fast and tight all the time, even when they are supposed to be soft and slow.
9. Refinement.
Clearly this links in with timbre, but here we are talking about overall refinement - do things sound smooth and natural, or hard and bright? Clearly, frequency response plays a major role with transducers, but not with electronics. Also, the addition of low frequency (LF) 2nd harmonic will give a false impression of a soft, warm bass. Often I see designers balancing a fundamentally hard sound by adding LF second harmonic in an attempt to reduce the impact of the innate hardness - but this is the wrong approach, as then everything always sounds soft, even when it's supposed to sound fast and sharp. In electronics, assuming THD is low, noise floor modulation is a key driver in making things sound harder - even seemingly negligible levels of noise floor modulation will brighten up the sound. Another very important aspect is dynamics and refinement: does the sound change as it gets louder? Some very well regarded headphones actually harden up as the volume increases - and harden up, IMO, in a totally unacceptable way.
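One textbook way a noise floor can move with the signal is undithered quantisation: for very quiet signals the rounding error is essentially the signal itself, while for loud ones it settles to about LSB²/12. The sketch below is my own illustration of that general idea, not a model of any particular DAC or of the specific mechanisms discussed above; the function name, test frequency and levels are arbitrary assumptions. It compares the mean-square error of plain rounding with TPDF-dithered rounding, whose error power stays at about 1/4 LSB² whatever the level.

```python
import numpy as np

def error_power_by_level(levels_lsb, dither, n=1 << 16, seed=0):
    """Mean-square quantisation error (in LSB^2) for sines of several amplitudes (in LSB)."""
    rng = np.random.default_rng(seed)
    t = np.arange(n)
    powers = []
    for level in levels_lsb:
        x = level * np.sin(2 * np.pi * 0.0123 * t)  # arbitrary test frequency, cycles/sample
        if dither:
            # TPDF dither spanning +/-1 LSB, added before rounding
            q = np.round(x + rng.random(n) - rng.random(n))
        else:
            q = np.round(x)
        powers.append(float(np.mean((q - x) ** 2)))
    return powers

levels = [0.1, 0.3, 1000.0]
print("plain rounding:", error_power_by_level(levels, dither=False))
print("TPDF dithered: ", error_power_by_level(levels, dither=True))
```

With plain rounding the error power rises from about 0.005 LSB² for the 0.1 LSB tone to roughly 1/12 LSB² for the loud one, so the noise floor moves with the music; with TPDF dither it stays near 1/4 LSB² at every level, higher on average but constant.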
"You know nothing Jon Snow"
My favourite quote from Game of Thrones - but it illustrates the uncertainty we have with listening tests, particularly if done in isolation without solid measurements.
We are listening to recordings for which we do not know the original performance or the acoustic environment of the recording, nor do we know the equipment it was recorded with, the mastering chain, or the performance of the source, DAC, amplifier, headphones or loudspeaker in isolation. We are listening to a final result through lots of unknown unknowns. I remember once hearing an original Beatles master tape played on the actual tape machine it was recorded with, using the actual headphones they used. It sounded absolutely awful. But I was also lucky enough to hear Doug Sax mastering at the mastering labs - the equipment looked awful, corroding veroboard tracks on hand-made gear - but it sounded stunning. So we are dealing with considerable uncertainty when doing listening tests. It's even more of a problem when designing products - how do you know that you are not merely optimizing to suit the sound of the rest of your system rather than making fundamental improvements to transparency? How can you be certain whether a perceived softness in the bass, for example, is due to a reduction in aberrations (more transparent) or an increase in aberrations (less transparent)?
Fortunately, two methods make it possible to be more certain. The first is variation - all of the AB listening tests are really about variation, and the more variation we have, the more transparent the system is. So, going back to the soft bass: if bass always sounds soft, then we are hearing a degradation. If it sounds softer and more natural, but when the occasion allows sounds fast and tight too, then we have actually made an improvement in transparency. Again, it's a question of being careful, and actually asking the question: is the system more variable? If it is more variable, it's more transparent. So why bother with transparency rather than just making a nice sound? The reason I care so much about making progress on transparency is simply that listening to an un-amplified orchestra in a good acoustic makes one realize how bad reproduced audio is. I think I have made some big progress over the past few years, but there is still a way to go, particularly with depth, timbre variation and instrument power. This is why my focus is on the pro ADC project, as I will then be able to go from microphone to headphone/loudspeaker directly - my goal is to actually hear realistic depth, exactly like real life.
The second method of doing a sanity check on listening tests is with musicality...
I ought to define what musicality actually is first, as people have different definitions. Some people think of it as things sounding pleasant or nice. That's not my idea of it - to me it's about how emotional or involving the reproduced sound actually is. To me, this is the goal of audio systems - to enjoy music on. Plenty of people go on ad nauseam about this, so I am sorry to add to it. Merely talking about musicality does not mean you can actually deliver it - and it is something very personal.
But it is important, and to test for musicality you observe how emotional and engaging the music actually is. The benefit of this approach is that your lizard brain, which decodes and processes audio, releases endorphins and makes your spine tingle, doesn't actually care whether you think it sounds better or not. And since enjoying music is what this hobby is about, it is important to measure this. To do this, you can't do an AB test; you have to live with it, and record how emotionally engaging it actually is. That said, although it's a good test to check you are going in the right direction, it's not effective for small changes, and it can only be done over many days or weeks with different music.
So listen to your lizard brain! I hope you got something useful from this.