Testing audiophile claims and myths
Jan 12, 2019 at 12:57 PM Post #12,046 of 17,336
SonyFan121,
I strongly suggest you take bigshot's test. I was convinced I heard a "night and day" difference between mp3 and FLAC. Then I got the ABX plugin for foobar2000 and tested myself. All that night-and-day difference disappeared, and I found myself essentially guessing. Expectation bias is a wonderfully powerful thing, and we're all subject to it.

Yep, I and many others have had the same experience. It's amazing how we can hear clear and consistent differences between gear, and they're consistent with what other people report, yet those differences seem to disappear when doing controlled blind testing.

Matching volumes is important to avoid something a bit louder seeming more 'dynamic' or whatever. Matching music segments is important to avoid applying perception to different raw material and getting different perceptual results for that reason alone (e.g., one segment has much different bass content than another). Minimizing switching time is important to reduce the effects of rapidly fading sensory memory. Blinding is important to reduce the effects of perceiving differences that we expect to perceive; but we can also not perceive differences because we expect that things will sound the same, and blinding doesn't solve that problem at all (i.e., if you expect things to sound the same, you may perceive it that way without really trying to find differences, and therefore just guess in the trials, so a null result is produced because it was expected).
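For anyone who wants to try this at home, level matching is the control that's easiest to get wrong and also the easiest to automate. Here's a minimal sketch in Python of RMS-based level matching, assuming the numpy and soundfile packages and two placeholder WAV file names (the names and the approach are illustrative only, not part of any particular test protocol):

```python
# Minimal RMS level-matching sketch.
# Assumes the 'numpy' and 'soundfile' packages; file names are placeholders.
import numpy as np
import soundfile as sf

def rms_db(x):
    """Overall RMS level of a signal in dB relative to full scale."""
    return 20 * np.log10(np.sqrt(np.mean(np.square(x))))

ref, sr_ref = sf.read("clip_a.wav")   # reference clip
tgt, sr_tgt = sf.read("clip_b.wav")   # clip to be matched to the reference
assert sr_ref == sr_tgt, "clips should share a sample rate"

diff_db = rms_db(ref) - rms_db(tgt)
print(f"level difference before matching: {diff_db:+.2f} dB")

# Apply a simple broadband gain so the two RMS levels agree.
matched = tgt * (10 ** (diff_db / 20))
sf.write("clip_b_matched.wav", matched, sr_tgt)
```

A broadband gain like this can clip if it boosts the quieter clip, so in practice it's usually safer to attenuate the louder one instead.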

I still think all listening tests are subject to problems related to 'trusting our ears' and 'trusting our memories', and results of one type of listening test may not necessarily generalize to other listening situations, but I'm inclined to think that controlled blind tests should be helpful in ruling out the possibility of large differences caused by expectation of large differences.
 
Jan 12, 2019 at 1:46 PM Post #12,047 of 17,336
Yep, I and many others have had the same experience. It's amazing how we can hear clear and consistent differences between gear, and they're consistent with what other people report, yet those differences seem to disappear when doing controlled blind testing.
Yes, something I understand more clearly than the people in this forum realise. The fact is, it would be ill-informed to believe that there is no difference in sound quality between a lossy, low-bitrate audio file and an uncompressed/lossless file. If we can't hear one, that doesn't mean it doesn't exist; technically there has to be one, and we shouldn't convince ourselves otherwise.

As I said, based on experience:

It depends heavily on the quality of the equipment.
 
Jan 12, 2019 at 2:21 PM Post #12,048 of 17,336
The fact is, it would be ill-informed to believe that there is no difference in sound quality between a lossy, low-bitrate audio file and an uncompressed/lossless file. If we can't hear one, that doesn't mean it doesn't exist; technically there has to be one, and we shouldn't convince ourselves otherwise.

We're talking here about audible differences. Everyone understands that a lossy file is smaller than a lossless one and contains fewer zeros and ones. But there comes a point with lossy where your ears can't detect a difference any more. That is called the threshold of audible transparency. For the purposes of listening to recorded music in the home, audible transparency is all you need. Any differences beyond that are inaudible by definition and inaudible sound doesn't matter. That's why we do listening tests to determine where our threshold lies. I've been looking for some time to find a person who can consistently detect a difference between high data rate lossy and lossless. I haven't found one yet, but I keep searching.

The interesting thing that I've learned is that audiophiles go into great detail on the technical specifications of equipment. They measure distortion down to infinitesimal levels and discuss frequency response that extends octaves beyond 20kHz. But they don't have much of a grasp of the way their ears hear and how those specs relate to audible sound. Listening tests are the way to understand that relationship. My sig file has links to a couple of great seminars from the Audio Engineering Society that link to actual sound files you can download and listen to. The material is broken down into commonly cited specs so you can hear a range of different quality levels and see how the numbers relate to real-world sound. The links to the downloads are in the description under the video.

Ears have limits and it's a waste of time and money to chase improvements beyond the range of audibility. There's usually too much in the audible range that needs fixing!

By the way, I uploaded your test file this morning. Have fun with the test.
 
Jan 12, 2019 at 2:56 PM Post #12,049 of 17,336
You bring up an excellent point - and one which many people seem to ignore.

Various sorts of blind tests can largely eliminate the effects of an expectation bias to hear a difference if there isn't any.
However, it's impossible to completely eliminate an expectation bias to NOT hear a difference.
It's quite possible that people become less likely to hear or report subtle differences if they don't expect a difference to be present.
There is also a widely known tendency for humans to respond to peer pressure when publicly reporting their experiences.

And there are even more interesting and subtle possibilities for error.
For example, we humans have a negative reaction to "failed expectations".
We tend to get frustrated when our expectations aren't met.
So, for example, someone who is expecting to hear "a big obvious difference", and fails to hear an obvious difference, may be less likely to notice a subtle difference.
(Because, after being frustrated at not hearing the obvious difference they expected to hear, they are less carefully focussed on noticing subtle differences.)

There are ways in which some of this COULD be tested statistically... if anyone was willing to bother.
Here's one suggestion for how to do so.
(In order to produce valid test results you would want a large number of test subjects to take the test.)
Test files could be made up with known flaws - perhaps different amounts of deliberately added noise or distortion.
The basic test procedure would be to run a bunch of trials to determine at what level each test subject could reliably detect and report the presence of the distortion.
HOWEVER, the test would be run multiple times, with different groups of test subjects, with each group subjected to a DIFFERENT EXPECTATION BIAS.
(Using some sort of pretext, perhaps by being told that something else was being tested, one group would EXPECT the files to be different,
one group would EXPECT them NOT to be different, and a third group would have no particular expectation either way - they would be told that some files might be different.)
It would be VERY interesting to see how the "ability to notice and report a difference" would differ between the neutral group and the two groups with "pre-loaded biases".
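To make the "known flaws" part concrete, here's a rough sketch (mine, not part of the proposal above) of how the graded stimuli could be generated: copies of a clip with broadband noise added at a series of known signal-to-noise ratios. It assumes Python with numpy and soundfile, and "source.wav" is just a placeholder name:

```python
# Sketch: generate copies of a clip with noise added at known SNRs.
# Assumes 'numpy' and 'soundfile'; 'source.wav' is a placeholder file name.
import numpy as np
import soundfile as sf

clip, sr = sf.read("source.wav")
signal_power = np.mean(np.square(clip))

rng = np.random.default_rng(seed=1)      # fixed seed so stimuli are reproducible
for snr_db in (60, 50, 40, 30, 20):      # from "probably inaudible" to "obvious"
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=clip.shape)
    degraded = np.clip(clip + noise, -1.0, 1.0)
    sf.write(f"degraded_snr_{snr_db}dB.wav", degraded, sr)
```

Each subject's threshold would then simply be the lowest SNR step they can still reliably detect, and it's that threshold that gets compared between the differently biased groups.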

There is also another sort of bias which needs to be accounted for - and which is often used to major advantage in group situations: peer pressure.
Put someone in a room full of people, and ask people to "raise their hands if they hear a difference".
As soon as a few people raise their hands, it creates a desire to "raise your hands and become part of the group".
This both biases people to raise their hands, even if they don't hear a difference, and actually creates a bias to WANT and EXPECT to hear a difference.
And, the exact converse of that, place someone in a room full of skeptics, most of whom don't raise their hands, and there is a bias NOT to "raise their hand and go against the group".
(Anyone who runs demonstrations knows how effective it is to place a few shills in the room to raise their hands at the appropriate time and "get the ball rolling".)

This effect is widely known... and described in many textbooks on the subject... for example, Cialdini's textbook "Influence", which is course material at Harvard Business School.

Both of these effects are well known... and both need to be accounted for.
The "group effect" can be accounted for by doing the tests in isolation.....
Where each person takes the test separately, and reports their results separately, and is NOT allowed to see other results until after the total is tallied.

Note how this is the exact OPPOSITE of running an online study where everyone gets to see a running total of the results their peers have already turned in.
When you do that you are introducing TWO distinct problems:
- you are introducing an EXPECTATION in each new subject to experience what the majority of previous subjects have already reported
- you are creating peer pressure to WANT to both experience and report results similar to what most others have already reported

I might also suggest an interesting way to test for that last sort of bias.... which is simply to create a phony bias and see how it affects the results.
The way to do that is relatively simple....
Create some sort of fair test and present it to three groups of test subjects; you could use BigShot's test of "which lossy-compressed files sound different from the original".
(The only requirement is that the range of differences is wide enough that it is unlikely to be "obvious to everyone".)
One group is told that "fifty people have already taken the test, and 92% of them heard an obvious difference"...
(You have now created both an expectation bias and a peer pressure bias in that group to expect and want to hear a difference.)
The other group is told "fifty people have already taken the test, and the results were statistically random"...
(You have now created both an expectation bias and a peer pressure bias in that group to expect and want to NOT hear a difference.)
The third group is told that they are the first ones to take the test - and they won't get to see the results tallied until their results are all turned in.
(This group is truly neutral in terms of bias.... except, of course, for any biases they may already have.)

If the results are significantly different - then you may infer that the differences were due to the initial bias.
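If anyone ever ran the three-group version, the analysis itself would be trivial. A hedged sketch, assuming Python with scipy and using invented counts purely to show the shape of the calculation:

```python
# Sketch: do "heard a difference" rates differ across the three groups?
# The counts below are invented placeholders, not real data.
from scipy.stats import chi2_contingency

#          heard a difference, did not
counts = [[38, 12],   # group primed to expect a difference
          [17, 33],   # group primed to expect no difference
          [24, 26]]   # neutral group

chi2, p_value, dof, expected = chi2_contingency(counts)
print(f"chi-squared = {chi2:.2f}, dof = {dof}, p = {p_value:.4f}")
# A small p-value would suggest the reported detection rate depends on which
# expectation the group was given, i.e. on the pre-loaded bias.
```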

 
Jan 12, 2019 at 3:43 PM Post #12,050 of 17,336
A proposal for a rather comprehensive test of the audibility of lossy file formats.

First off, I want to applaud BigShot for taking the time to create a test for the audibility of various lossy formats. As anyone who's read my posts knows, the only faults I see with tests like this stem from the fact that they end up using a limited number of samples - and, more specifically, music that is not familiar to the test subject. (Many of us are quite certain that we are much more sensitive to differences that occur in music with which we are very familiar. And, also, because of the complexity of lossy encoding, it seems quite likely that there may be certain types of audible errors that occur only on certain tracks, or when using certain encoders and settings.)

The way to comprehensively avoid both of these issues would be to permit each test subject to submit their own music to use for the test. But, obviously, this would be absurdly labor intensive for an individual running the test. HOWEVER, it is something that could easily be automated. This suggests the possibility that some "audio club" or "research group" could create software that enables this to be done over the Internet. (It would seem that there might even be "commercial motivation" for a company that benefits from the sale or licensing of compressed content - one that might see a benefit in "proving to their customers that their lossy compression algorithm is really transparent" and "allowing their customers to try it for themselves".)

Here's the way it would work (reasons provided after)....

The user would submit or upload a test track (in any standard format - at CD quality or below).
The test software would produce a pair of output samples....
One sample would be the original - converted to 24/96k PCM.
The other sample would be first converted to the lossy format being tested....
THEN decoded using the appropriate decoder, and then converted to 24/96k PCM.
From those two files, a set of 10 randomly named files would be created (just by renaming them).
At that point, the software would also create a "key file", which would identify which files were which.
We would then have ten test files, half of which had been lossy encoded, and half of which had not.
Each set of test samples and key file would be assigned a reference number so we know which goes with which.
(You'll notice this is styled on how the results of "anonymous medical tests" are handled.)

Depending on how the test was being tabulated...
The user could be permitted to download both the test files and the key file...
Or they could be asked to submit their results before being allowed to download the key file...
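Just to make the mechanics concrete, here's a rough sketch of what the server-side step could look like for one submitted track. It assumes ffmpeg is on the path, uses AAC at 320 kbps as a stand-in for whatever lossy codec is being tested, and all the file names are placeholders - none of this is tied to any existing test:

```python
# Sketch of the encode/decode/rename step for one submitted track.
# Assumes ffmpeg is installed; codec, bitrate and file names are placeholders.
import csv
import random
import shutil
import subprocess
import uuid

def run(*args):
    subprocess.run(args, check=True)

# 1) The original, converted to 24-bit / 96 kHz PCM.
run("ffmpeg", "-y", "-i", "upload.wav",
    "-c:a", "pcm_s24le", "-ar", "96000", "reference.wav")

# 2) The lossy round trip, then the same 24/96 conversion.
run("ffmpeg", "-y", "-i", "upload.wav", "-c:a", "aac", "-b:a", "320k", "lossy.m4a")
run("ffmpeg", "-y", "-i", "lossy.m4a",
    "-c:a", "pcm_s24le", "-ar", "96000", "lossy_decoded.wav")

# 3) Five copies of each under random names, plus the key file.
sources = ["reference.wav"] * 5 + ["lossy_decoded.wav"] * 5
random.shuffle(sources)
with open("key.csv", "w", newline="") as key:
    writer = csv.writer(key)
    writer.writerow(["file", "source"])
    for src in sources:
        name = f"sample_{uuid.uuid4().hex[:8]}.wav"
        shutil.copyfile(src, name)
        writer.writerow([name, "lossless" if src == "reference.wav" else "lossy"])
```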

Note that there are specific reasons why I chose to convert all of the sample files into 24/96k PCM.

1) Because all the files are converted to the same lossless format, they will all end up the same size, so any size cues about which is which will have been eliminated.
2) All sample-rate converters introduce some alteration to the content that passes through them. Without starting a discussion about whether that alteration should or should not be audible, by passing all our samples through the SAME sample-rate conversion program, we have made it likely that anything it introduces affects both files alike rather than serving as a cue.
3) By specifying that all incoming samples will be converted to a HIGHER sample rate, we have minimized the likelihood that differences created by the sample rate conversion process will be audible. (Whether they might be audible or not, converting to a higher sample rate is likely to introduce fewer and smaller mathematical differences than converting to the same or a lower sample rate.)

Obviously the user could be presented with ten samples all concatenated into the same file.
However, I've taken a few such tests, and found the need to keep track of when each sample section ends annoying and distracting enough to be worth avoiding.

Note that, if we allowed users to download samples uploaded by other users, there would be copyright issues.
HOWEVER, as long as each user is only allowed to download the samples they uploaded...
And the samples are NOT retained by the server...
This issue should be avoided.
 
Jan 12, 2019 at 6:57 PM Post #12,051 of 17,336
You can say what you like, but I am very experienced in this hobby, I suspect I take it more seriously than you do, I know what I'm talking about.

You should know that I'm a very intelligent and intellectual person, and am not fooled by you.

I can assure you that I know lots about audio codecs. I have a perfect understanding of it and it's not even my job.

You should know that I know a lot about electronics and how circuit boards work.

I am no less accomplished. I am a musician.

Yes, something I understand more clearly than the people in this forum realise.

How about less boasting and more supporting evidence from now on?
 
Jan 12, 2019 at 7:06 PM Post #12,052 of 17,336
You bring up an excellent point - and one which many people seem to ignore.

Various sorts of blind tests can largely eliminate the effects of an expectation bias to hear a difference if there isn't any.
However, it's impossible to completely eliminate an expectation bias to NOT hear a difference. [...]

If the results are significantly different - then you may infer that the differences were due to the initial bias.

Another point I don't really see discussed much is the difficulty of the task involved in these listening tests. When I do the tests, I'm usually not sure of whether I hear a difference. It's not the case that I'm sure 'they sound the same' or 'I definitely hear X difference', but rather more like 'I'm not really noticing a clear difference' or 'I think I might have noticed X difference, but I'm not sure'. This can sort of be remedied by doing a forced-choice ABX test where the listener has to guess if not sure. If the listener scores at a statistically above-chance level, that would suggest that they likely do notice a difference, but as far as I know, the stats won't answer these questions:

- What differences do they notice?
- How consistently do they notice those differences? (small differences likely won't be noticed anywhere near 100% of trials, and may not be much above 50%)
- How big are the differences? (effect size)

These are all important questions from a practical standpoint.
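For what it's worth, the one question the stats do answer cleanly - "is this score above chance?" - is just a one-sided binomial test. A small self-contained sketch in Python (the 14-out-of-16 score is invented for illustration):

```python
# Sketch: exact one-sided binomial test for an ABX score.
# The score below is an invented example, not real data.
from math import comb

correct, trials = 14, 16   # e.g. 14 correct answers in 16 forced-choice ABX trials

# Probability of doing at least this well by pure guessing (p = 0.5 per trial).
p_value = sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials
print(f"{correct}/{trials} correct, p = {p_value:.4f} under pure guessing")

# The raw hit rate is a rough answer to "how consistently?", but with this few
# trials its confidence interval is very wide.
print(f"observed hit rate: {correct / trials:.2f}")
```

The p-value only says the score is unlikely under guessing; the three questions above would need many more trials per listener, per-artifact scoring, and some measure of effect size, which is exactly the gap being pointed out.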
 
Jan 12, 2019 at 7:19 PM Post #12,053 of 17,336
OK. My thoughts are that the distinction between "hearing" and "listening" is incredibly important in discussions such as this. Hearing is effectively a physiological attribute and there can be no doubt, according to all the reliable evidence, that it significantly deteriorates with age. Every year countless tens of thousands of people undertake an audiogram, and have done for several decades, so there's a wealth of evidence for deterioration and, as far as I know, not a single case of someone's hearing improving (or not deteriorating) with age, unless of course it's due to recovery from some infection/condition which reduced hearing. Additionally, we have a wealth of evidence of actual physical deterioration of the ear structures themselves with age, so taken together there can be little/no doubt.

Listening, though, is entirely different: it's a skill, it can be taught, learned and developed, and indeed has been for many centuries. Listening skills do not affect one's hearing ability, they only affect our perception of our hearing, our ability to separate out and consciously identify details which we were always hearing but were unaware we were hearing. Therefore, assuming someone is actively training their listening skills, we have two conflicting processes at work: hearing, which is deteriorating with age, and listening, which is improving with age. However, we have to be careful not to conflate the two! Even if listening skills improved linearly over time with respect to hearing deterioration (which typically they don't), the two still would not just cancel each other out. It's entirely possible, even likely, that it might appear as if they do, but in reality your listening skills are improving your perception/detection of details/differences within a smaller audio band (both a smaller frequency band and a smaller dynamic range) due to hearing deterioration. As you lose your HF response, it's gone, and you cannot train your listening skills to discern something that's effectively no longer there.

I'm not aware of any long-term scientific studies which support the above assertion; I'm just recounting the well-known (and accepted) knowledge/experience of the audio engineering community, who, in lieu of published science, are best placed to judge, as we work with known-frequency-range audio virtually every working day of our careers. I would therefore consider the above somewhat reliable evidence - the best evidence we currently have, further supported by the fact that it agrees with what IS scientifically known/accepted - but not necessarily definitive.

Thanks for the detailed reply, that fits my thinking.

I wonder how much LF and HF hearing loss affects the ability to perceive detail in music. My guess would be not much (until the hearing loss is pretty bad), with really high frequencies contributing more to a sense of 'sparkle' than being essential to perceiving detail in recorded music.
 
Jan 12, 2019 at 7:25 PM Post #12,054 of 17,336
How about less boasting and more supporting evidence from now on?

I don't like to boast, sorry if it sounds like I am. I'm generally an honest human being.
If I am wrong about something, I admit it.
 
Jan 12, 2019 at 8:39 PM Post #12,055 of 17,336
The question is what the net effect is of degradation of the ears versus compensation and learning by the brain. It's plausible that most 50-year-olds would be surprised at how much more detailed their former 18-year-old selves' perception was, but you haven't provided any real evidence of that.

On the other hand, take a look at slides 44, 56, and 123 here:

https://www.listeninc.com/wp/media/Perception_and_-Measurement_of_Headphones_Sean_Olive.pdf

Slide 44 shows that headphone preference doesn't vary much with age. Slide 56 similarly shows only a small variation with age in preference regarding bass and treble amounts, and the difference increases somewhat above age 56. Slide 123 states that "Listeners prefer models that are accurate and neutral across age, listening experience, or culture with some slight bass/treble variations to account for program/gender/training/hearing loss." Adding to this that I had no difficulty with multiple hearing tests, and don't have the impression that I perceive less detail in music than I did when I was 18, I'm not convinced that compensation by the brain isn't sufficient to make up for hearing loss for many people up to about age 50.

Regarding F1 drivers, there are surely many 50-year-old former F1 drivers who are still better drivers than non-F1 pro drivers who are much younger. The main issue with F1 drivers aging isn't worsening perception, but rather diminishing stamina to handle dozens of consecutive laps where they repeatedly hit 4-5G, sometimes in hot weather. Back when the average fitness level of F1 drivers wasn't as high and G-forces were lower, there were winning F1 drivers in their 40s and over 50:

https://en.wikipedia.org/wiki/List_of_Formula_One_driver_records#Oldest_winners
The Harman guys have been the true kings of testing audiophile claims and myths for decades.
 
Jan 12, 2019 at 9:45 PM Post #12,057 of 17,336
His results were random because he didn't understand how the samples were arranged in the track. I've explained the problem to him in PM and offered to give him the test again with a different file. Admittedly it is a difficult test. I'm thinking I might need to add 96 and 128 kbps to give people something to grab onto at the beginning before their picks start becoming random. The idea is to start out with accurate rankings at the low end and then work progressively up until your picks aren't as accurate any more. That point is your threshold of transparency.

If anyone else would like to find their threshold, let me know.
 
Jan 12, 2019 at 9:53 PM Post #12,059 of 17,336
I have no problem with your forthcoming comments. I'll just get back to listening to my Linn + Marantz CD player combo with my legendary Denon AH-D5000 headphones. You ought to hear the level of detail this system of mine extracts. It would surprise many of you.
 
Jan 12, 2019 at 9:57 PM Post #12,060 of 17,336
The results were random because he didn't understand how the samples were arranged in the track. I've explained the problem to him in PM and offered to give him the test again with a different file. Admittedly it is a difficult test. I'm thinking I might need to add 96 and 128 kbps to give people something to grab onto at the beginning before their picks start becoming random. The idea is to start out with accurate rankings and then work progressively up until your picks aren't as accurate any more. That point is your threshold of transparency.

If anyone else would like to find their threshold, let me know.


I like the idea of adding lower bit rates to the test. Would it make more sense to add them in randomly? It would be interesting to see if the success rate is statistically significantly different.
 
