Testing audiophile claims and myths
Jan 12, 2019 at 10:05 PM Post #12,061 of 17,336
Yes, it would have to be 12 samples in random order. I prepared 10 variations of 10 and that took me a while to do. I'm not eager to set it all up again, but if I see a lot of people saying that they can't discern even the lowest level, then I'll carve out time to do that. Most people who have taken the test can actually rank AAC>LAME>Fraunhofer at 192. It usually gets random after that. Since he didn't understand that each sample consisted of two bits of music, his results were scrambled. It's a simple mistake to fix with another run at the test.
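(For anyone who wants to try building a similar test at home, here's a minimal sketch of how the encoded clips could be generated. It assumes ffmpeg is installed; ffmpeg's built-in aac and libmp3lame encoders stand in for the actual encoders used in the test, since the Fraunhofer encoder isn't bundled with ffmpeg, and all file names are hypothetical.)

```python
# Sketch: produce 192 kbps test clips with two encoders, then decode
# everything back to a common 16/44.1 WAV format so the container itself
# gives no clue. Assumes ffmpeg is on the PATH; file names are hypothetical.
import subprocess

SOURCE = "original.wav"  # hypothetical lossless source clip

jobs = {
    "clip_aac.m4a": ["-c:a", "aac", "-b:a", "192k"],
    "clip_lame.mp3": ["-c:a", "libmp3lame", "-b:a", "192k"],
}

for outfile, codec_args in jobs.items():
    # Encode the source clip to the lossy format.
    subprocess.run(["ffmpeg", "-y", "-i", SOURCE] + codec_args + [outfile],
                   check=True)
    # Decode it back to 16-bit/44.1 kHz WAV.
    wav = outfile.rsplit(".", 1)[0] + "_decoded.wav"
    subprocess.run(["ffmpeg", "-y", "-i", outfile,
                    "-c:a", "pcm_s16le", "-ar", "44100", wav], check=True)
```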
 
Jan 12, 2019 at 10:13 PM Post #12,063 of 17,336
Yes, it would have to be 12 samples in random order. I prepared 10 variations of 10 and that took me a while to do. I'm not eager to set it all up again, but if I see a lot of people saying that they can't discern even the lowest level, then I'll carve out time to do that. Most people who have taken the test can actually rank AAC>LAME>Fraunhofer at 192. It usually gets random after that. Since he didn't understand that each sample consisted of two bits of music, his results were scrambled. It's a simple mistake to fix with another run at the test.

Makes sense. Appreciate the time you’ve invested in putting the test together.
 
Jan 12, 2019 at 11:12 PM Post #12,064 of 17,336
That's another good point.

It does, however, depend on what your actual goals are when performing the test. If you're looking for a real scientific answer, taking into account that some people may genuinely be unsure whether they hear a difference, then a forced A/B/X protocol will probably give you more accurate results. However, if you're doing market research for a new product, then what you probably really want to know is whether a significant number of test subjects clearly and consciously notice a difference. If that's your goal, then there's no reason to spend the extra effort to resolve small uncertainties.

I should also point out yet another factor that sometimes confounds these sorts of tests. It's called "self selection bias". What that basically means is that, if you're asking for volunteers, your test population is limited to people who want to take your test. In simplest terms, most of the people who are already certain that a difference is - or is not - audible aren't going to bother to take your test... Some may do so because they honestly "may want to contribute to science", and some others may believe they do or don't hear a difference, and be looking for confirmation of that belief, but most simply won't be interested enough to take the test. As a result, your test sample does NOT represent a true cross section of the general population; instead, your test sample has been self-selected for "those who are interested but unsure - and consider the question important enough to show up".

One solution, which is often employed in serious testing, is to use truly random samples. You get a letter in the mail stating "your name has been chosen at random to take this test", or someone at the mall invites you to come into a back room and sample three new soft drinks. Another is what we might term "motivated selection". We have a pretty good idea who can run the mile the fastest - because there is a major incentive for fast runners to try out for sports events. We probably have no idea how fast the fastest human can run up five flights of stairs - because nobody has any motivation to find out. (And, if you were to try to find out, unless you offered a cash prize, nobody would show up to compete.)

One solution there IS to offer a prize of some sort.

For example, if you REALLY want to find out if ANYBODY can reliably tell the difference with those compressed files... offer a public contest, where people can submit their own samples for you to encode, AND OFFER A PRIZE FOR ANYONE WHO CAN SUCCESSFULLY PROVE THAT THEY CAN RELIABLY TELL THE DIFFERENCE. The prize offers incentive for people who are already convinced that they hear an obvious difference to participate. And, if nobody can hear a difference, then you won't end up having to pay out the prize anyway. I could imagine a booth at an audio show, promoting some new sort of compression. Visitors would be encouraged to bring in their own song, which would be compressed on the spot, and inserted into a fancy "A/B test machine" where they could listen to the samples in random order. They would be offered a choice of using several popular premium headphones - or bringing their own. They would be offered a $500 prize if they could tell which samples were compressed and which ones weren't at least 18 times out of 20. I suspect you'd get plenty of participants, and a negative result under those circumstances would be quite compelling.

(This might also be an interesting event to offer to raise interest for a local audio club.)
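For a sense of why 18 out of 20 is a demanding bar, the tail probability can be computed directly. A minimal sketch, using only the Python standard library:

```python
# Probability of scoring k or more correct out of n trials by pure guessing.
from math import comb

def p_at_least(k, n, p=0.5):
    # Sum the binomial tail: P(X >= k) for X ~ Binomial(n, p).
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

print(p_at_least(18, 20))  # ~0.0002, so a guesser almost never wins the prize
```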

Another point I don't really see discussed much is the difficulty of the task involved in these listening tests. When I do the tests, I'm usually not sure whether I hear a difference. It's not the case that I'm sure 'they sound the same' or 'I definitely hear X difference'; it's more like 'I'm not really noticing a clear difference' or 'I think I might have noticed X difference, but I'm not sure'. This can be partly remedied by doing a forced-choice ABX test where the listener has to guess when unsure. If the listener scores at a statistically above-chance level, that would suggest they likely do notice a difference, but as far as I know, the stats won't answer these questions:

- What differences do they notice?
- How consistently do they notice those differences? (small differences likely won't be noticed anywhere near 100% of trials, and may not be much above 50%)
- How big are the differences? (effect size)

These are all important questions from a practical standpoint.
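One way to get at the consistency question from forced-choice data is to put a confidence interval around the observed hit rate instead of just testing it against 50%. A sketch using the Wilson score interval, with made-up trial counts for illustration:

```python
# Wilson score interval for the true proportion behind an observed ABX hit rate.
from math import sqrt

def wilson_interval(hits, trials, z=1.96):  # z=1.96 -> ~95% confidence
    p = hits / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half = z * sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return center - half, center + half

# Hypothetical result: 14 correct out of 20 looks impressive, but the interval
# still brushes against chance; more trials narrow it considerably.
print(wilson_interval(14, 20))    # roughly (0.48, 0.85)
print(wilson_interval(140, 200))  # roughly (0.63, 0.76)
```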
 
Jan 12, 2019 at 11:31 PM Post #12,065 of 17,336
If I were to be brutally honest, Bigshot's test is rather flawed. The file that contained the audio samples was itself an audio file.
 
Jan 12, 2019 at 11:42 PM Post #12,067 of 17,336
It's pretty obvious that the goal of any company is to maximize their profits.
You can only hope that they're trying to do so by offering a better product, at a lower price, than their competitors.

However, watch a few car commercials, and note how many of the features they highlight are "necessary".
(Watch both what they say and the messaging that's not at all hidden in the scene itself.)
For example, notice how, when they show that new SUV, while they're telling you about the great mileage....
They show a mom with three totally obnoxious little monsters...
Who mysteriously become well-behaved little angels the moment they're loaded into that new SUV they're selling.
(Isn't it pretty obvious that what they're really selling is the idea that, if you buy their SUV, your kids will behave?)
(And, is that really any different than when an audio company suggests that you'll hear some tiny difference?)

They're simply trying to get you to part with as much money as you're willing to spend...
In return for a product that will make you happy...

They are reasonable and rational, but they are still attempting to sell us something more expensive than is necessary to achieve a specific result in an effort to make a profit.
 
Jan 12, 2019 at 11:47 PM Post #12,068 of 17,336
I don't believe he had been 100% honest with me about his methodology. Furthermore, the audio samples were not separate files; they were contained within a FLAC audio file that lasted 23 minutes. What's the point in doing a blind test when the audio samples are not separate and original?
It's as though he recorded each sample using Audacity or something, then exported it as a FLAC file. I suspect that is not how it's done in the real world, by the professionals.
 
Jan 12, 2019 at 11:48 PM Post #12,069 of 17,336
I would suggest TWO additions.

The first (known) sample should be the original uncompressed file.
The second (known) sample should be a very low bit-rate lossy file which sounds obviously different.
This provides an audible reference of "what both ends of the continuum sound like"...
It also provides a way for the test subject to "learn what the flaws introduced by lossy compression sound like".
These should then be followed by the "unknown samples".

There should also be a few low-bit-rate files included with the other samples (just for completeness).

I like the idea of adding lower bit rates to the test. Would it make more sense to add them in randomly? It would be interesting to see if the success rate is statistically significantly different.
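If the anchors were mixed in randomly, the test order and its answer key could be generated with something like this sketch (the sample labels are hypothetical):

```python
# Build a randomized test order with a known reference, a low-bitrate anchor,
# and shuffled hidden samples, plus an answer key for scoring later.
import random

known = ["reference_uncompressed", "anchor_64kbps"]  # played first, identified
hidden = ["aac_192", "lame_192", "fraunhofer_192",
          "aac_128", "lame_128", "uncompressed_hidden"]

random.shuffle(hidden)  # randomize the unknowns, low-bit-rate anchors included

answer_key = {f"sample_{i + 1:02d}": name for i, name in enumerate(hidden)}
for label, name in answer_key.items():
    print(label, "->", name)  # the key stays with the test administrator
```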
 
Jan 12, 2019 at 11:58 PM Post #12,070 of 17,336
I don't believe he had been 100% honest with me about his methodology. Furthermore, the audio samples were not separate files; they were contained within a FLAC audio file that lasted 23 minutes. What's the point in doing a blind test when the audio samples are not separate and original?

The objective of the test is to establish an individual’s ability to sequentially rate samples composed of combinations of different bit rates and encoders. Why would the samples need to be in separate files? There’s nothing that would prevent you from moving back and forth within the FLAC to compare the various samples.

You’ve stated you have deep expertise in this domain. Please explain why you believe the contents of the FLAC are not identical to the “original” samples.

One of the reasons the test is constructed this way is to minimize the possibility of cheating by opening individual samples in an audio analysis tool.
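Whether the FLAC's contents are identical to the original samples is directly checkable rather than a matter of opinion. A minimal sketch, assuming the soundfile and numpy packages and hypothetical file names; if the decoded PCM compares equal, the FLAC step changed nothing:

```python
# Verify that a FLAC file decodes to exactly the same PCM as the source WAV.
# Assumes: pip install soundfile numpy; file names are hypothetical.
import numpy as np
import soundfile as sf

wav, wav_rate = sf.read("source.wav", dtype="int16")
flac, flac_rate = sf.read("source.flac", dtype="int16")

same = wav_rate == flac_rate and np.array_equal(wav, flac)
print("bit-identical" if same else "files differ")
```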
 
Jan 12, 2019 at 11:59 PM Post #12,071 of 17,336
I'm not suggesting specifically how significant this is...
But, every time you convert a file from one format to another, the conversion software alters it.
Therefore, it's POSSIBLE that this alteration is obscuring some difference which would otherwise be audible.

It's a sort of Catch-22 ......
If the files are presented in their "original" formats, without being converted, then the file format itself will offer a cue about which is which.
However, if you convert them all to a common format, it's POSSIBLE that the conversion will obscure a difference.

The only way to avoid the possibility that this could happen is to use hardware to play the original files.
However, by converting all samples to a higher sample rate, the likelihood that this might happen can be minimized.

What, specifically, do you find wrong with the file or the test methodology?
 
Jan 13, 2019 at 12:01 AM Post #12,072 of 17,336
I would suggest TWO additions.

The first (known) sample should be the original uncompressed file.
The second (known) sample should be a very low bit-rate lossy file which sounds obviously different.
This provides an audible reference of "what both ends of the continuum sound like"...
It also provides a way for the test subject to "learn what the flaws introduced by lossy compression sound like".
These should then be followed by the "unknown samples".

There should also be a few low-bit-rate files included with the other samples (just for completeness).

That would work as well. To simplify Bigshot’s work (if he’s willing to invest more time), the uncompressed/highly compressed known example could be a separate file.
 
Jan 13, 2019 at 12:07 AM Post #12,073 of 17,336
"KeithEmo, post: 14715145, member: 403988"]

I'm not suggesting specifically how significant this is...
But, every time you convert a file from one format to another, the conversion software alters it.
Therefore, it's POSSIBLE that this alteration is obscuring some difference which would otherwise be audible.

The details depend on exactly how you perform the conversions.
If you convert a file to 256 AAC, then convert the 256 AAC file to FLAC, you have performed TWO conversions.
If you convert a 256 AAC file to a 16/44k WAV, then convert that to a 16/44k FLAC, the FLAC should decode to exactly the same data as the WAV.
However, if you convert a 256 AAC to a 16/44k WAV, then convert that to a 24/96k FLAC...
Then you have added a sample rate conversion... which cannot be absolutely assumed to be "perfectly transparent".
Likewise, you cannot assume that the decoder will produce identical results if directly outputting 44k and 96k rates.
(Most of these could be ruled out.... but should not be assumed to be negligible.)

It's a sort of Catch-22 ......
If the files are presented in their "original" formats, without being converted, then the file format itself will offer a cue about which is which.
However, if you convert them all to a common format, it's POSSIBLE that the conversion will obscure a difference.

The best way to avoid the possibility that this could happen is to use hardware to play the original files.
However, by converting all samples to a higher sample rate, the likelihood that this might happen can be minimized.
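How transparent a 44.1k-to-96k-and-back conversion actually is can be measured rather than assumed. A sketch using scipy's polyphase resampler on a synthetic tone; other resamplers will behave differently, which is part of the point:

```python
# Measure the residual left by a 44.1 kHz -> 96 kHz -> 44.1 kHz round trip.
# Assumes numpy and scipy; the test signal is synthetic for illustration.
import numpy as np
from scipy.signal import resample_poly

rate = 44100
t = np.arange(rate) / rate                  # one second of samples
x = 0.5 * np.sin(2 * np.pi * 1000 * t)      # 1 kHz tone at -6 dBFS

up = resample_poly(x, 320, 147)             # 44100 * 320/147 = 96000
back = resample_poly(up, 147, 320)          # and back down to 44100

core = slice(1000, len(x) - 1000)           # skip filter edge transients
residual = np.max(np.abs(x[core] - back[core]))
print(f"peak round-trip error: {20 * np.log10(residual):.1f} dBFS")
```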
 
Jan 13, 2019 at 12:13 AM Post #12,074 of 17,336
I'm not suggesting specifically how significant this is...
But, every time you convert a file from one format to another, the conversion software alters it.
Therefore, it's POSSIBLE that this alteration is obscuring some difference which would otherwise be audible.

It's a sort of Catch-22 ......
If the files are presented in their "original" formats, without being converted, then the file format itself will offer a cue about which is which.
However, if you convert them all to a common format, it's POSSIBLE that the conversion will obscure a difference.

The only way to avoid the possibility that this could happen is to use hardware to play the original files.
However, by converting all samples to a higher sample rate, the likelihood that this might happen can be minimized.


I believe the idea that converting the originals to FLAC would consistently and repeatedly obscure audible differences is beyond unlikely. Unless you can point me to examples of this occurring, it seems like a red herring.

If anything, conversion issues (if they existed) would be far more likely to increase the ability to differentiate the samples.

Let’s not muddy the waters unless there is substantiation of the reason and not pure speculation.
 
Jan 13, 2019 at 12:42 AM Post #12,075 of 17,336
The objective of the test is to establish an individual’s ability to sequentially rate samples composed of combinations of different bit rates and encoders. Why would the samples need to be in separate files? There’s nothing that would prevent you from moving back and forth within the FLAC to compare the various samples.

You’ve stated you have deep expertise in this domain. Please explain why you believe the contents of the FLAC are not identical to the “original” samples.

One of the reasons the test is constructed this way is to minimize the possibility of cheating by opening individual samples in an audio analysis tool.
Yes, it occurred to me that he might think I would cheat, which I certainly would not do.
The samples were contained within a FLAC audio file, which means they were converted from their original format. A very amateur thing to do. Real pros would know not to do that, if looking for genuine results.
 