That's another good point.
It does, however, depend on what your actual goals are when performing the test. If you're looking for a real scientific answer, taking into account that some people may be consciously unsure whether they hear a difference, then a forced-choice A/B/X protocol will probably give you more accurate results. However, if you're doing market research for a new product, then what you probably really want to know is whether a significant number of test subjects clearly and consciously notice a difference. If that's your goal, then there's no reason to spend the extra effort to resolve small uncertainties.
I should also point out yet another factor that sometimes confounds these sorts of tests: it's called "self-selection bias". What that basically means is that, if you're asking for volunteers, your test population is limited to people who want to take your test. In simplest terms, most of the people who are already certain that a difference is - or is not - audible aren't going to bother to take it. Some may do so because they honestly want to contribute to science, and some others may believe they do or don't hear a difference and be looking for confirmation of that belief, but most simply won't be interested enough. As a result, your test sample does NOT represent a true cross-section of the general population; instead, it has been self-selected for "those who are interested but unsure - and consider the question important enough to show up".
One solution, which is often employed in serious testing, is to use truly random samples. You get a letter in the mail stating "your name has been chosen at random to take this test", or someone at the mall invites you to come into a back room and sample three new soft drinks. Another is what we might term "motivated selection". We have a pretty good idea who can run the mile the fastest - because there is a major incentive for fast runners to try out for sports events. We probably have no idea how fast the fastest human can run up five flights of stairs - because nobody has any motivation to find out. (And, if you were to try to find out, unless you offered a cash prize, nobody would show up to compete.)
One solution there IS to offer a prize of some sort.
For example, if you REALLY want to find out if ANYBODY can reliably tell the difference with those compressed files... offer a public contest, where people can submit their own samples for you to encode,
AND OFFER A PRIZE FOR ANYONE WHO CAN SUCCESSFULLY PROVE THAT THEY CAN RELIABLY TELL THE DIFFERENCE. The prize offers an incentive for people who are already convinced that they hear an obvious difference to participate. And, if nobody can hear a difference, then you won't end up having to pay out the prize anyway. I could imagine a booth at an audio show, promoting some new sort of compression. Visitors would be encouraged to bring in their own song, which would be compressed on the spot and inserted into a fancy "A/B test machine" where they could listen to the samples in random order. They would be offered a choice of several popular premium headphones - or they could bring their own. They would be offered a $500 prize if they could tell which samples were compressed and which ones weren't at least 18 times out of 20. I suspect you'd get plenty of participants, and a negative result under those circumstances would be quite compelling.
(This might also be an interesting event to offer to raise interest for a local audio club.)
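In case anyone wonders where a criterion like 18 out of 20 comes from, it's just binomial arithmetic. Here's a quick Python sketch (purely illustrative, the function name is my own):

```python
from math import comb

def p_value_at_least(k: int, n: int) -> float:
    """One-sided binomial p-value: the chance of getting k or more
    correct out of n trials by guessing alone (50/50 per trial)."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2**n

print(p_value_at_least(18, 20))  # ~0.0002 - roughly 1 in 5000 by pure guessing
```

In other words, a pure guesser would pass that contest about once in 5000 attempts, so the $500 prize is quite safe unless somebody genuinely hears a difference.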
Another point I don't really see discussed much is the difficulty of the task involved in these listening tests. When I do them, I'm usually not sure whether I hear a difference. It's not that I'm sure "they sound the same" or "I definitely hear X difference"; it's more like "I'm not really noticing a clear difference" or "I think I might have noticed X difference, but I'm not sure". This can be partly remedied by a forced-choice ABX test, where the listener has to guess when unsure. If the listener scores at a statistically above-chance level, that suggests they likely do notice a difference, but as far as I know, the stats alone won't answer these questions:
- What differences do they notice?
- How consistently do they notice those differences? (small differences likely won't be noticed anywhere near 100% of trials, and may not be much above 50% - see the sketch below)
- How big are the differences? (effect size)
These are all important questions from a practical standpoint.
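For the second question, one crude but common approach is a correction-for-guessing model: assume that on each trial the listener either truly detects the difference or guesses at random, so P(correct) = d + (1 - d)/2, and solve for d. That turns a raw ABX score into an estimated per-trial detection rate. A minimal sketch, assuming that simple all-or-nothing model (which real listening almost certainly violates):

```python
def detection_rate(k: int, n: int) -> float:
    """Estimate the per-trial detection rate d from an ABX score,
    assuming P(correct) = d + (1 - d)/2, i.e. d = 2*(k/n) - 1.
    Clipped at zero, since a negative rate isn't meaningful."""
    return max(0.0, 2 * (k / n) - 1)

print(detection_rate(14, 20))  # 70% correct -> detects on only ~40% of trials
print(detection_rate(18, 20))  # 90% correct -> detects on ~80% of trials
```

This illustrates the point in the list above: a listener scoring "only" 70% may still genuinely hear something, just nowhere near every trial. It says nothing about the first and third questions, though; which differences are heard, and how big they are, still require the listener's own descriptions and some measurement of the actual signals.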