So, I finally took some time to read this article. I am not an audio expert, as many here are, but as I have said, I know a thing or two about research. There are some baffling, staggering statements in the full two-part article when you actually read it.
DBT plays a vital role in medical and drug research and can also be useful in audio for detecting sonic differences that fall within the range of the measurement system’s and subject’s sensitivities. Such tests are essential for proving that two conditions are different at a statistical level of high probability when properly executed. To be scientifically proven, an individual must guess correctly no less than 23 times out of 24 trials. If you obtain such a result you can go home happy. But if you obtain anything less or a completely random result “proving” no difference, then you could still be left with the nagging question of whether your experimental design or your assay method was flawed or not sensitive enough. So finding no statistical difference, as has happened in so many pseudoscientific audio tests, is not conclusive.
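Just to put a number on that "23 times out of 24" bar: here's a quick one-sided binomial check (a pure-Python sketch of mine, assuming each trial is an independent 50/50 guess under the null). Their threshold corresponds to p ≈ 0.0000015; a conventional p < 0.05 test is already satisfied at 17 of 24.

```python
from math import comb

def p_value(k: int, n: int) -> float:
    """One-sided binomial p-value: the probability of getting k or more
    of n trials correct by pure guessing (chance = 0.5)."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

print(p_value(23, 24))  # ~1.5e-06 -- the article's bar, absurdly strict
print(p_value(17, 24))  # ~0.032 -- already clears the usual p < 0.05
```

So they demand a significance level orders of magnitude stricter than anything medical research uses, and then treat any result short of it as an indictment of the test rather than of the claim.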
I love that these pseudo-scientific dingbats prove right out of the gate that they're crazy biased. And I have read this exact sentiment on this forum: "ABX testing is bogus. I know this because I tested two sources that I know are different with the ABX method, and I couldn't tell them apart. Therefore ABX testing shouldn't be used." I mean, the inherent stupidity in such a statement should be apparent to anyone with functioning brain cells.
But that the above got published throws the entire publication that would print such idiocy into question. I mean, it literally says "If you prove that two things are different, then your test was good; otherwise it wasn't." For me, the way I was taught long ago to weed scientific inquiry from pseudo-science was this: did the "researcher" present a theory and a way to disprove it? That is the goal of science: to disprove your hypothesis. The greatest scientists the world over lay out methods to disprove their theories. That's how it works. That is why an experiment can disprove or fail to disprove a hypothesis; it cannot prove it. They have immediately (and clearly unknowingly) thrown their own credibility out the window with this paragraph.
If you do find a difference and you can run a legitimate statistical analysis on the data, then you have proven your case that there is a difference between two conditions.
Where the pseudo-scientific objectivists go wrong is when they engage in “triple-blind testing”. This we define (with tongue in cheek) as limited-sensitivity, double-blind tests, coupled with negative expectation bias, an unholy trinity.
Alright, firstly, there is definitely the possibility that we have an expectation bias here in SS for there to not be a difference. This is why you can't take a group of us and generalize the results if you conducted an experiment with us as your subjects. Nor can you generalize the results of a trial with audiophiles, or engineers, or young girls, or old men, or anyone else. Selection bias isn't often talked about in regards to AES studies and the like; there tend to be so many other ways to pick apart these studies that there really isn't any reason to expound on something like selection bias. But I have yet to see a single study referenced here that appears to have a sample size that was sufficiently large and wherein the sample was selected randomly. So generalization isn't possible. That's why I've tested stuff on my own with an A/B switch. I can't generalize any of the results I've seen (though I know which side I err toward).
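And the trial-count problem cuts both ways: with small runs, a null result is nearly meaningless too. Here's a rough Monte Carlo sketch (pure Python, hypothetical numbers) of how often a listener who genuinely hears a difference 70% of the time would actually pass a one-sided binomial test at p < 0.05:

```python
import random
from math import comb

def critical_k(n: int, alpha: float = 0.05) -> int:
    """Smallest number of correct answers whose one-sided binomial
    p-value is at or below alpha (chance performance = 0.5)."""
    tail = 0.0
    for k in range(n, -1, -1):
        tail += comb(n, k) / 2 ** n
        if tail > alpha:
            return k + 1
    return 0

def power(n: int, p_true: float, sims: int = 20_000) -> float:
    """Monte Carlo estimate of how often a listener who is right with
    probability p_true actually clears the p < 0.05 bar in n trials."""
    k_crit = critical_k(n)
    passes = sum(
        sum(random.random() < p_true for _ in range(n)) >= k_crit
        for _ in range(sims)
    )
    return passes / sims

# hypothetical listener who genuinely hears the difference 70% of the time
for n in (10, 16, 24, 40):
    print(f"{n:3d} trials: power ~ {power(n, 0.70):.2f}")
```

With only 10 trials, that listener fails roughly 85% of the time; even at 24 trials it's close to a coin flip. Which is exactly why "we found no difference" from one small test proves nothing, in either direction.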
Anyhow, my point is this: if they were really interested in proving the pseudo-scientific objectivists wrong, it would be simple, but they would need a good random sample to experiment on. And they are quite clearly uninterested in understanding and/or performing good science.
In addition, there are many articles available discussing the limitations of DBT and ABX testing for audio, including the Bell Labs scientists who originated the ABX test. A random sampling of a few references is given below*
One of the references is a freaking forum thread. A. Forum. Thread. Awesome. I didn't bother to read it because it's a wall of text not broken into paragraphs. Another is some random dude's blog, where he says that blind tests aren't any good because listeners would have to be trained to hear the difference between anything but transducers. We have had similar arguments so many times here, where someone wants to throw ABX testing under the bus because of something that isn't inherent in ABX testing. You can train someone before their ABX. And the article was actually trashing all blind testing, which is, of course, ludicrous.
They also provided, I guess as the reference they meant when they said they were including the researchers who originated the ABX test, the 1950 article Standardizing Audio Tests. I think they're confused, because here is a quote from the abstract:
The purpose of the present paper is to describe a test procedure which has shown promise in this direction and to give descriptions of equipment which have been found helpful in minimizing the variability of the test results. The procedure, which we have called the “ABX” test
So in telling us why ABX is bad, they cite the paper that explains why we need ABX testing. I am sure that paper's authors discuss its limitations. The authors here (that is, the "FLAC is bad" authors, not the authors of Standardizing Audio Tests) have, of course, shown a complete misunderstanding of how research works, and probably don't get that you always talk about the limitations of your testing procedures. That's just a standard part of research (which pseudo-scientists would never think of doing).
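For reference, the protocol that paper describes is dead simple, which is part of why its "limitations" are about listeners and statistics, not the test itself. A toy sketch of my own, with `ask_listener` standing in for actually playing audio to a human:

```python
import random

SOURCES = {'A': 'source_a.wav', 'B': 'source_b.wav'}  # hypothetical files

def abx_trial(ask_listener) -> bool:
    """One ABX trial: X is secretly A or B; the subject auditions A, B,
    and X, then must identify X. Returns True on a correct call."""
    truth = random.choice(['A', 'B'])
    guess = ask_listener(SOURCES['A'], SOURCES['B'], SOURCES[truth])
    return guess == truth

# simulate a pure guesser over a 24-trial session
correct = sum(abx_trial(lambda a, b, x: random.choice(['A', 'B']))
              for _ in range(24))
print(correct, "of 24 correct")
```

Nothing in there stops you from training the subject first, letting them switch as often as they like, or running as many trials as you need. Those are experimental-design choices, not flaws of ABX.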
...most of the measurement variations you see in our results come from the mind wandering a bit during these listening sessions. Fortunately, most differences found were large enough to clearly pass statistical tests of significance....
Were the listeners' minds wandering, or the measurers'? That seems like a pretty big issue. "We measured a difference. Most of it was because we weren't paying attention."
They repeatedly talk about how an older version of their playback software didn't allocate enough memory, so if there is a difference, it could result from a bug in that software. They noted that the differences were significantly reduced in the current version of the software. Which sounds like there was just a bug.
From what I have gathered, the person doing the measuring isn't blind? It's a different person than the listener, correct? Or is the listener eyeballing the tape measure? Either way, I'd really LOVE LOVE LOVE a video of this experiment, and I'd be willing to bet the person who knows which source it is provides cues: stepping down from a step-stool, starting to point at places on the ruler, that sort of thing.
I can tell you that their conclusions are wrong because, as was quoted earlier:
Much to our surprise we found that the derived WAV files exhibited a highly audible, hyperbolic decline in sound quality, as estimated on our subjective scale, despite measuring identical by standard null testing.
I mean, you just can't make this article up.
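For anyone unfamiliar, "standard null testing" is about as unambiguous as audio measurement gets, which is what makes that sentence self-refuting. A minimal sketch (assuming the soundfile library for WAV I/O and hypothetical file names):

```python
import numpy as np
import soundfile as sf  # assumption: any WAV reader works here

def null_test(path_a: str, path_b: str) -> float:
    """Standard null test: subtract the two files sample by sample and
    report the peak residual in dBFS. Bit-identical audio nulls to
    silence (-inf dB); any real difference leaves a residual you can
    measure and even listen to."""
    a, rate_a = sf.read(path_a)
    b, rate_b = sf.read(path_b)
    assert rate_a == rate_b and a.shape == b.shape, "align the files first"
    peak = np.max(np.abs(a - b))
    return -np.inf if peak == 0 else 20 * np.log10(peak)

# hypothetical comparison of a FLAC decode against the original capture
print(null_test("decoded_from_flac.wav", "original.wav"))
```

If two files null to silence, their samples are identical, and identical samples cannot exhibit a "highly audible, hyperbolic decline" in anything.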
@bilboda How closely did you read this before you concluded that it "Seems like good science"?