How To: Process High Resolution Music Files
Jul 8, 2015 at 5:28 PM Post #76 of 102
  Can anyone who can access the McGill paper confirm the findings as laid out here (post 9)?
http://www.sa-cd.net/showthread/58757/58908

 
I have a copy of the paper, and yes, that is the upshot. Remember they used two-tailed tests; otherwise the 3 who were significantly wrong would not have been significant. By removing those right/wrong answers they are cherry-picking, and this alters the overall results, which by my crude calculations would no longer have been significant for several of the tests where a significant result was found. But the contradictory results, where for instance 88.2 downsampled to 44.1 is not detected but 88.2 native vs 44.1 native is detected, are puzzling to say the least...
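To make the two-tailed point concrete, here's a minimal Python sketch (the trial counts are hypothetical, not the paper's actual numbers) of how a listener scoring well below chance only counts as "significant" under a two-tailed binomial test:

# Hypothetical ABX score: 5 of 20 trials correct, well below chance.
from scipy.stats import binomtest

k, n = 5, 20
print(binomtest(k, n, p=0.5, alternative='two-sided').pvalue)  # ~0.041: "significant"
print(binomtest(k, n, p=0.5, alternative='greater').pvalue)    # ~0.994: not significant

Under the two-tailed test the reverse scorers register as detections; a one-tailed better-than-chance test would have dismissed them.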
 
The 13 normals

[graph omitted]

The three outliers

[graph omitted]
 
Logically
 
Native 88 (A)
88 to 44 (B)
Native 44 (C)
 
 
The results for the 13 with orchestral
A > C,            Native 88 "better" than native 44
 
But
A = B,            Native 88 same as 88 downsampled
C = B             Native 44 same as 88 downsampled
 
Therefore, by transitivity, A = C, which contradicts A > C.
 
Jul 8, 2015 at 5:46 PM Post #77 of 102
I have a copy of the paper, and yes, that is the upshot. Remember they used two-tailed tests; otherwise the 3 who were significantly wrong would not have been significant. By removing those right/wrong answers they are cherry-picking, and this alters the overall results, which by my crude calculations would no longer have been significant for several of the tests where a significant result was found. But the contradictory results, where for instance 88.2 downsampled to 44.1 is not detected but 88.2 native vs 44.1 native is detected, are puzzling to say the least...


Wait, I didn't read it that way. They seem to be saying native 88.2 vs native 44.1 was not detected.

"However, no significant results were observed for the comparison between files recorded at 88.2 kHz and 44.1 kHz, p = .15."

Which would seem to indicate something funky going on with the software used to convert 88.2 to 44.1.
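For anyone who wants to rule that out themselves, an exact 2:1 down-sample (which is all 88.2 kHz to 44.1 kHz requires) is textbook stuff; here's a minimal Python sketch using scipy, purely illustrative since we don't know which converter the study actually used:

import numpy as np
from scipy.signal import resample_poly

fs_in = 88200
t = np.arange(fs_in) / fs_in
x = 0.5 * np.sin(2 * np.pi * 1000 * t)  # one second of a 1 kHz tone at 88.2 kHz

# resample_poly applies an anti-aliasing low-pass filter, then keeps every 2nd sample
y = resample_poly(x, up=1, down=2)      # now at 44.1 kHz
print(len(x), len(y))                   # 88200 -> 44100

A sloppy converter (weak anti-alias filtering, passband ripple) could add its own audible signature.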

se
 
Jul 8, 2015 at 5:58 PM Post #78 of 102
Wait, I didn't read it that way. They seem to be saying native 88.2 vs native 44.1 was not detected.

"However, no significant results were observed for the comparison between files recorded at 88.2 kHz and 44.1 kHz, p = .15."

Which would seem to indicate something funky going on with the software used to convert 88.2 to 44.1.

se

 
For the 13:

- "Regarding the comparison between files recorded at 88.2 kHz and 44.1 kHz, significant results were observed for the Orchestra excerpt only, p = .02."
- "Regarding the comparison between files recorded at 44.1 kHz and files down-sampled to 44.1 kHz, no significant result was observed for any musical excerpt."
- "Regarding the comparison between files recorded at 88.2 kHz and their down-sampled 44.1 kHz version, significant results were observed for the Classical Guitar and the Voice excerpts, p = .004, p = .04, respectively."

 
 
The graphs summarize it better.
 
I agree the results overall hint at something going awry
 
Jul 8, 2015 at 7:01 PM Post #79 of 102
Many thanks for the elaboration. Just highlights se's point about being careful to distinguish evidence from proof, especially when both procedures and statistics can go haywire.
 
Jul 9, 2015 at 4:59 AM Post #80 of 102
  Can anyone who can access the McGill paper confirm the findings as laid out here (post 9)?
http://www.sa-cd.net/showthread/58757/58908

 
The test is considered inherently flawed in the AES community. That's one reason why it stopped progressing through the AES review process before becoming a journal paper.
 
Being an AES member, I accepted the $5 charge and downloaded the paper. The first problem that popped out at me was that there was no validation of the linearity of the monitoring chain, particularly in the range above 20 kHz, where audible problems are not uncommon. I'm continuing to study the paper.
 
Its sequel, "The Audibility of Typical Digital Audio Filters in a High-Fidelity Playback System," has suffered a similar fate.
 
https://secure.aes.org/forum/pubs/conventions/?ID=416
 
The alleged tests by Amir at the WBF forum have similar problems.
 
If you would like to attempt a fairly well-debugged version of this kind of test, link here: http://www.hydrogenaud.io/forums/index.php?showtopic=107570&view=findpost&p=894877
 
These files are designed for ABXing using foobar2000 and its ABX plug-in:
 
http://www.foobar2000.org/download
 
http://www.foobar2000.org/components/view/foo_abx
 
Jul 9, 2015 at 5:15 AM Post #81 of 102
  Many thanks for the elaboration. Just highlights se's point about being careful to distinguish evidence from proof, especially when both procedures and statistics can go haywire.

 
Nice discussion of the paper here -
 
http://www.sa-cd.net/showthread/58757/58908
 
Post by eesau September 8, 2010 (9 of 111)
 
that fits in pretty well with Steve's comments:
 
"
this is an interesting paper but the authors need to carry out further research before the results can be accepted as a scientific fact.

There are some interesting results in the paper:

Using orchestral music material, 13 participants out of 16 were statistically able to detect

+ native 88.2kHz and native 44.1kHz from each other

However, they were not able to detect 

- native 88.2kHz from the same signal down sampled to 44.1kHz 
- nor native 44.1kHz from the 88.2kHz down sampled to 44.1kHz

There existed no statistically relevant detection using ”cymbals” or ”violin” material.

With ”guitar” and ”voice”, these participants were able to tell 

+ native 88.2kHz from the same signal down sampled to 44.1kHz

but with this material, they were not able to tell

- native 88.2kHz and native 44.1kHz from each other
- nor native 44.1kHz from the 88.2kHz down sampled to 44.1kHz


3 participants out of 16 provided reverse results that (may) indicate that they detected a difference but could not really tell which was which …

These participants doing detection in reverse were able to non-detect 

+ native 44.1kHz from the 88.2kHz down sampled to 44.1kHz using ”guitar” material (but nothing else statistically relevant)

and further with the ”violin” material they were able to non-detect both 

+ native 88.2kHz and native 44.1kHz from each other and
+ native 44.1kHz from the 88.2kHz down sampled to 44.1kHz

So, don't you think that the results are somewhat contradictory … and this, again, shows that AES is not very discriminative when accepting papers to their conventions.

24-bit 88.2kHz and 44.1kHz sample rates were used possibly because 24/88.2kHz is considered to be ”high resolution audio” and sample rate conversion is very straightforward (88.2 kHz is exactly twice 44.1 kHz). They could have used 48kHz and 96kHz as well but would you have been so interested in the results?

ABX is a scientific method to compare results and AES and McGill University are trying to be scientific. For some reason, ABX does not seem to work for audiophile use ….

All participants in the tests had musical training and a professional or scientific relation to digital audio with an average age of 30 years. They were describing high definition audio as having a better spatial reproduction, high frequency richness, precision or fullness … 
"
 
Other comments by me:
 
Both the McGill and Meridian papers claim to have used ABX tests, but these were not the ABX tests that are commonly discussed on audiophile forums.
 
"Participants were asked to perform a double blind
ABX task. For each trial, the excerpt was presented
with three versions, namely A, B and the reference X.
A and B always differ. X is always either the same as
A or the same as B. The participant’s task is to
indicate whether X = A or X = B. To nullify order
effects, the order of presentation across trials and
blocks was randomized."
 
In fact the ABX tests that are commonly discussed on audiophile forums are run quite differently:
 
For each trial, listeners are allowed to freely select from three versions, namely A, B, and X, and listen to them as many times as they wish, in whatever order they wish, to reach any conclusions they wish about them. The listener's conclusion that X sounds most like A or B (or sounds least like A or B) is recorded, and the test moves to the next trial.
 
Thus the ABX test that these experimenters used imposed more constraints on the listeners, and may have produced less reliable results than would otherwise have been possible.
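As a toy illustration of that forced-choice trial logic (not the authors' software, just the procedure described above), the whole protocol boils down to something like this in Python:

import random

def run_abx_block(n_trials, p_detect):
    # X is secretly A or B on each trial; with probability p_detect the
    # listener genuinely hears which it is, otherwise they flip a coin.
    correct = 0
    for _ in range(n_trials):
        truth = random.choice('AB')
        answer = truth if random.random() < p_detect else random.choice('AB')
        correct += (answer == truth)
    return correct

print(run_abx_block(20, 0.0))  # pure guessing: ~10 of 20 on average
print(run_abx_block(20, 0.4))  # weak but real detection: ~14 of 20 on average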
 
The benchmark paper that these newer AES conference papers seek to criticize is Meyer and Moran's JAES paper http://www.aes.org/e-lib/browse.cfm?elib=14195 which did make the final cut. Unfortunately it had an inherent flaw that was not the authors' fault, and was not even widely recognized until some years later. The problem was that at the time they did their research, a very high proportion of the DVD-A and SACD discs on the market, including many that they used in their tests, were sourced from legacy analog or CD-quality (or thereabouts) digital masters. Therefore the presumption that the SACD and DVD-A recordings were all high-rez was probably not true.
 
Jul 9, 2015 at 5:54 AM Post #82 of 102
 
I agree the results overall hint at something going awry

 
The really big question is what happened to the oft-repeated claim of "mind-blowing differences"?
 
The highest percentage correct I see in the paper was 70%, with almost half of the results at or below 50%, a.k.a. random guessing.
 
Jul 9, 2015 at 7:59 AM Post #83 of 102
   
The really big question is what happened to the oft-repeated claim of "mind-blowing differences"?
 
The highest percentage correct I see in the paper was 70%, with almost half of the results at or below 50%, a.k.a. random guessing.

 
The "night-and-day" thing irks me too. A while back, I spent a day doing some how-low-can-you-go tests on some of my most and least dynamic tracks. I could get the more dynamic stuff down to 14/38 and not find any parts of the track that I could successfully ABX, let alone hear obvious differences. The non-dynamic stuff could get down to, in some cases, 8/38 without any problems. And some of this stuff I've been listening to for decades now.
 
Jul 9, 2015 at 2:16 PM Post #84 of 102
All the testing trying to find audible differences between digital formats reveals one thing for sure:
It is pretty darn difficult* for most participants to get 100% of the answers right.
 
* = impossible
 
Ergo: there is no "night and day" difference if the different formats are based on the same master.
Most cases of obvious differences are due to different mastering/processing/manufacturing, not inherently due to the format itself.
 
I have the K2 remaster of Martha Argerich's Rachmaninov #3 & Tchaikovsky #1 piano concertos, which sounds sublime.
The regular CD version is crappy in comparison. But both are Red Book CD standard.
Lots of CDs sound as good as live, and that's good enough in my book.
 
If somebody wants to make a high-rez file sound better, and all possible care is taken so that the result sounds astonishing, then great: congratulations on the effort and maybe even on the result. But that result most likely could also have been achieved in a remastered CD (16/44.1) version given the exact same effort. Just don't expect me to buy, at 3 to 4x the money, a huge collection of high-rez versions of albums that I already have and enjoy. Marketing always needs to promote the next big thing, and if they don't really have it, they make it up.

 
Past a certain point it gets really difficult to hear any improvement under real-world conditions. Yes, with certain well-chosen examples under certain test conditions, there might be some differences that can be pointed out, but for 95% of consumers these differences will not matter any more.
 
Jul 9, 2015 at 9:51 PM Post #85 of 102
ABX tests test memory, not sound quality.
ABX tests force an unnatural choice in an unnatural environment.
Our ear-brain does not react favorably to ABX tests.
 
That's why they are garbage for sound quality.  It hasn't stopped them from owning the world of sound arguments.
 
Which is why sound as a whole is so crap right now.  The ABX test and its garbage results have caused this.
 
Jul 9, 2015 at 9:54 PM Post #86 of 102
  ABX tests test memory, not sound quality.
ABX tests force an unnatural choice in an unnatural environment.
Our ear-brain does not react favorably to ABX tests.
 
That's why they are garbage for sound quality.  It hasn't stopped them from owning the world of sound arguments.
 
Which is why sound as a whole is so crap right now.  The ABX test and its garbage results have caused this.

 
Sound is fine in the classical world, actually. What kind of blind test would you think helpful, or do you not believe in bias?
 
Jul 9, 2015 at 9:56 PM Post #87 of 102
   
It has already been proven by science. There cannot possibly be a difference unless the system used does not properly play the files.
 
https://xiph.org/~xiphmont/demo/neil-young.html


Science?   Haha.
 
Your belief that the science of the human senses is settled is incorrect and highly flawed.
 
Science as a whole isn't about being settled, it's about discovery.
 
The science of our human senses, in particular, is very much unfinished.  Claiming you can finely measure the abilities of the human brain is foolish.
 
If science were "finished," some of us would have robot girlfriends and robots could mimic our kids' voices.  But they can't, and who knows if they ever will.
 
Here's science, explained by someone smarter than all of us, who spent his whole life in the field:
 
“The whole point of science is that most of it is uncertain. That’s why science is exciting–because we don’t know. Science is all about things we don’t understand. The public, of course, imagines science is just a set of facts. But it’s not. Science is a process of exploring, which is always partial. We explore, and we find out things that we understand. We find out things we thought we understood were wrong. That’s how it makes progress.” – Freeman Dyson, 90, Mathematical Physicist

 
Jul 9, 2015 at 10:03 PM Post #88 of 102
   
Sound is fine in the classical world, actually. What kind of blind test would you think helpful, or do you not believe in bias?


I'm working on developing a listening test that will provide a much clearer picture of sound enjoyment, interaction, and yes, quality of program.   It has some basic elements of the current ABX test but also some major changes that I hope will address the issues with ABX. 
 
I'm just a guy with a job that is not that, so who knows if it will get any traction. But I have been patiently waiting for "science" to realize what a disaster that test form is for this problem for over 2 decades now, and no one seems to be working on this.  If you want it done, do it yourself, ya know?
 
I program tests that others have designed in various fields, so I'm familiar with the basic mechanics of testing and analyzing results. 
 
I do believe in bias so I am trying to work traps for that into my testing.
 
If you are interested in this, I posted some initial thoughts here: http://wfnk.com/blog/2015/06/new-listening-test-a-proposal/
 
Jul 9, 2015 at 10:14 PM Post #89 of 102
  Science?   Haha.
 
Your belief that the science of the human senses is settled is incorrect and highly flawed.
 
Science as a whole isn't about being settled, it's about discovery.
 
The science of our human senses, in particular, is very much unfinished.  Claiming you can finely measure the abilities of the human brain is foolish.
 
If science were "finished," some of us would have robot girlfriends and robots could mimic our kids' voices.  But they can't, and who knows if they ever will.
 
Here's science, explained by someone smarter than all of us, who spent his whole life in the field:

 
What I am saying is that the following is already known, beyond doubt:
 
  1. The human hearing range is generally 20 Hz to 20 kHz. Frequencies well above that are inaudible.
  2. 16-bit has more than enough dynamic range to reproduce all recordings in existence. The extra dynamic range of 24-bit adds nothing audible.
 
Due to these facts, hi-res has no conceivable benefit in terms of audio playback. I suggest reading the article I linked to.
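For reference, the arithmetic behind point 2 is the usual "about 6.02 dB of dynamic range per bit" rule; a quick Python check:

import math

for bits in (16, 24):
    print(bits, "bits:", round(20 * math.log10(2 ** bits), 1), "dB")
# 16 bits: 96.3 dB
# 24 bits: 144.5 dB

Even 96 dB is far more dynamic range than any commercial music master actually uses.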
 
Jul 9, 2015 at 10:20 PM Post #90 of 102
 
I'm working on developing a listening test that will provide a much clearer picture of sound enjoyment, interaction, and yes, quality of program.   It has some basic elements of the current ABX test but also some major changes that I hope will address the issues with ABX. 
 
I'm just a guy with a job that is not that, so who knows if it will get any traction. But I have been patiently waiting for "science" to realize what a disaster that test form is for this problem for over 2 decades now, and no one seems to be working on this.  If you want it done, do it yourself, ya know?
 
I program tests that others have designed in various fields, so I'm familiar with the basic mechanics of testing and analyzing results. 
 
I do believe in bias so I am trying to work traps for that into my testing.
 
If you are interested in this, I posted some initial thoughts here: http://wfnk.com/blog/2015/06/new-listening-test-a-proposal/

 
I'll give it a read, thanks.
 
