The Subconscious Case for HD Audio
May 15, 2023 at 11:15 AM Thread Starter Post #1 of 57

Bret Halford (500+ Head-Fier, joined May 2022, USA)
Hello brave Head-Fi Scientists! I would like to acknowledge the consistent efforts from the core group in this particular sub-forum in pushing for objective standards here, and particularly the excellent discussion on the virtues of double-blind tests for audio.

Double-blind testing is a critical tool for evaluating the subjective claims we see thrown around in audiophilia constantly. Being able to take such a test and evaluate your own listening objectively is a very convincing experience. It's important to note, however, that double-blind testing only accounts for discernible, conscious differences, and there are undeniably a variety of phenomena that affect the human body in consistent, objective ways without being perceptible to us.

For instance, if you were to run a (necessarily quick) double blind test on subjects to see if they could tell the difference between oxygen and carbon monoxide, the subjects would not be aware of any difference due to the lack of odor, and yet the carbon monoxide would kill them after a short while. It's an extreme scenario, but hopefully you get the point.

Why does it matter? Because our senses are routed first through the lower 'survival' brain for critical evaluations like fight or flight before the conscious upper brain is even made aware of the detection. In evaluating audio capabilities of the human hearing system then, are we artificially constraining results by focusing only on conscious perception?

I would like to invite your consideration...

The Subconscious Case for HD Audio

You hear a lot about cables, amps and DACs having subtle 'unmeasurable' effects on sound in the forum proper and other similarly subjective audiophile communities. Less popular (at least these days) are discussions around lossy vs. lossless, and rarer still are folks claiming benefits from so-called 'HD audio' (for the purposes of this discussion I'll take that to mean >48 kHz sampling; I'm not going to discuss quantization or 24-bit at all).

Part of this dismissal of the case for HD audio, and even of lossy vs. lossless CD (Redbook) quality, stems from the fact that it is fairly easy to do online double-blind tests that toggle seamlessly back and forth between qualities, and that even offer 'tests' to gauge your ability to identify them correctly. These tests are exhausting but highly convincing... For instance, while I can discern lossy vs. lossless most of the time (slightly over 75% across tests), I can't statistically tell the difference between SD and HD audio myself. Indeed, HD audio has consistently failed double-blind experiments, whether it's SACD, HDCD, DSD, MQA (lol), or even 24-bit/192 kHz FLAC. I think this removes a lot of the speculative room in the hobby for people to claim audible improvements... it's much harder to properly double-blind test things like cables or sources that require (blinded) helpers, precise level matching, etc.
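For concreteness, here is a minimal sketch (with made-up trial counts, not my actual test logs) of how you can check whether a '75% correct' run actually clears chance:

```python
# Hypothetical ABX tallies (illustrative only): does ~75% correct beat chance?
from scipy.stats import binomtest

for n_trials, n_correct in [(16, 12), (40, 31), (100, 76)]:
    result = binomtest(n_correct, n_trials, p=0.5, alternative='greater')
    print(f"{n_correct}/{n_trials} correct -> one-sided p = {result.pvalue:.4f}")
```

With only 16 trials, 12/16 is right at the edge of significance; it takes longer runs before a ~75% hit rate clears chance convincingly.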

So that is all tidy and nice for an objectivist-leaning listener such as myself, right? Wrong. There exists objective data showing not only that humans register the difference in music sampled above 44.1 kHz, but that we enjoy it more too! We just perceive these differences subconsciously, making listening tests largely invalid. If we instead look at electroencephalogram (EEG) data of listeners exposed to SD and HD music while engaged in an unrelated intellectual task, there are observable differences in the human neurological response:

Although the effect size is small, the overall results support the view that the effect of high-resolution audio with inaudible high-frequency components on brain activity reflects a relaxed attentional state without conscious awareness.

We found greater high-alpha (10.5–13 Hz) and low-beta (13–20 Hz) EEG powers for the excerpt with high-frequency components as compared with the excerpt without them. The effect appeared in the latter half of the listening period (200-400 s) and during the 100-s period after music presentation (post-music epoch).

https://www.frontiersin.org/article...attentional state without conscious awareness.
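For anyone who wants to poke at this sort of data themselves, here is a rough sketch (my own illustration, not the paper's actual analysis pipeline) of how high-alpha and low-beta band power can be estimated from a single EEG channel; the sampling rate, duration and synthetic signal are all assumptions:

```python
# Rough sketch: estimate high-alpha (10.5-13 Hz) and low-beta (13-20 Hz) band
# power from one EEG channel via Welch's method. Synthetic data stands in for
# a real recording.
import numpy as np
from scipy.signal import welch
from scipy.integrate import trapezoid

fs = 500                                    # assumed sampling rate, Hz
t = np.arange(0, 60, 1 / fs)                # 60 s of fake "EEG"
eeg = 10e-6 * np.sin(2 * np.pi * 11 * t) + 5e-6 * np.random.randn(t.size)

freqs, psd = welch(eeg, fs=fs, nperseg=4 * fs)   # 4 s analysis windows

def band_power(lo, hi):
    """Integrate the power spectral density between lo and hi Hz."""
    mask = (freqs >= lo) & (freqs <= hi)
    return trapezoid(psd[mask], freqs[mask])

print("high-alpha (10.5-13 Hz):", band_power(10.5, 13.0))
print("low-beta   (13-20 Hz):  ", band_power(13.0, 20.0))
```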

This isn't the first study on HD effects either; it builds on prior work. A series of studies have shown measurable differences in brain response when HD content is present in music: Oohashi et al., 2000, 2006; Yagi et al., 2003a; Fukushima et al., 2014; Kuribayashi et al., 2014; Ito et al., 2016. Increased alpha brainwaves (a relaxed, pleasurable state!) are the expected signature when 'inaudible' HD frequency components are present.

Interesting takeaways from this more recent study:

- The effect takes a while to kick in (~200 s), so rapid-switching A/B testing is inherently a non-starter!
- It lasts ~100 s after the test too
- Playing just the >20 kHz HD spectrum (without the music) didn't produce the same effect
- There were still no statistically meaningful differences between the subjective ratings of the SD / HD pieces under a forced-choice condition, except for 'natural':

A link between alpha power and ratings of ‘naturalness’ of music has been reported. When listening to the same musical piece with different tempos, alpha-band EEG power increased for excerpts that were rated to be more natural, the ratings of which were not directly related to subjective arousal (Ma et al., 2012; Tian et al., 2013). As high-resolution audio replicates real sound waves more closely, it may sound more natural (at least on a subconscious level) and facilitate music-related psychophysiological responses.

So maybe rather than rapid A/B testing, we should have subjects listen to several minutes of audio and simply ask which sounds more 'natural', rather than which is higher resolution, better, etc. Indeed, the speculation around mechanisms is basically just 'sounds more natural' hand-waving.

Here are the fun EEG plots:

fpsyg-08-00093-g001.jpg



In particular, note the post-listening difference in posterior right-hemisphere activity. Subjects hearing the full-range (HD) version show about a third of a µV more activity at ~12 Hz.

fpsyg-08-00093-g002.jpg



The top plots show the integrative effect. Note how alpha power (relaxed pleasure) and, to a slightly lesser extent, beta power (vigilance) pull ahead the longer the listener is exposed to HD audio. Just think of how much more productive and pleasurable my life is after listening to HD music for hours :p

On the Importance of Timing

We electrical engineers tend to think of everything in the frequency domain, since that is the convenient design and analysis space for electronics, but our hearing isn't a radio and we don't hear in the frequency domain directly. We perceive sound transiently, so we have to consider the biological implementation and purpose of the human ear.

If you think of our ear as a reverse headphone, the eardrum (tympanic membrane) is like the driver diaphragm, and behind it the cochlea is like the DAC, transducing the physical vibration received into bioelectric neurological signals. The cochlea itself is a spiral containing a series of delicate hair cells that pick up the sound running past them. These hair cells are very finely spaced, and because they are aligned linearly along the length of the cochlea, the brain has physical access to excellent timing data.

structures-outer-ear.jpg


The cochlea is a folded up length of sensors displaced slightly from one another, providing extremely high resolution timing data on an incident pulse.

Indeed, we can similarly find objective evidence of human hearing WAY above 20 kHz by focusing on timing differences rather than music or other 'informational' signals. A few years back I came across this nice study looking at identifiable time offsets between a pair of ribbon speakers: http://boson.physics.sc.edu/~kunchu...isalignment-of-acoustic-signals---Kunchur.pdf

In the study a range of participants (including several in their mid/late 40s!) were seated in front of a pair of aligned ribbon speakers playing a steady tone. A series of tests was done in which one of the ribbon speakers was displaced by a slight offset in distance from the subject:

1683653879008.png



The results are absolutely stunning:

1683653951244.png



All subjects guessed correctly 10/10 times for displacements as small as ~3mm!

The shortest displacement that could be readily discerned was 2.3 mm, which corresponds to a delay of τ < 6.7 μs. For this, combining all subjects, there were 82% correct judgements, a chi-squared analysis value of χ2 = 20.48, and a signal-detection-theory (SDT) discriminability index of d = 1.84 with a criterion of c = 0.97. For d=2.0 mm, there was essentially no discernment between the displaced and control sounds (52% correct judgements, χ2 = 0.08, d = 0.14, and c = 0.18).

Using a chi-squared approach, this puts the threshold of judgement vs. chance at around 2.3 mm, corresponding to a time delay of less than 6.7 us.
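As a sanity check on those numbers: if the pooled 82% figure corresponds to 50 total trials (my assumption, e.g. 5 subjects × 10 trials each; see the paper for the actual breakdown), the quoted chi-squared value falls right out:

```python
# Reconstruct the quoted chi-squared for the 2.3 mm case, ASSUMING 50 pooled
# trials with 82% (41/50) correct judgements vs. a 50/50 chance expectation.
from scipy.stats import chisquare

n_trials, n_correct = 50, 41
observed = [n_correct, n_trials - n_correct]
expected = [n_trials / 2, n_trials / 2]

chi2, p = chisquare(observed, f_exp=expected)
print(f"chi2 = {chi2:.2f}, p = {p:.1e}")   # chi2 = 20.48, far beyond chance
```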

If we examine the corresponding max frequency that would be required to capture a 6.7 us period signal without aliasing:

Fsample = 2 × 1/(6.7x10^-6 s) ≈ 3.0x10^5 Hz, or about 300 kHz. That's nearly an order of magnitude more than the 44.1 kHz sampling rate used by the Redbook standard, and it represents a maximum hearing 'frequency' of ~150 kHz, not the 20 kHz that is commonly assumed!
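Spelling that arithmetic out, including the displacement-to-delay conversion (assuming c ≈ 343 m/s in air):

```python
# Back-of-envelope: convert the ~2.3 mm threshold displacement to a delay,
# then ask what sample rate would be needed if that delay were treated as a
# full signal period under Nyquist (the assumption I'm making above).
c = 343.0           # speed of sound in air, m/s (assumed)
d = 2.3e-3          # threshold displacement, m

tau = d / c                 # ~6.7e-6 s (6.7 microseconds)
f_signal = 1.0 / tau        # ~150 kHz if the delay were one full period
f_sample = 2.0 * f_signal   # Nyquist: ~300 kHz

print(f"tau = {tau * 1e6:.1f} us")
print(f"f_signal = {f_signal / 1e3:.0f} kHz, f_sample = {f_sample / 1e3:.0f} kHz")
```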

It's particularly interesting for those of us in the IEM world, as this paper does an excellent job of noting the inherent disadvantage of large, broad-firing speakers:

For example, Eq. 2 indicates that a dipole loudspeaker with a single electrostatic panel of height a=1.5 m at a speaker-listener distance of D=5 m (with the listener’s ear at half speaker height) will have a temporal spread of a²/2cD = 0.65 ms. What this means is that even if the entire chain had an otherwise unlimited bandwidth, a delta-function (narrow impulse) input signal will get spread out over a 650 μs long rectangular window at the listener position. Thus a loudspeaker that subtends a large angle at the listener position must necessarily compromise fidelity, perhaps explaining why small speakers tend to have a subjectively cleaner and more coherent sound
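A quick check of that quoted spread figure (assuming c ≈ 343 m/s):

```python
# Verify the quoted temporal spread a^2 / (2*c*D) for a tall dipole panel.
c = 343.0    # speed of sound, m/s (assumed)
a = 1.5      # panel height, m (from the quote)
D = 5.0      # listener distance, m (from the quote)

spread = a ** 2 / (2 * c * D)
print(f"temporal spread = {spread * 1e6:.0f} us")   # ~650 us, matching the paper
```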

The transient smearing you get listening to speakers is far and away the bottleneck compared to human hearing, and we can't speed up the sound waves to compensate. This is an inherent physical limitation that keeps speakers below the fidelity possible for the ear. IEMs are the ideal solution to this challenge! They and headphones are probably the only form factor with a remote chance of relaying this timing info in real-world systems/environments...

So given an exceptional, close, lab-grade transducer and chain, we still have room to grow even past 192 kHz sampling in terms of transient audibility... this makes a lot of sense if we think in evolutionary terms rather than audiophile terms:

Our senses are designed to tell us about opportunity and threat in our environment. Spatial offset information allows our brain to extrapolate not just positional data about a source but speed information too (displacement vs. time). It's not hard to understand that those better able to hear where that tiger is coming from would more often survive to reproduce... the compounded evolutionary effect is that our ears are spatial specialists, capable of 'superhuman' data extraction in this domain, far beyond what we would expect for pleasurable signals like music.

As noted in the intro, if we examine the human brain structure, we find it's actually two brains in bunk beds. The lower 'lizard' brain is primordial and instinctive, and handles our basic regulatory and survival functions. It makes sense that all sensory data is first run by the lizard. If a tiger is about to pounce on you, the luxury of taking the time to get the neocortex involved for approval will mean you're cat lunch... It makes total sense, then, that the lower brain would be far more effective at evaluating timing info than we are consciously aware of with our neocortex.

Conclusion

It's ironic, given the historical dismissal, that perhaps the best arguments for HD audio benefits come not from subjectivists but from objective studies. This is a natural product of the fact that, for musical info, most if not all of the observable benefit is subconscious. If we look to nature, we find a very tidy biological explanation grounded in evolutionary theory, and in particular in the ear's ability to perceive incredibly minute timing differences.

The end result? You're not crazy for paying extra for Qobuz, and you're definitely not crazy if you prefer IEMs and headphones to 2 channel and surround!
 
May 15, 2023 at 11:58 AM Post #2 of 57
If we examine the corresponding max frequency that would be required to capture a 6.7 us period signal without aliasing:
Capturing 6.7 µs period signal is unrelated to capturing 6.7 µs delay. The standard 16/44 format is perfectly adequate to capture not only µs delay but also ns and even ps. From https://troll-audio.com/articles/time-resolution-of-digital-audio/
With CD quality audio, 16 bits at 44.1 kHz, the best-case time resolution is obtained with a full-scale signal at 22.05 kHz. The above formula then yields ... 115 ps.
For a more typical 1 kHz signal at -20 dB, i.e. with an amplitude of 0.1, the same calculation produces a value of 24 ns.

For some actual example:
https://www.head-fi.org/threads/why...t-bad-for-music.716822/page-187#post-16783791
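And a rough back-of-envelope of where figures like those come from (my own sketch, assuming the resolvable time shift is roughly half a quantization step divided by the sinusoid's maximum slope; it lands in the same ballpark as the article's numbers, not an exact reproduction of its derivation):

```python
# Simplified model: smallest time shift of a sinusoid that moves a 16-bit
# sample by about half an LSB, delta_t ~ (q/2) / (2*pi*f*A).
import math

def time_resolution(freq_hz, amplitude, bits=16):
    half_step = 1.0 / (2 ** bits)                 # half a quantization step on a +/-1 scale
    max_slope = 2 * math.pi * freq_hz * amplitude  # steepest slope of the sinusoid
    return half_step / max_slope

print(f"{time_resolution(22050, 1.0) * 1e12:.0f} ps")  # full-scale 22.05 kHz -> ~110 ps
print(f"{time_resolution(1000, 0.1) * 1e9:.0f} ns")    # 1 kHz at -20 dB      -> ~24 ns
```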
 
Last edited:
May 15, 2023 at 12:34 PM Post #3 of 57
Capturing 6.7 µs period signal is unrelated to capturing 6.7 µs delay. The standard 16/44 format is perfectly adequate to capture not only µs delay but also ns and even ps. From https://troll-audio.com/articles/time-resolution-of-digital-audio/


For some actual example:
https://www.head-fi.org/threads/why...t-bad-for-music.716822/page-187#post-16783791
Interesting distinction, but it seems like a semantic plea. Do you honestly think the human brain can detect timing to that level but is totally oblivious as soon as it becomes periodic?

Either way, what's your explanation for the consistent, measurable differences in brain activity shown in the half dozen studies linked? Coincidence?
 
Last edited:
May 15, 2023 at 12:35 PM Post #4 of 57
What I took away from this is that the human ear is highly sensitive to phase or timing delays. I'd guess this probably helps you form a mental map of the sound sources in an environment. So the question is: is it better to have flat phase tuning, or something that is very much not flat?
 
May 15, 2023 at 3:17 PM Post #5 of 57
I am 100 % happy with redbook. That's great, because I have about 2000 CDs to enjoy. Other people can spend their money on hi-res audio if they believe that increases enjoyment (for me better music, production, mixing and mastering would do that, but hey that's me).
 
Last edited:
May 15, 2023 at 3:38 PM Post #6 of 57
I am 100 % happy with redbook. That's great, because I have about 2000 CDs to enjoy. Other people can spend their money on hi-res audio if they believe that increases enjoyment (for me better music, production, mixing and mastering would do that, but hey that's me).

Hello Sunk Cost Fallacy, how are you today? Coping well?

In my opinion Bret Halford draws a totally wrong conclusion, and doesn't even seem to understand the temporal resolution of digital audio, in order to create a lacklustre "scientific" justification for hi-res, but hey, that's me and what do I know? Nothing, I guess...

I believe you mean Oohashi et al., 2000, 2006; Yagi et al., 2003a; Fukushima et al., 2014; Kuribayashi et al., 2014; Ito et al., 2016 drew the wrong conclusions. It's cute you guys think we're just worried about a 20kHz sinusoid at cross point out of phase... but maybe save the ad hominems for after you've fielded an explanation for the data?
 
May 15, 2023 at 4:01 PM Post #7 of 57
Hello Sunk Cost Fallacy, how are you today? Coping well?
I'm having a bad day. I'm trying to make it better by watching Dan Bell.
I believe you mean Oohashi et al., 2000, 2006; Yagi et al., 2003a; Fukushima et al., 2014; Kuribayashi et al., 2014; Ito et al., 2016 drew the wrong conclusions. It's cute you guys think we're just worried about a 20kHz sinusoid at cross point out of phase... but maybe save the ad hominems for after you've fielded an explanation for the data?
I need time to investigate this data. Maybe the data is a Japanese lie, maybe it is not. Can't say at this point. I deleted a part of my post to make it more friendly.
 
May 15, 2023 at 4:49 PM Post #8 of 57
I need time to investigate this data. Maybe the data is a Japanese lie, maybe it is not. Can't say at this point. I deleted a part of my post to make it more friendly.
Too bad I already read it and replied to your dismissal, huh... nobody rushed you!

What is it called when you rush to judgement on something before investigating the data?
 
May 15, 2023 at 5:54 PM Post #9 of 57
What’s happened in the field in the last six years-plus since the last study was published?
 
Last edited:
May 15, 2023 at 6:06 PM Post #10 of 57
Too bad I already read it and replied to your dismissal, huh... nobody rushed you!

What is it called when you rush to judgement on something before investigating the data?
I haven't investigated THIS data, but I have investigated other data according to which hi-res is not needed in consumer audio.
 
May 15, 2023 at 7:08 PM Post #11 of 57
Why do you cherry pick studies that have largely been discredited while ignoring those that stand the test of time to support magical thinking claims?

I'm not convinced that you are not trolling.
 
May 15, 2023 at 7:43 PM Post #12 of 57
Capturing 6.7 µs period signal is unrelated to capturing 6.7 µs delay. The standard 16/44 format is perfectly adequate to capture not only µs delay but also ns and even ps. From https://troll-audio.com/articles/time-resolution-of-digital-audio/


For some actual example:
https://www.head-fi.org/threads/why...t-bad-for-music.716822/page-187#post-16783791
It's much worse: the paper uses 2 speakers not placed in the usual way, but side by side, basically touching each other; then one is moved forward while playing a specific square-wave signal, and, oh miracle, listeners notice.
That image really is the experiment

1683653879008.png


FIG. 1: Experimental configuration. Speaker-to-listener distance D = 4.3 m, aperture length a = 1.5 cm, and speaker-center to speaker-center distance a + 2b = 9.9 cm. Misalignment offset d is variable. During blind trials, a listener tries to distinguish between the aligned (d = 0) and misaligned (d ≠ 0) settings for d values ranging 2–10 mm (τ ∼ 6–30 μs).

That paper has absolutely no business being in this thread, as that delay cannot be linked to anything in a normal music listening situation. OP misunderstood it big time before misusing the value as you mentioned. Not that I'm a giant fan of the other papers and the interpretations made of them, but at least they are related to the thread's title.
 
May 15, 2023 at 8:10 PM Post #13 of 57
I see we've moved on to attacking the source!

Why do you cherry pick studies that have largely been discredited while ignoring those that stand the test of time to support magical thinking claims?

I'm not convinced that you are not trolling.

Please enlighten me! I had no idea these peer reviewed publications were so unanimously understood to be discredited!

It's much worse, the paper uses 2 speakers not placed like usual, but side by side basically touching each other, then one is moved forward while playing a specific square wave signal, and oh miracle, listeners notice.
That image really is the experiment

1683653879008.png




That paper has absolutely no business being in this thread, as that delay cannot be linked to anything in a normal music listening situation. OP misunderstood it big time before misusing the value as you mentioned. Not that I'm a giant fan of the other papers and the interpretations made of them, but at least they are related to the thread's title.
A 'normal listening situation' varies rather a lot. For me it's a CIEM that sits a few mm away from my eardrum. As the paper points out, broad-firing speakers smear the timing far more than the audible limit... that's not a limit of the human ear, it's a limit of your rig!

I included that link because it shows some pretty crazy timing audibility, and I speculated that this might come into play in explaining why the EEG results are so consistent... far more substance than the rebuttals so far! If you take the time to read the post, this is pretty clear. Now, what exactly have I misunderstood?

And no, I don't really care about the allowable phase variance on a 20 kHz sinusoid; it doesn't relate to maximum audibility in humans, nor does it somehow circumvent Shannon's law. If you want to talk about 'misuse', that one is WAY further off topic lol.
 
May 15, 2023 at 9:05 PM Post #14 of 57
Sincerely, what’s happened in the field in the last six years-plus since the last study was published?
 
May 15, 2023 at 10:59 PM Post #15 of 57
I see we've moved on to attacking the source!
Well, if the sources have no definite conclusions, why are you still treating them as the second coming? There have been studies suggesting that people might feel something with ultrasonic frequencies (usually unease). From an anatomical standpoint, the range of our hearing is defined by what connections we still have with the cilia in our cochleas. For an adult, that range is going to top out below 20 kHz.
 
Last edited:
