The Subconscious Case for HD Audio
May 19, 2023 at 5:35 AM Post #46 of 57
There are two speakers. Transducers don’t all sound exactly alike, even when they're the same model each fed the same mono signal. Pair matching is absolutely a potential factor, but probably not an important one either way, since they displaced both sides separately.

Discerning 0.2 dB amplitude differences over <10 us periods sure sounds a lot like we might want to retain signal information there too.
The 0.2dB (very likely more!) amplitude difference is present for the whole 10 seconds of the test signal, not just for 10 microseconds; I don't know where you got that 10us. No one would hear such a small volume difference over such a short period.

The paper specifically points out that the signals were not perfectly level matched. What it does not point out is that a 0.2dB difference in volume could be heard and a 0.5dB difference in volume is easily heard. The listeners could easily pick up on that instead of ultrasonics or "time misalignment". Granted, the signal attenuation is caused by the misalignment, so in a roundabout way the listeners did pick up on the misalignment, but most likely through the improper volume matching. Think of it as testing two amplifiers without matching the output level: sure, they do sound different, but there's not much you can conclude from that until you redo the test with proper volume matching.
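To put rough numbers on that, here's a back-of-the-envelope sketch (my own toy model, not the paper's analysis: two equal-level sources playing the same tone, a listener roughly equidistant, speed of sound taken as 343 m/s, and the ~7 kHz fundamental that comes up later in the thread; the real geometry and the signal's harmonics will shift the exact figures):

```python
import numpy as np

c = 343.0    # speed of sound in m/s (room temperature, assumed)
f = 7000.0   # fundamental of the test tone (the 7 kHz figure discussed later in the thread)

def level_drop_db(displacement_m, freq_hz):
    """Level drop of the summed tone when one of two identical, equal-level
    sources is pushed back by `displacement_m`, relative to the aligned case.
    Two equal tones with relative delay tau sum with amplitude 2*|cos(pi*f*tau)|."""
    tau = displacement_m / c  # extra path length turned into a relative delay
    return -20.0 * np.log10(np.abs(np.cos(np.pi * freq_hz * tau)))

for d_mm in (1.0, 2.3, 3.0, 5.0):
    drop = level_drop_db(d_mm / 1000.0, f)
    print(f"{d_mm:4.1f} mm displacement -> {drop:.2f} dB drop at {f/1000:.0f} kHz")
```

Even this crude model puts the level change in the tenths-of-a-dB range for millimetre offsets, which is exactly the kind of volume/timbre cue being described.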
Ultrasonics are not being properly sampled and are (hopefully) beyond the bandlimit of the lowpass filter used to make redbook recordings. The corresponding transient signals are thus not recoverable on CD quality audio, regardless of bit depth.
A transient that contains frequencies over ~20kHz will not be perfectly recoverable, I agree. The frequencies above that will be distorted or completely lost. This is due to the limited bandwidth, though, not due to limited timing accuracy.
I agree a higher sampling rate will allow for higher bandwidth. It won't likely improve timing accuracy because redbook's timing accuracy is mostly limited around the low frequencies as already shown, and a higher sampling rate will not help with that.

And yes, time discrimination is absolutely evidence of the equivalent frequency. That’s not ‘my idea’, it comes from a nice old dead guy named Fourier. Perhaps our struggle here is in understanding that monochromatic phase is a time delay?
This was already pointed out to you but it seems to have gone over your head. Fourier did not discover the relationship between frequency and period, the Fourier transform is not that relationship, and the idea of transforming a signal from the time domain to the frequency domain simply by using f = 1/T is definitely your (very stupid) idea, not Fourier's.
 
May 19, 2023 at 8:41 AM Post #47 of 57
I agree a higher sampling rate will allow for higher bandwidth. It won't likely improve timing accuracy because redbook's timing accuracy is mostly limited around the low frequencies as already shown, and a higher sampling rate will not help with that.

Temporal resolution/accuracy is a great example of how unintuitive digital audio can be, and why people who haven't developed a deep understanding of it can hold massively wrong assumptions while being very confident about them. It just FEELS self-evident that doubling the sampling rate gives twice the temporal resolution, but what do you know: temporal resolution in digital audio is dictated only by the signal frequency and bit depth! Not to mention that there isn't any need to improve the temporal resolution of CD quality digital audio anyway, because it is already much better than needed (for human ears, that is).
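If anyone wants to poke at this themselves, here's a minimal sketch (my own toy example, nothing to do with any of the quoted papers): sample a band-limited signal at 44.1 kHz twice, the second time delayed by 1 µs (roughly 1/23 of a sample period), and then recover that delay from the sampled data alone.

```python
import numpy as np

fs = 44_100.0        # Redbook sample rate
true_delay = 1e-6    # 1 microsecond, about 1/23 of a sample period
n = 1 << 14
t = np.arange(n) / fs

# Band-limited test signal: a few tones, all well below 20 kHz
tones = (997.0, 3500.0, 9000.0)
def sig(time):
    return sum(np.sin(2 * np.pi * f * time) for f in tones)

a = sig(t)                # reference capture
b = sig(t - true_delay)   # same signal delayed by 1 us, then sampled at 44.1 kHz

# The delay lives in the phase of the in-band content: read it off the
# cross-spectrum at each tone frequency.
win = np.hanning(n)
A, B = np.fft.rfft(a * win), np.fft.rfft(b * win)
estimates = []
for f in tones:
    k = int(round(f * n / fs))              # nearest FFT bin for this tone
    phase = np.angle(A[k] * np.conj(B[k]))  # = 2*pi*f*delay for a delayed copy
    estimates.append(phase / (2 * np.pi * f))

print(f"true delay: {true_delay*1e6:.3f} us, recovered: {np.mean(estimates)*1e6:.3f} us")
```

The recovered delay comes back at essentially 1.000 µs even though the sample period is ~22.7 µs; what limits it in practice is the noise floor (i.e. bit depth), not the sampling rate.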

Fourier did not discover the relationship between frequency and period, the Fourier transform is not that relationship, and the idea of transforming a signal from the time domain to the frequency domain simply by using f = 1/T is definitely your (very stupid) idea, not Fourier's.
To be fair, the Fourier transform isn't far from the f = 1/T idea, especially the relation ∆f·∆T = 1. However, since we are talking about resolutions in the time and frequency domains, one must be careful about how to use and interpret that relationship.
 
May 19, 2023 at 9:49 AM Post #48 of 57
To be fair, the Fourier transform isn't far from the f = 1/T idea, especially the relation ∆f·∆T = 1. However, since we are talking about resolutions in the time and frequency domains, one must be careful about how to use and interpret that relationship.
I don't want to claim that they are unrelated, I'm saying this wasn't Fourier's groundbreaking discovery. If you read the context of this thread, the OP does not seem to recognize the difference and is about as far from being careful about how to interpret it as he can be.
 
May 19, 2023 at 1:50 PM Post #49 of 57
From the get-go I acknowledged we can’t consciously hear those frequencies (hence the title and opening preamble!). The output data from the two-speaker testing is clearly conscious, and thereby in contrast to that…

What I did suggest is that the (quite incredible!) time differentiation humans show on displaced audible tones may be evidence of some of that subconscious ability. Tying them to their corresponding frequency spectrum isn’t dishonest or misleading in that very clear context. I guess I can understand the appearance of math as being some assumed proof, but I’m literally just pointing out the corresponding spectrum to those timings… it’s pretty amazing that humans can do that so precisely, and I definitely think it’s happening subconsciously (I mean who could do that kind of trigonometry in their head :p).

We clearly detect amplitude variation to some level on that time scale from the displaced tone experiments, it really doesn’t seem like a stretch to me to acknowledge we can use existing audio tech to actually include the entirety of that sound information for our subconscious and body to react however it normally would to natural sound that has not been band limited. The technology exists to make it a moot point… why not?
Yes, you did acknowledge no audible result in listening tests, but you also said in the first post:
Indeed, we can similarly find objective evidence of human hearing WAY above 20 kHz by focusing on timing differences rather than music or other 'informational' signals. A few years back I came across this nice study looking at identifiable time offsets between a pair of ribbon speakers: http://boson.physics.sc.edu/~kunchu...isalignment-of-acoustic-signals---Kunchur.pdf
Which is not what the paper says (bolding is mine). The timing difference is the difference between two sound sources interfering with each other. What is perceived is the consequence of that interference (a comb-filter effect of sorts). That is different from us actually perceiving signals, or the absence of a signal, within that time frame. You have no right to take that timing and just turn it into a claim that the listeners can perceive way above 20kHz. The paper does not say so, and neither does the experiment.
Again, I gave the ITD example because it's also a situation where certain conditions allow picking up extremely small time differences that you cannot turn into an equivalent frequency we can hear. The hearing occurs even if no frequency at 1/T exists in the signal or its FFT.

The speaker paper used two sound sources, the ITD tests rely on two ears, and it's the displacement between those two sources or "mics" that gives us something extra that we can identify. Interference for the speakers; for the ears, the brain recognizing that each ear receives a similar envelope (or whatever tricks it actually uses), then simply noticing which ear got it first and deducing the incoming direction from that delay plus a lot of experience with visual confirmation.
You are mistaken when you take some microsecond value that results in perception in those tests and turn it into an equivalent frequency we can hear; conscious or not, it does not matter, the logic is wrong.


Using a chi-squared approach, this puts the threshold of judgement vs. chance at around 2.3 mm, corresponding to a time delay of less than 6.7 us.

If we examine the corresponding max frequency that would be required to capture a 6.7 us period signal without aliasing:

Fsample = 2 * 1/(6.7x10^-6) = 3.0x10^5 Hz, or about 300 kHz. That's nearly an order of magnitude more than the 44 kHz sampling used by the Redbook standard, and represents a max hearing 'frequency' of ~150 kHz, not the 20 kHz that is commonly assumed!
Similar issue here.


It's not like we're trying to deny relations between certain notions of speed and frequency content; most and probably all of those replying to you have the basics down about signal transmission (I might be the most ignorant on that, because I often don't even remember enough math to get where I need). If we're discussing a digital signal, we understand that band limiting/low passing the system will limit how fast the transmission can go. And even for analog stuff, if I want my amplitude to rise to X volts in a given time (transient stuff, say), the signal will have to show high enough frequencies on the FFT, or it simply won't rise fast enough.
We get that, but we only apply it where it applies. If you want a more accurate transient, you will need a higher sample rate so as not to low pass the highest frequencies "within" that transient signal. But when you move one speaker, it's another story: a time offset of T does not create a signal at your ear at a frequency of 1/T. That false logic has got to go.
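To put that last point in code (a toy sketch, not a model of the actual experiment): delay an audible-band signal by the 6.7 us from the speaker test and look at what, if anything, shows up at 1/6.7 us ≈ 150 kHz.

```python
import numpy as np

fs = 384_000.0   # analysis rate chosen high enough that ~150 kHz content would be visible
tau = 6.7e-6     # the ~6.7 us offset from the speaker-displacement test
n = 1 << 15
t = np.arange(n) / fs

# Audible-band test signal: tones all below 10 kHz
def sig(time):
    return sum(np.sin(2 * np.pi * f * time) for f in (500.0, 2000.0, 7000.0))

win = np.hanning(n)
ref = np.fft.rfft(sig(t) * win)
delayed = np.fft.rfft(sig(t - tau) * win)
freqs = np.fft.rfftfreq(n, 1.0 / fs)

# The delay only rotates phase; it does not create any content near 1/tau.
mag_change = np.max(np.abs(np.abs(delayed) - np.abs(ref))) / np.max(np.abs(ref))
ultrasonic = np.sum(np.abs(delayed[freqs > 20_000.0])**2) / np.sum(np.abs(delayed)**2)
print(f"largest magnitude-spectrum change (relative): {mag_change:.1e}")
print(f"fraction of energy above 20 kHz after the delay: {ultrasonic:.1e}")
```

Both numbers come out vanishingly small: the offset shows up as phase (and, with two sources summed, as that comb-filter attenuation), never as new frequencies at 1/T.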
It's already strange enough that you'd use conscious test results to determine the highest unconscious hearing frequency.

I don't really know how to better explain myself. All my posts so far have been trying to say the same thing. I don't mind people using hi-res, I don't mind the desire for the more accurate transient stuff. Whether we can perceive it or not (pretty much everything points to not), that doesn't mean someone can't still want it done better. I'm personally no more involved with those choices than I am with what music you're going to play.
 
May 19, 2023 at 2:35 PM Post #50 of 57
Inaudible is inaudible. No amount of mental calisthenics is going to change that.

The ears define what we can hear. The brain just interprets what the ears hear. If the ears don't hear it, the brain doesn't interpret it. We don't hear with the fluid around our brains, our gall bladders or our feet. We hear with our ears. Looking for hearing in places other than the ears is pointless.

16/44.1 was accepted as a standard because it contains everything a human ear can hear. For listening to music, it's overkill. Most people would be just as happy with 12/24. The fact that 16/44.1 is able to reproduce everything we can hear should make audiophiles happy. I can tell you that this hifi nut who grew up with LPs, cassettes and R2R tape is plenty happy with it.

Why do people worry about sound they can't possibly hear? OCD is a terrible thing. Hopefully, they'll find a cure someday.
 
May 19, 2023 at 4:29 PM Post #51 of 57
however it's important to note that double blind testing only accounts for discernible, conscious differences.
No it doesn’t. In an ABX test for example we listen to A, then to X: do you hear a difference? Do you feel a difference? Do you have any sense of anything being different? If the response to any of these questions is “yes”, then the answer is B; if all the responses are “no”, the answer is A. Have you not done an ABX test?
Using a chi-squared approach, this puts the threshold of judgement vs. chance at around 2.3 mm, corresponding to a time delay of less than 6.7 us.

If we examine the corresponding max frequency that would be required to capture a 6.7 us period signal without aliasing:
What has a 6.7us wave period got to do with anything? A difference in timing (delay/phase) of 6.7us is detectable, not an audio freq with a 6.7us wave period. And, what has a delay/phase/timing difference of 6.7us got to do with SD or HD?
It's ironic, given the historical dismissal, that perhaps the best arguments for HD audio benefits come not from subjectivists but rather from objective studies.
What objective studies? While there are a few studies which indicate that under certain conditions ultrasonic freqs can affect brainwave patterns, there’s no reliable evidence this has any perceivable effect, either consciously or subconsciously on mood or feelings. When ultrasonic freqs have been sensed, almost universally the experience is described as unpleasant or uncomfortable, the opposite of what you claimed! There are no objective arguments for HD, let alone “the best arguments”!
What’s happened in the field in the last six years-plus since the last study was published?
Actually, there have been several papers published in the last few years. This paper published in Nature in 2020 is the most relevant, as it directly addresses the high freq content of hi-res audio and specifically addresses some of the sources quoted. It’s worth a read!
I see we've moved on to attacking the source!
Firstly, unfortunately that appears to be the issue, what “you see”. You seem to have misinterpreted, over interpreted or misrepresented what you’ve seen.

Secondly, what do you mean “we’ve moved on to attacking the source”? We don’t move on to that, questioning the source is the first step in science! Even more so if the source contradicts existing evidence, and even more again if those sources have already been discredited! Is it purely coincidence that two of your sources are arguably the most infamously discredited in the audio world, or is it just another audiophile agenda? … Mmmm

G
 
May 19, 2023 at 4:44 PM Post #52 of 57
Actually, there have been several papers published in the last few years. This paper published in Nature in 2020 is the most relevant, as it directly addresses the high freq content of hi-res audio and specifically addresses some of the sources quoted. It’s worth a read!
That's a really cool summary paper.
 
May 19, 2023 at 5:24 PM Post #53 of 57
Hello brave Head Fi Scientists! I would like to acknowledge the consistent efforts from the core group in this particular sub forum in pushing for objective standards here and particularly excellent discussion on the virtue of double blind tests for audio.

Double blind testing is a critical tool for evaluating subjective claims like the ones thrown around in audiophilia constantly. Being able to take such a test and evaluate your own listening objectively is a very convincing experience, however it's important to note that double blind testing only accounts for discernible, conscious differences. Yet there are undeniably a variety of phenomena that affect the human body in consistent, objective ways that are not perceptible to us.

For instance, if you were to run a (necessarily quick) double blind test on subjects to see if they could tell the difference between oxygen and carbon monoxide, the subjects would not be aware of any difference due to the lack of odor, and yet the carbon monoxide would kill them after a short while. It's an extreme scenario, but hopefully you get the point.

Why does it matter? Because our senses are routed first through the lower 'survival' brain for critical evaluations like fight or flight before the conscious upper brain is even made aware of the detection. In evaluating audio capabilities of the human hearing system then, are we artificially constraining results by focusing only on conscious perception?

I would like to invite your consideration...

The Subconscious Case for HD Audio

You hear a lot about cables, amps and DACs having subtle 'unmeasurable' effects on sound in the forum proper and other similarly subjective audiophile communities. Less popular (at least these days) are discussions around lossy vs. lossless, and rarer still are folks claiming benefits from so-called 'HD audio' (for the purposes of this discussion, I'll take that to mean >48 kHz sampling; I'm not going to discuss quantization and 24 bit at all).

Part of this dismissal of the case for HD audio, and even of lossy vs. lossless CD (Redbook) quality, stems from the fact that it is fairly easy to do online double blind tests that toggle seamlessly back and forth between qualities and even offer 'tests' to gauge your ability to identify them correctly. These tests are exhausting but highly convincing... For instance, while I can discern lossy vs. lossless most of the time (slightly over 75% across tests), I can't statistically tell the difference between SD and HD audio myself. Indeed, HD audio has consistently failed double blind experiments, whether it's SACD, HDCD, DSD, MQA (lol), or even 192/24 FLAC. I think this removes a lot of the speculative room in the hobby for people to claim audible improvements... it's much harder to properly double blind test things like cables or sources that require (blinded) helpers, precise level matching, etc.
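(For anyone curious how "statistically tell the difference" gets scored, here's a quick sketch; the trial counts below are made up for illustration, not my actual sessions:)

```python
from math import comb

def abx_p_value(correct, trials, chance=0.5):
    """One-sided binomial p-value: probability of scoring at least this well by pure guessing."""
    return sum(comb(trials, k) * chance**k * (1 - chance)**(trials - k)
               for k in range(correct, trials + 1))

# Hypothetical session sizes, just to show where ~75% lands vs. coin-flipping.
for correct, trials in ((12, 16), (15, 20), (10, 20)):
    print(f"{correct}/{trials} correct -> p = {abx_p_value(correct, trials):.3f}")
```

Roughly 75% correct over a dozen-plus trials comes out well below the usual p = 0.05 cutoff, while 50% (guessing) obviously does not.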

So that is all tidy and nice for an objectivist-leaning listener such as myself, right? Wrong. There exists objective data showing that humans not only hear the difference in music sampled above 44 kHz, but that we enjoy it more too! We just perceive these differences subconsciously, making listening tests largely invalid. If we instead look at electroencephalogram (EEG) data of listeners exposed to SD and HD music while engaged in an unrelated intellectual task, there are observable differences in the human neurological response:



https://www.frontiersin.org/article...attentional state without conscious awareness.

This isn't the first study on HD effects either; it builds on prior work. A series of studies have shown measurable differences in brain response when HD content is present in music: Oohashi et al., 2000, 2006; Yagi et al., 2003a; Fukushima et al., 2014; Kuribayashi et al., 2014; Ito et al., 2016. Increased alpha brainwaves (arousal!) are expected when 'inaudible' HD frequency components are present.

Interesting takeaways from this more recent study:

-The effect takes a while to kick in, ~200s, so rapid A/B switching is inherently a non-starter!
-It lasts ~100s after the test too
-Playing just the >20kHz HD spectrum (without the music) didn't produce the same effect
-There were still no statistically meaningful differences between the subjective ratings of the SD / HD pieces under a forced choice condition, except for 'natural':



So maybe rather than rapid A/B testing, we should have subjects listen to several minutes of audio and simply ask which sounds more natural rather than high resolution, better etc... indeed the speculation around mechanisms is basically just 'sounds more natural' hand waving.

Here are the fun EEG plots:

[Image: fpsyg-08-00093-g001.jpg]



In particular, note the difference in post-listening activity in the posterior right brain. Full-range subjects show about a third of a µV more activity at ~12 Hz.

[Image: fpsyg-08-00093-g002.jpg]



The top plots show the integrative effect. Note how alpha waves (arousal/pleasure), and to a slightly lesser extent beta waves (vigilance), pull ahead the longer the listener is exposed to HD audio. Just think of how much more productive and pleasurable my life is after listening to HD music for hours :p

On the Importance of Timing

We electrical engineers tend to think of everything in the frequency domain, as that is the convenient design and analysis space for electronics, but our hearing isn't a radio and we don't hear in the frequency domain directly. We perceive things transiently and must consider the biological implementation and motivation of the human ear.

If you think of our ear as a reverse headphone, the eardrum (tympanic membrane) is like the driver diaphragm and behind it the cochlea is like the DAC, transducing the physical vibration received into bioelectric neurological signals. The cochlea itself is a spiral containing a series of delicate cochlear hair cells that pick up the sound running past them. These hairs have very fine spacing, and because they are aligned linearly along the length of the cochlea, the brain has physical access to excellent timing data.

[Image: structures-outer-ear.jpg]


The cochlea is a folded up length of sensors displaced slightly from one another, providing extremely high resolution timing data on an incident pulse.

Indeed, we can similarly find objective evidence of human hearing WAY above 20 kHz by focusing on timing differences rather than music or other 'informational' signals. A few years back I came across this nice study looking at identifiable time offsets between a pair of ribbon speakers: http://boson.physics.sc.edu/~kunchu...isalignment-of-acoustic-signals---Kunchur.pdf

In the study, a range of participants (including several in their mid/late 40s!) were seated in front of a pair of aligned ribbon speakers that played a steady tone. A series of tests were done in which one of the ribbon speakers was displaced by a slight offset in distance from the subject:

[Image: 1683653879008.png]



The results are absolutely stunning:

[Image: 1683653951244.png]



All subjects guessed correctly 10/10 times for displacements as small as ~3mm!



Using a chi-squared approach, this puts the threshold of judgement vs. chance at around 2.3 mm, corresponding to a time delay of less than 6.7 us.

If we examine the corresponding max frequency that would be required to capture a 6.7 us period signal without aliasing:

Fsample = 2 * 1/(6.7x10^-6) = 3.0x10^5 Hz, or about 300 kHz. That's nearly an order of magnitude more than the 44 kHz sampling used by the Redbook standard, and represents a max hearing 'frequency' of ~150 kHz, not the 20 kHz that is commonly assumed!
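A quick sketch of that arithmetic (speed of sound assumed to be ~343 m/s; the step of treating the just-noticeable delay as a signal period is the post's own mapping, and it is the part other posters dispute):

```python
c = 343.0                      # speed of sound in m/s (assumed room-temperature value)

displacement_m = 2.3e-3        # threshold displacement from the chi-squared estimate
delay_s = displacement_m / c   # extra travel time for the displaced speaker
print(f"time delay: {delay_s * 1e6:.1f} us")                    # ~6.7 us

# Mapping used in this post: treat that just-noticeable delay as a signal
# period and apply the Nyquist criterion to it.
f_equivalent = 1.0 / delay_s
f_sample = 2.0 * f_equivalent
print(f"'equivalent' frequency: {f_equivalent / 1e3:.0f} kHz")  # ~149 kHz
print(f"sample rate to capture it: {f_sample / 1e3:.0f} kHz")   # ~298 kHz
```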

It's particularly interesting for us here in IEM world, as this paper does an excellent job noting the inherent disadvantage of broad firing speakers:



The transient smearing one gets listening to speakers is far and away the bottleneck compared to human hearing, and we can't speed up the sound waves to compensate - this is an inherent physical limitation for speakers that keeps them below the fidelity possible for the ear. IEMs are the ideal solution to this challenge! They and headphones are probably the only form factor with a remote chance of relaying this timing info in real-world systems/environments...

So given an exceptional, close, lab-grade transducer and chain, we still have room to grow even past 192 kHz sampling in terms of transient audibility... this makes a lot of sense if we think in evolutionary terms rather than audiophile terms:

Our senses are designed to tell us about opportunity and threat in our environment. Spatial offset information allows our brain to extrapolate positional data about the source, and speed information too (displacement vs. time). It's not hard to understand that those able to better hear where that tiger is coming from would more often survive to reproduce... the compounded evolutionary effect is that our ears are spatial specialists, capable of 'super human' data extraction in this space, far beyond what we would expect from pleasurable signals like music.

As noted in the intro, if we examine the human brain structure, we find it's actually two brains in bunk beds. The lower 'lizard' brain is primordial and instinctive, and handles our basic regulatory and survival functions. It makes sense that all sensory data is first run by the lizard: if a tiger is about to pounce on you, taking the luxury of time to get the neocortex involved and in approval will mean you're cat lunch... It makes total sense, then, that it would be far more effective at evaluating timing info than we are consciously aware of with our neocortex.

Conclusion

It's ironic, given the historical dismissal, that perhaps the best arguments for HD audio benefits come not from subjectivists but rather from objective studies. This is a natural product of the fact that, for musical info, most if not all of the observable benefit is subconscious. If we look to nature, we find a very tidy biological explanation grounded in evolutionary theory, and in particular in the ear's ability to perceive incredibly minute timing differences.

The end result? You're not crazy for paying extra for Qobuz, and you're definitely not crazy if you prefer IEMs and headphones to 2 channel and surround!
great stuff. thanks for posting it and i'm sure you know to ignore all the attacks from cultists who insist that, with regard to digital audio, science ended in 1979.
 
May 19, 2023 at 5:41 PM Post #54 of 57
That's a really cool summary paper.
Yep and what they tested and how they tested was impressive as well. It very diplomatically rebuts Oohashi.
great stuff.
How is unsupported speculation “great stuff”? Maybe to an ignorant audiophile used to believing marketing BS it would look like “great stuff” though?
thanks for posting it and i'm sure you know to ignore all the attacks from cultists who insist that, with regard to digital audio, science ended in 1979.
You are obviously right again, because the science I quoted was from 2020 which is of course before 1979. Well done and thanks for yet more of your usual utter nonsense!

G
 
May 21, 2023 at 8:12 PM Post #55 of 57
The paper specifically points out that the signals were not perfectly level matched. What it does not point out is that a 0.2dB difference in volume could be heard and a 0.5dB difference in volume is easily heard.
It actually, wrongly, states the opposite. It assumes the JND to be 0.7 dB and, based on that, concludes that it had to be ultrasonics that made the discrimination possible.
The listeners could easily pick up on that instead of ultrasonics or "time misalignment".
They say:
The control sound (d = 0) was perceived to have a sharper or brighter timbre than the displaced setting (d ≠ 0)
Yeah, guess how a 7 kHz tone that is 0.2 or 0.3 dB louder will sound :)
 
Jul 11, 2023 at 4:11 PM Post #56 of 57
Yep and what they tested and how they tested was impressive as well. It very diplomatically rebuts Oohashi.

How is unsupported speculation “great stuff”? Maybe to an ignorant audiophile used to believing marketing BS it would look like “great stuff” though?

You are obviously right again, because the science I quoted was from 2020 which is of course before 1979. Well done and thanks for yet more of your usual utter nonsense!

G

"Strictly speaking, the present study did not address the discrimination between high-resolution audio and standard audio, because the equipment used in this study was high-resolution grade and both sound materials were created in a high-resolution format (192 kHz/24 bits), except that one of them did not contain high-frequency components. Moreover, the current finding does not deny the existence of audiophiles with the ability to discriminate between the original sound and filtered and blurred sound without high-frequency components."

The discrepancy was not addressed in the study.
 
Jul 12, 2023 at 4:18 AM Post #57 of 57
"Strictly speaking, the present study did not address the discrimination between high-resolution audio and standard audio, because the equipment used in this study was high-resolution grade and both sound materials were created in a high-resolution format (192 kHz/24 bits), except that one of them did not contain high-frequency components. Moreover, the current finding does not deny the existence of audiophiles with the ability to discriminate between the original sound and filtered and blurred sound without high-frequency components."

The discrepancy was not addressed in the study.
Clearly the “discrepancy” was addressed, although as you quote, “strictly speaking” this study on its own does not comprehensively disprove audiophile claims of discriminating hi-res, because it was not directly testing that; it was only testing one aspect of the claims.

The issue with hi-res has always been that although there are obvious differences, they should all be outside the thresholds of audibility. Relative to hi-res, standard 44/16 has a higher noise floor/dither, lower timing resolution, higher jitter, the use of a steep anti-alias/image filter and, of course, no content >22kHz. In addition, audiophiles typically make all sorts of other audible claims about hi-res: greater detail, frequency response improvements/colouration, etc. The quoted study did not test for any of these typical additional claims, but it didn't need to, as these have already been proven not to even exist by the Sampling Theorem itself and confirmed by relatively simple objective measurements/tests. Nor did the study address some of the real differences, such as jitter, timing resolution or noise floor/dither, which again have already been addressed by other studies; nor was there a direct comparison between standard and hi-res. The study only tested one of the real and common claims: that content above 22kHz removed with a standard sinc-function-type filter can be discriminated.
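To put a rough number on the noise floor point (a back-of-the-envelope sketch using the textbook full-scale-sine formula, ignoring dither type and noise shaping):

```python
def sqnr_db(bits):
    """Textbook signal-to-quantization-noise ratio for a full-scale sine in linear PCM."""
    return 6.02 * bits + 1.76

for bits in (16, 24):
    print(f"{bits}-bit PCM: ~{sqnr_db(bits):.0f} dB")
```

A real difference on paper, but as argued above, one that sits outside the thresholds of audibility in practice.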

G
 
