Hi-Res Audio, DSD and placebo effect??
Sep 20, 2017 at 4:47 PM Post #46 of 121
An engineer, a mathematician, and a physicist are each presented with a beautiful woman and the stipulation that at each time interval, they may move half of the remaining distance towards her.

The mathematician points out that the distance will never reach zero, and walks away in disgust.

The physicist opines that if each iteration requires a finite amount of energy then the energy expended in the approach will be inversely proportional to the distance remaining and gives up on the spot.

The engineer says "8 feet, 4 feet, 2 feet, 1 foot, 6 inches....close enough for practical purposes".

... and the members of sound science stand there endlessly debating whether her garter belt is overly complex, or whether the memory foam mattress she's lying on was originally designed with such behavior in mind.
 
Sep 22, 2017 at 2:40 AM Post #47 of 121
[1] This is simply not true, at least in theory. All impulses (specifically, electronically generated ones) have, in theory, an infinitely short transient, and therefore an infinite series of ultrasonic partials. [1a] So pretty much any electronic music could, in theory, have such transients. [2] Even a real-world (hard) cymbal strike might be expected to have a transient fast/sharp enough to require ultrasonics to reproduce fully. Now, whether any of that is really audible, I don't know.

1. You are confusing theory with practice. Yes, in theory certain generated pulses and certain longer-duration signals (such as square waves) would contain an infinite series of partials/harmonics, but in practice of course that's impossible! Think about it for a moment: it would take an infinite amount of time just to calculate the frequencies of those partials, let alone actually generate them. Sending an instruction to a synth and then waiting until beyond the end of the universe for it to play something is too much latency by anyone's standards!
1a. So no, no electronic music could contain such transients. Or rather, in theory it could, but in practice it couldn't!

2. Something like a cymbal typically contains very significant ultrasonic content, and not just transient in nature. However, this statement is misleading in practice, and therefore in practice questions about audibility (HF hearing response and masking, for example) are of little or no concern. In practice, as the distance from the cymbal increases, the loss of very high/ultrasonic frequencies increases (relative to lower frequencies), due to air absorption and room surfaces not reflecting ultrasonics. Although we can measure very significant ultrasonic content if we measure the sound produced by a cymbal from, say, an inch away, if we measure from a more typical audience listening distance, in a typical acoustic environment, that "very significant ultrasonic content" becomes insignificant, and therefore questions of ultrasonic hearing abilities are moot! The same applies to just about all other acoustic instruments quoted as having significant ultrasonic content: in practice, in typical listening situations, they have extremely little or none.

G
 
Sep 22, 2017 at 6:32 AM Post #48 of 121
About square waves:

In real life all square waves are band-limited. The correct way to create digital square waves is to sum the fundamental frequency and its harmonics up to the Nyquist frequency. The result doesn't look much like a square wave, especially when the fundamental frequency is close to the Nyquist frequency, but it sounds right. If you generate "boxy" on-off square waves, they look right, but sound wrong, unless the Nyquist frequency happens to be a multiple of the fundamental frequency (e.g. a 441 Hz square wave at a 44100 Hz sample rate).
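
To make that concrete, here's a minimal sketch in Python/NumPy (the 441 Hz / 44100 Hz case is the one from the paragraph above; everything else, names included, is just illustrative):

```python
import numpy as np

fs = 44100                  # sample rate (Hz)
f0 = 441.0                  # fundamental: 44100 / 441 = 100 samples per cycle
t = np.arange(fs) / fs      # one second of sample instants

# Naive "boxy" square: instantaneous on/off transitions at sample points.
naive = np.where(np.sin(2 * np.pi * f0 * t) >= 0, 1.0, -1.0)

# Band-limited square: sum the odd harmonics of the Fourier series,
# (4/pi) * sin(2*pi*k*f0*t) / k, stopping below the Nyquist frequency.
bandlimited = np.zeros_like(t)
k = 1
while k * f0 < fs / 2:
    bandlimited += (4 / np.pi) * np.sin(2 * np.pi * k * f0 * t) / k
    k += 2
```

The band-limited version shows ripple near the edges but contains no energy above Nyquist; the naive one only gets away with it here because 100 samples per cycle lines the transitions up exactly with sample points.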

Why?

Because the ON and OFF transitions of a square wave theoretically happen somewhere between sample points, and only sometimes exactly at them. So there are jitter-like timing errors => spectral spreading. Band-limited signals have a slow enough rise time, and that's why you can sample them at the sample points: they have a valid value at those points that tells exactly what the signal is doing between sample points. That's one thing people misunderstand about digital audio. People often think the signal is known only at the sample points, but a band-limited signal can take only one "route" between sample points, and that's why the original signal is known everywhere, including between sample points.
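
As an illustration of that "one route" point, here's a hedged sketch of Whittaker–Shannon (sinc) interpolation: given the samples of a band-limited signal, it evaluates the unique signal they define at any instant, including between sample points (the block is finite, so the match is close rather than exact; names and figures are illustrative):

```python
import numpy as np

def reconstruct(samples, fs, t):
    # Whittaker-Shannon interpolation: the one and only band-limited signal
    # passing through these samples, evaluated at an arbitrary time t.
    n = np.arange(len(samples))
    return float(np.sum(samples * np.sinc(fs * t - n)))

fs = 44100
n = np.arange(256)
x = np.sin(2 * np.pi * 1000 * n / fs)   # a sampled 1 kHz tone

t_mid = 127.5 / fs                      # exactly halfway between two samples
print(reconstruct(x, fs, t_mid))        # close to the true value below
print(np.sin(2 * np.pi * 1000 * t_mid))
```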

Every time you see something infinitely sharp/edgy related to digital audio it's theory/fantasy. Real life is "round", be it digital or analog.
 
Sep 22, 2017 at 7:09 AM Post #49 of 121
Every time you see something infinitely sharp/edgy related to digital audio it's theory/fantasy. Real life is "round", be it digital or analog.

Yep, that was exactly my point and why no actual music, even electronic music, could "have such transients".

The engineer says "8 feet, 4 feet, 2 feet, 1 foot, 6 inches....close enough for practical purposes".

That depends on the type of engineer though. A pro sound engineer would say 3 inches, because most consumers do not have a c*ck significantly longer than 6 inches! :)

G
 
Sep 28, 2017 at 5:06 PM Post #50 of 121
1. You are confusing theory with practice. Yes, in theory certain generated pulses and certain longer-duration signals (such as square waves) would contain an infinite series of partials/harmonics, but in practice of course that's impossible! Think about it for a moment: it would take an infinite amount of time just to calculate the frequencies of those partials, let alone actually generate them. Sending an instruction to a synth and then waiting until beyond the end of the universe for it to play something is too much latency by anyone's standards!
1a. So no, no electronic music could contain such transients. Or rather, in theory it could, but in practice it couldn't!

2. Something like a cymbal typically contains very significant ultrasonic content, and not just transient in nature. However, this statement is misleading in practice, and therefore in practice questions about audibility (HF hearing response and masking, for example) are of little or no concern. In practice, as the distance from the cymbal increases, the loss of very high/ultrasonic frequencies increases (relative to lower frequencies), due to air absorption and room surfaces not reflecting ultrasonics. G

Regarding point 1) You can't represent infinite harmonics in the frequency domain; however, there's no problem with doing so in a PCM format. Of course, no DAC can, or should, produce infinite harmonics or infinitely high frequencies, but that doesn't change what's represented in the data.

I'm well aware that perfect impulse responses do not exist anywhere outside of a computer.

It bears mentioning that most digital synthesizers do produce waveforms with sections of infinite/undefined slope ... pretty much all digital synths will output a perfect square wave, saw wave, etc. Again, I realize it doesn't go any further than the internal mixer of the DAW, but it's still represented accurately and without breaking anything.

Regarding point 2) Sure, that's true enough.

Really, I'm not disputing any of the discussion of whether real recordings contain any such information, or whether it's worth reproducing, etc - my only real point is that 44.1 is arguably not a high enough sampling rate to represent all audible information. There is (limited, but not totally incredible) evidence that people can hear events that occur more quickly than the duration of 1 sample at that sampling frequency. Whether transducing this information is practical or not, or worthwhile, is another issue - I don't disagree that it's a totally impractical consideration as it stands with today's loudspeaker/recording technology. I guess it is a problem left for acoustic engineers in the year 2117... :)
 
Sep 28, 2017 at 5:33 PM Post #51 of 121
my only real point is that 44.1 is arguably not a high enough sampling rate to represent all audible information. There is (limited, but not totally incredible) evidence that people can hear events that occur more quickly than the duration of 1 sample at that sampling frequency.

I hear that argument a lot from audiophiles, but it isn't true. There is nothing audible above the range of human hearing. And people cannot hear time intervals that occur more quickly than a sample. The only evidence that it can be perceived at all is that you can see reactions in brain waves. That isn't the same as hearing. There's plenty of evidence that ultrasonic frequencies add absolutely nothing to music that can be heard or appreciated as sound quality. In fact, I saw a study once that presented people with two samples: one was a full frequency response recording of music, and the other was the same music with all of the frequencies above 10 kHz rolled off. Although some people could hear the difference between the two, the majority of the people tested said that one recording didn't have any better sound quality than the other.

You have to keep things in perspective. Humans can hear about 9 octaves of sound. The difference between 44.1 kHz and 96 kHz is only one more octave, and it's an inaudible octave at that. For the purposes of listening to music, it's completely useless. There are other octaves, particularly the 6th and 7th octave, that matter a LOT more. Those are the ones to focus on if you want to hear an audible improvement in sound quality.
 
Sep 28, 2017 at 6:19 PM Post #52 of 121
I hear that argument a lot from audiophiles, but it isn't true. There is nothing audible above the range of human hearing. And people cannot hear time intervals that occur more quickly than a sample. The only evidence that it can be perceived at all is that you can see reactions in brain waves. That isn't the same as hearing.

I'm perfectly willing to concede that humans can't hear ultrasonic frequencies, heck, that's why they're called "ultrasonic". In fact I don't really need to concede that point, I never thought or argued otherwise - everyone posting in the science forum recognizes this, I hope.

However, I am not really convinced that nobody can hear time intervals shorter than can be represented at 44.1. A sample period at 44.1 kHz is roughly 23 microseconds (1/44100 s ≈ 22.7 µs). Experiments suggest people can detect timing differences around 10 microseconds. http://boson.physics.sc.edu/~kunchur//temporal.pdf

To wit:

In binaural localization by interaural time difference, it is well known that differences in arrival times of order 10 μs are distinguishable (Henning, 1974; Nordmark, 1976). Monoaural experiments involving iterated ripple noise (IRN) and inter-pulse gaps have shown similar thresholds in temporal resolution (Krumbholz, 2003; Leshowitz, 1971)

The Krumbholz paper concludes that delays of only "a few tens of microseconds" introduced into monaural signals are detectable, and listeners perceive differences as small as ~10 microseconds. So that leaves me wondering whether 44.1 is truly enough for that. It could be that I'm misinterpreting these results?
 
Sep 28, 2017 at 8:05 PM Post #53 of 121
A sample period at 44.1 kHz is roughly 23 microseconds (1/44100 s ≈ 22.7 µs). Experiments suggest people can detect timing differences around 10 microseconds. http://boson.physics.sc.edu/~kunchur//temporal.pdf

Well you can guess where I'm going to go with this... Do you think you could hear a 10 microsecond gap in the middle of a Beethoven symphony? What part of recorded music would benefit from having timing faster than 1/44,100th of a second? There's no musical instrument that can create an attack anywhere close to that, and most music has some sort of reverb or room ambience added to it that would smear over anything that fast.

This is exactly the kind of theoretical rabbit hole that sends audiophiles off on wild goose chases. They worry about things that are so small and so removed from the purposes they use their stereo systems for, it's not even relevant. It's like square waves. Yes, they are very difficult to reproduce accurately. But who cares? Because they aren't going to turn up in recorded music. There are so many things that DO matter. Why waste your time worrying about one grain of sand on the beach and ignore the pounding waves of the ocean?

Just because it can be perceived in some way- whether brain waves or in test tones or clicks- it doesn't mean that it can be perceived in recorded music. As long as you listen to music, you can safely ignore all that stuff. It won't make a lick of difference to how good your music sounds. If you want to make music sound better, focus on achieving a balanced frequency response and buy well recorded and mastered CDs. That will make a lot more of an improvement than being able to hear 20 microsecond clicks.
 
Sep 28, 2017 at 8:36 PM Post #54 of 121
Well you can guess where I'm going to go with this... Do you think you could hear a 10 microsecond gap in the middle of a Beethoven symphony? What part of recorded music would benefit from having timing faster than 1/44,100th of a second? There's no musical instrument that can create an attack anywhere close to that, and most music has some sort of reverb or room ambience added to it that would smear over anything that fast.

This is exactly the kind of theoretical rabbit hole that sends audiophiles off on wild goose chases. They worry about things that are so small and so removed from the purposes they use their stereo systems for, it's not even relevant. It's like square waves. Yes, they are very difficult to reproduce accurately. But who cares? Because they aren't going to turn up in recorded music. There are so many things that DO matter. Why waste your time worrying about one grain of sand on the beach and ignore the pounding waves of the ocean?

Just because it can be perceived in some way- whether brain waves or in test tones or clicks- it doesn't mean that it can be perceived in recorded music. As long as you listen to music, you can safely ignore all that stuff. It won't make a lick of difference to how good your music sounds. If you want to make music sound better, focus on achieving a balanced frequency response and buy well recorded and mastered CDs. That will make a lot more of an improvement than being able to hear 20 microsecond clicks.

Well, I basically agree with you, it's not something I actually worry about, nor am I going to start seeking out 96 kHz recordings and telling myself I hear a difference. I know this has next to no relevance to any equipment I'm likely to own in this lifetime.

Still, I got somewhat fixated on this idea because of the commonly repeated idea that 44.1 can contain literally all audible information. So I wanted to know - do we truly mean literally, or just for all practical / realistic purposes? You might call it mere semantics, but I think this is interesting because investigating the edge cases can sometimes lead to ways to improve real-world performance. While there is no reason (that I know of) to worry about the fact that sub-23-microsecond events can be lost in 44.1, understanding the theoretical limits of perception seems pertinent to advancing the state of the art in audio reproduction. So, that's my real motivation. Now, no need to point out that I am neither a scientist nor engineer, so I'm not going to personally advance the state of the art of anything. But I still think it's interesting.

I might point out however that nearly-raw square and saw waves are not uncommon in electronic music. One of my favorite examples, a track that is also actually quite useful for revealing problems with lossy compression or weak transient response: https://www.beatport.com/track/the-rub-off-original-mix/654977 ... it relies a lot on quick transients for the actual lead instrument, and so sounds like utter trash at low bitrates... before you say anything, I don't think having a 96 kHz version would help :)
 
Sep 28, 2017 at 10:05 PM Post #55 of 121
I was speaking with an old time sound guy once and he told me a great story. He said that he was designing and building a recording studio and was pouring everything he had into it to make it absolutely perfect. It was hard work and it was paying off, so he decided to take a day off and take his kid out. They went to the county fair. He was in the "fun zone" standing at the top of the "chute"... the long alleyway with games and carnival rides on both sides. The sun was setting and he closed his eyes to soak it in. He heard crowds talking to each other, passing by him on both sides. He heard the carnival barkers shilling for the games all the way down the chute. He heard pennies being tossed into glass cups and darts popping balloons. In the distance was the calliope of a merry go round. Whenever a breeze blew through, the sound changed a little. The air around him was alive with sound in all directions. He suddenly realized that even with an unlimited budget for recording and playback equipment, he could never reproduce the sound that he was experiencing in that moment. He said that moment put it all in perspective for him.

16/44.1 can't capture everything that can be heard, felt and experienced. No medium of capture, whether it's photography, audio, video or film can do that. You can add a bunch of channels and it will sound a little more lifelike. You can run two cameras and shoot in 3D and it will add that sort of dimensionality. You can put on a virtual reality helmet and add a little more. But it will never be reality. And recordings aren't really intended to be like reality. A good recording creates its own reality, within the strengths and limitations of the medium. Everything has strengths and limitations. A good sound engineer or cinematographer or game designer knows how to create a hyper-reality that suits the medium. I have acoustic 78s from 1909 that were recorded without electricity that make the hairs on the back of your head stand up because of the startling presence of the sound. It's a million miles away from high fidelity, but within its parameters, it can score a goal.

Does 16/44.1 capture reality? No. Does it capture everything that recorded music can reproduce? Yes, it does that perfectly. When you focus on tiny details and ignore the overall, it's like not seeing the forest for the trees.
 
Sep 28, 2017 at 10:36 PM Post #56 of 121
Well, I basically agree with you, it's not something I actually worry about, nor am I going to start seeking out 96 kHz recordings and telling myself I hear a difference. I know this has next to no relevance to any equipment I'm likely to own in this lifetime.

Still, I got somewhat fixated on this idea because of the commonly repeated idea that 44.1 can contain literally all audible information. So I wanted to know - do we truly mean literally, or just for all practical / realistic purposes? You might call it mere semantics, but I think this is interesting because investigating the edge cases can sometimes lead to ways to improve real-world performance. While there is no reason (that I know of) to worry about the fact that sub-23-microsecond events can be lost in 44.1, understanding the theoretical limits of perception seems pertinent to advancing the state of the art in audio reproduction. So, that's my real motivation. Now, no need to point out that I am neither a scientist nor engineer, so I'm not going to personally advance the state of the art of anything. But I still think it's interesting.

I might point out however that nearly-raw square and saw waves are not uncommon in electronic music. One of my favorite examples, a track that is also actually quite useful for revealing problems with lossy compression or weak transient response: https://www.beatport.com/track/the-rub-off-original-mix/654977 ... it relies a lot on quick transients for the actual lead instrument, and so sounds like utter trash at low bitrates... before you say anything, I don't think having a 96 kHz version would help :)

Your problem is that yes, even with 23 microsecond sample intervals, digital audio can capture everything that happens below 20 kHz. For starters, if it begins and ends in less than 23 microseconds, it is by definition at a frequency above 22 kHz. If you are referring to changes in sound that happen between samples, then yes, as unintuitive as it seems, it represents those too. You need to spend the 20-something minutes to watch the xiph.org video in Bigshot's sig. No, it isn't just theory; yes, it really happens. Learn a little bit about how digital sampling works. For instance, you can have one tone playing and have another tone start exactly halfway between samples, and when recorded, the reproduction will also preserve that timing, including the second tone starting in between discrete sample points. If properly band-limited, each set of samples can fit one and only one waveform. That includes things coming and going between sample points.
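
A quick way to see this for yourself: delay an in-band tone by much less than one sample period, sample both versions at 44.1 kHz, and recover the delay from the samples alone. A sketch in Python/NumPy, assuming a 1 kHz tone and a 10 microsecond shift (both figures picked purely for illustration):

```python
import numpy as np

fs = 44100
f = 1000.0                    # in-band 1 kHz test tone
delay = 10e-6                 # 10 us: well under the ~22.7 us sample period
N = 4410                      # chosen so 1 kHz lands exactly on FFT bin 100
t = np.arange(N) / fs

a = np.sin(2 * np.pi * f * t)              # reference tone
b = np.sin(2 * np.pi * f * (t - delay))    # same tone, 10 us later

# The sub-sample shift survives sampling; recover it from the phase
# difference at the tone's FFT bin.
k = int(f * N / fs)                        # = 100
dphi = np.angle(np.fft.rfft(b)[k] / np.fft.rfft(a)[k])
print(-dphi / (2 * np.pi * f))             # ~ 1.0e-05 seconds
```

The shifted tone is still band-limited, so its samples simply take different values, and the 10 µs offset comes back out of them exactly.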

The business about 10 microsecond timing being audible is about time differences between the ears. Also unintuitive, but 16-bit 44.1 kHz digital has a timing accuracy of 56 picoseconds. Less if dither is used. So a 10 microsecond timing difference between two channels is nearly 200,000 times longer than the minimum timing of 44.1 kHz 16-bit audio. You really need to figure out and understand this. This idea of the sample period being a limit to timing keeps coming back like a bad, bad penny. And it IS NOT CORRECT. It is definitively, demonstrably not correct. If you continue to insist otherwise, no one should take you seriously. If you really don't understand it, ask questions, we'll help you make sense of it.
 
Sep 28, 2017 at 11:08 PM Post #57 of 121
even without all this, a tailor-made stimulus in complete isolation will never give a threshold value that's relevant for music content. it doesn't work for jitter and it doesn't work for this. different tests to answer different questions. it's the same idea as finding a friend in an empty field vs finding him in a crowd. the latter will tend to be harder and take longer. same with a weirdo signal in music.

because we're talking about music, another relevant problem has to do with what's available. reality isn't about 16/44 vs anything in the universe. it's usually a matter of 16/44 vs another band-limited signal. if I take all my favorite albums, how many will have masters available in more than 24/96 of actual resolution? I'm not talking about jokes like HDtracks offering old damaged tapes and vinyls recorded at 24/192. crap in a big box is not high-res anything.
with DSD we can assume that the sample rate is no issue, but in practice, because of the noise shaped into the ultrasonic range, band-limiting still has to occur fairly soon. so even if we somehow have abilities that go beyond 5 µs, and some album somewhere someday will be able to show that difference to most humans, right now the real question is, "does converting the master to 16/44 create audible differences?". and so far my tests said no, as long as the DAC didn't have some audible signature roll-off at 16/44, or other weird playback issues involving crappy resampling on the fly or whatever. aside from vague and avoidable side effects of using a different sample rate, I consistently failed to pass abx so far. it doesn't prove there aren't some albums out there that I would be able to identify as audibly different in their 16/44 version, but I don't care, for the same reason I don't worry about being abducted by aliens. I can't claim it's impossible, but having spent a lifetime not noticing any, I assume I'll be able to keep on going like always.
 
Sep 29, 2017 at 10:34 AM Post #58 of 121
The business about 10 microsecond timing being audible is about time differences between the ears. Also unintuitive, but 16-bit 44.1 kHz digital has a timing accuracy of 56 picoseconds. Less if dither is used. So a 10 microsecond timing difference between two channels is nearly 200,000 times longer than the minimum timing of 44.1 kHz 16-bit audio. You really need to figure out and understand this. This idea of the sample period being a limit to timing keeps coming back like a bad, bad penny. And it IS NOT CORRECT. It is definitively, demonstrably not correct. If you continue to insist otherwise, no one should take you seriously. If you really don't understand it, ask questions, we'll help you make sense of it.

I'll watch the xiph video, probably worth it either way. So, this is what I don't understand, and hopefully you can clarify it for me. This was brought up earlier in the thread, and phase differences being representable with that resolution makes sense. However, the paper I linked specifically mentions people being able to distinguish these timing differences in mono; maybe this is throwing me off.

Basically, what I understand (from an admittedly limited amount of reading, and I can't promise I haven't misinterpreted anything) is that humans can perceive differences between (something like) these two waveforms under certain conditions:

[Image: sPUQQOL.png — two pulse waveforms differing in the gap between pulses]


But if you band-limited this to 44.1, you'd end up with a single waveform at 22,050 Hz, right? And so any timing difference between the pulses would just manifest as a change in amplitude in a 22 kHz tone, right?

I'll watch the video and maybe delete this post later if it explains why this is wrong and dumb. :)

@castleofargh Yes, agree, I am also quite sure that this is not important to music reproduction, at least, not with any tech that exists today. But what if we build virtual reality for cats? They're going to be very unimpressed with our crappy time resolution... it will totally break the immersion...

@bigshot No arguments from me. Stereo recording is really a medium of convenience and legacy tech, as you imply, it's not exactly easy to recreate a true acoustic space with just 2 channels. Nor is it easy to reproduce an acoustic space with 6 or 10 mics fixed in location and artificially mixed together later... etc. It is an artistic medium with its own value, for sure - and no medium of expression will ever have a final, perfect, mode of use.
 
Sep 29, 2017 at 11:40 AM Post #59 of 121
I'll watch the xiph video, probably worth it either way. So, this is what I don't understand, and hopefully you can clarify it for me. This was brought up earlier in the thread, and phase differences being representable with that resolution makes sense. However, the paper I linked specifically mentions people being able to distinguish these timing differences in mono; maybe this is throwing me off.

Basically, what I understand (from an admittedly limited amount of reading, and I can't promise I haven't misinterpreted anything) is that humans can perceive differences between (something like) these two waveforms under certain conditions:

[Image: sPUQQOL.png — two pulse waveforms differing in the gap between pulses]


But if you band-limited this to 44.1, you'd end up with a single waveform at 22,050 Hz, right? And so any timing difference between the pulses would just manifest as a change in amplitude in a 22 kHz tone, right?

I'll watch the video and maybe delete this post later if it explains why this is wrong and dumb. :)

@castleofargh Yes, agree, I am also quite sure that this is not important to music reproduction, at least, not with any tech that exists today. But what if we build virtual reality for cats? They're going to be very unimpressed with our crappy time resolution... it will totally break the immersion...

@bigshot No arguments from me. Stereo recording is really a medium of convenience and legacy tech, as you imply, it's not exactly easy to recreate a true acoustic space with just 2 channels. Nor is it easy to reproduce an acoustic space with 6 or 10 mics fixed in location and artificially mixed together later... etc. It is an artistic medium with its own value, for sure - and no medium of expression will ever have a final, perfect, mode of use.


I like to keep it simple and usually have in mind the following values (maybe wrongly?):
  • In air at 20 °C, sound waves travel at a speed of 340 m/s (meters per second); in liquids more or less 4 times faster.
  • Latency from ear to brain is in the ms range, on average from 5 ms to 20 ms (depending on the frequency/duration/level of the stimulus).
  • Transients with a duration of less than 20 ms cannot be properly differentiated.
  • The minimum interaural timing difference (ITD) is reported to be around 10 µs, but for tones around 1 kHz.
  • 44.1 kSps at 16 bit allows a theoretical timing resolution of 56 ps (when simple linear interpolation is used; much better, and in principle perfect, with infinite sinc interpolation).
For better understanding:
  • Do not mix up the digital sample rate (1 sample every 23 µs) and the analog output after data interpolation (a continuous signal).
  • ITD = 10 µs is the minimal timing difference between the left-ear and right-ear waves perceivable by the human brain under specific conditions.
  • At higher frequencies the usable ITD moves into the ms range, and the human brain instead uses interaural level differences (ILD) for sound localization.
Hope it helps, and forget the marketing about timing resolution.
 
Sep 29, 2017 at 12:52 PM Post #60 of 121
But if you band-limited this to 44.1, you'd end up with a single waveform at 22,050 Hz, right? And so any timing difference between the pulses would just manifest as a change in amplitude in a 22 kHz tone, right?

Such pulses have spectral energy down to 0 Hz, and the gap between them affects the spectral shape below 20 kHz as well. I made a quick test in Audacity (see below). I made 1-0-1 and 1-0-0-1 impulses at 192 kHz and band-limited those to 44.1 kHz. The resulting band-limited impulses are a bit different, so increasing the gap from 5.2 µs to 10.4 µs means different results in the 44.1 kHz version too. A lot of the difference is "coded" into the pre- and post-ringing. The look of the waveform isn't everything; the spectral shape is important too, so even if you can't see the gap in the band-limited waveform, it shows in the spectrum, down to 0 Hz.


[Image: pulses.png]

Upper signal: Original @ 192 kHz sample rate
Lower signal: Band-limited to 44.1 kHz (note the scale!)
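
For anyone who wants to repeat this outside Audacity, here's a rough Python/SciPy equivalent (buffer length and pulse positions are arbitrary; 147/640 is exactly the 44100:192000 ratio):

```python
import numpy as np
from scipy.signal import resample_poly

fs_hi = 192000
x1 = np.zeros(fs_hi // 100)       # 10 ms at 192 kHz
x2 = np.zeros(fs_hi // 100)
x1[960] = 1.0; x1[962] = 1.0      # 1-0-1: gap of one sample, ~5.2 us
x2[960] = 1.0; x2[963] = 1.0      # 1-0-0-1: gap of two samples, ~10.4 us

# Band-limit by resampling 192 kHz -> 44.1 kHz (44100/192000 = 147/640).
y1 = resample_poly(x1, 147, 640)
y2 = resample_poly(x2, 147, 640)

print(np.max(np.abs(y1 - y2)))    # clearly nonzero: the gap width survives
```

The difference between y1 and y2 sits in the pre/post-ringing and the spectral shape, just as described above.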
 
