Hi-Res Audio, DSD and placebo effect??
Sep 19, 2017 at 7:22 PM Post #31 of 121
Which drum samples? Can you point to one that has a transient peak that rises at a 44,000th of a second?
 
Sep 19, 2017 at 7:23 PM Post #32 of 121
I'm not talking about theory. I'm talking about the real world. There is absolutely nothing in music that even approaches a transient that doesn't span dozens and dozens of samples if not hundreds. It's important to have a general idea of what numbers represent. Use horse sense: just ballpark it and conceive of the time in your head. Divide a second into 44,100 parts. Now find something in music that is faster than the fastest shutter speed on a camera. Not with acoustic instruments for sure. Even with electronic instruments it is unlikely. Now find something an order of magnitude faster than that and you'd be talking about a one sample transient. Good luck!

Right, all fair points, but, I am not really worried about real-world behavior. I'm wondering whether 44.1/16 is literally beyond the limits of human hearing, even in theory, or not. I'm well aware neither 99.9% of music, nor 99.9% of gear, can max out 16/44.1, but it's fun to speculate about the edge cases.
 
Sep 19, 2017 at 7:24 PM Post #33 of 121
Which drum samples? Can you point to one that has a transient peak that rises at a 44,000th of a second?

At home I have a fair number of synthetic drum samples that just use a click with a rise time of one sample as their "snap". It's kind of common among old crappy samples from old crappy digital drum machines. I will see if I can find some good examples and post them here. It's not hard for them to be this way since they were artificially created in the first place.
 
Sep 19, 2017 at 7:27 PM Post #34 of 121
Note this factoid about the speed of bullets. It describes 0.2 seconds as twice the reaction time of the fastest Olympic sprinters. How fast can someone hit a drum with a stick? 0.2 seconds is 1/5th of a second. Now imagine what 1/44,000th of a second feels like.

Bullets. The average bullet travels at 2,500 feet per second (around 1,700 mph). If you reacted to the sound of the gun going off and required 0.20 seconds (twice that of the fastest Olympic sprinters) to react, then you would need to be at least 500 feet away to successfully dodge a bullet.
 
Sep 19, 2017 at 7:36 PM Post #35 of 121
Been there done that. Had a DSD recording device. Did ABX with lower resolutions. Could not tell which one was which. Sold DSD recorder.

But when I changed my speakers, oh my, what a huge difference that made.

I say: Put your money into headphones, speakers.
With speakers however, don't expect magic if your room acoustics are bad, like an echo chamber, a tiled bathroom... without some kind of room treatment, even the most expensive and most raved about speakers will still sound like crap.
 
Sep 19, 2017 at 8:01 PM Post #36 of 121
Don't really dispute any of the above. However, I'm really just interested in whether the familiar arguments about 16/44.1 are actually 100% true. It's always asserted with such finality that I could not help but wonder if there was some small exception. And, since the argument (for me anyway) actually is about digital representations and not reproduction (which, at that point, just forget it... I can't afford speakers that are flat to 40 kHz anyway), it's pertinent to consider impulses. Side note: A lot of electronic drum samples, actually, include what amounts to a single sample impulse.

Right, all fair points, but, I am not really worried about real-world behavior. I'm wondering whether 44.1/16 is literally beyond the limits of human hearing, even in theory, or not. I'm well aware neither 99.9% of music, nor 99.9% of gear, can max out 16/44.1, but it's fun to speculate about the edge cases.

At home I have a fair number of synthetic drum samples that just use a click with a rise time of one sample as their "snap". It's kind of common among old crappy samples from old crappy digital drum machines. I will see if I can find some good examples and post them here. It's not hard for them to be this way since they were artificially created in the first place.
So what you're asking/saying is that it is possible for a synthesized signal to exceed the ability of 16/44.1 to record and reproduce it accurately?

Of course it is. Heck, that system can't even produce a theoretically perfect 1kHz square wave. But that completely misses the point. Any system can be forced into an area in which it cannot operate well. When you say your argument is about digital representations and not about reproduction you've gone headlong into that theoretical area in which imperfections abound. For example, even 24/192 cannot produce a perfect 1kHz square wave. No audio recording system can. Is that important to...um...reproduction? See, if you don't use sound reproduction as your metric, then you have no frame of reasonable reference. This kind of snag exists everywhere in science. But practical application is really all that matters. Can we reproduce a synthetic 1 sample pulse perfectly at 16/44.1? Not after the filter, no. Does it matter? No.

An engineer, a mathematician, and a physicist are each presented with a beautiful woman and the stipulation that at each time interval, they may move half of the remaining distance towards her.

The mathematician points out that the distance will never reach zero, and walks away in disgust.

The physicist opines that if each iteration requires a finite amount of energy then the energy expended in the approach will be inversely proportional to the distance remaining and gives up on the spot.

The engineer says "8 feet, 4 feet, 2 feet, 1 foot, 6 inches....close enough for practical purposes".

A perfect square wave exists only in theory. Can we come close enough to reproduce the sound of a 1kHz square wave at 16/44.1? Absolutely. How about one at 15kHz? Not a chance. Does it matter? Nope.
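As a rough illustration of the square wave point, the sketch below counts how many harmonics of an ideal square wave fit below the Nyquist frequency. It assumes an ideal square wave (odd harmonics only) and an ideal brick-wall filter at half the sample rate; the function name is made up for this example.

```python
def square_wave_harmonics(fundamental_hz, sample_rate_hz=44_100):
    """Return the odd harmonics of an ideal square wave that lie below Nyquist."""
    nyquist = sample_rate_hz / 2
    harmonics = []
    k = 1
    while k * fundamental_hz < nyquist:
        harmonics.append(k * fundamental_hz)
        k += 2  # an ideal square wave contains only odd harmonics
    return harmonics


for f0 in (1_000, 15_000):
    kept = square_wave_harmonics(f0)
    print(f"{f0} Hz square wave at 44.1 kHz: {len(kept)} harmonic(s) kept, highest {kept[-1]} Hz")
```

Under these assumptions a 15 kHz square wave keeps only its fundamental, so what comes out is a sine. A higher sample rate keeps more harmonics but still a finite number of them.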
 
Sep 19, 2017 at 8:12 PM Post #37 of 121
So what you're asking/saying is that it is possible for a synthesized signal to exceed the ability of 16/44.1 to record and reproduce it accurately?

Not really. My concern is more like, if 16/44.1 is settled on as "more than enough" as a format for storing recordings, (forget any question about reproduction), then my opinion is that it ought to be able to store just a bit more audible information than humans could ever possibly perceive. This way, it's really "future proofed" against some imaginary sci-fi perfect transducer, or whatever. Like, if it's meant to be a perfect storage medium for audio recordings, is it *actually perfect* or just close?

My point is just that there seems to be certain edge case signals that humans can hear, that can't be represented in a 16/44.1 format. So, it's not a "perfect" format as such.

Again, I fully acknowledge the uselessness of this question in real life scenarios.
 
Sep 19, 2017 at 8:17 PM Post #38 of 121
At home I have a fair number of synthetic drum samples that just use a click with a rise time of one sample as their "snap". It's kind of common among old crappy samples from old crappy digital drum machines. I will see if I can find some good examples and post them here. It's not hard for them to be this way since they were artificially created in the first place.

Haven't seen those. Even then, you only hear the below 20 kHz portion, and regular digital passes that part of it.

Another bit about those cymbals. Look at very zoomed-in views of the sampled waveform: even when struck sharply, cymbals don't zoom straight to max amplitude. Cymbals have a high frequency resonance which is what gives them their intensity. It takes a few cycles for the energy of the stick on the cymbal to resonate from edge to edge and build in energy. So it isn't the immediate zoom to max level people picture in their mind or the way it can sound.
 
Sep 19, 2017 at 8:44 PM Post #39 of 121
[Attached image: Cymbal comparison.png]

Okay, what you are seeing are the recordings of the cymbals I mentioned. I would credit the site they came from, but I kept the files I downloaded and forgot where I got them. Recordings were done at 176/24. The response is above the noise floor to 60 kHz. I seem to recall these mics had a 40 kHz rated bandwidth (might have been Earthworks omnis).

Look at the top two especially. The top one is the original file. The second one I downsampled to 44 kHz so there is no response above 22 kHz, then upsampled to 176 so you could compare. You can see there is a little loss in steepness and height in the downsampled version. But the basic bulk of the waveform is hardly touched. It is also worth noting how you see the basic wave start very low and build over several cycles before reaching high amplitudes. Real cymbals are nothing like one sample impulses.

The lower two are some other cymbals which have some of the highest resonant frequencies. The waveform runs off the graph to the right, but these also for some reason take more cycles to reach maximum amplitude. These were also cherry-picked from the website. I kept a half dozen of the fastest reacting cymbal recordings on that site.
 
Sep 19, 2017 at 11:54 PM Post #40 of 121
Not really. My concern is more like, if 16/44.1 is settled on as "more than enough" as a format for storing recordings, (forget any question about reproduction), then my opinion is that it ought to be able to store just a bit more audible information than humans could ever possibly perceive. This way, it's really "future proofed" against some imaginary sci-fi perfect transducer, or whatever. Like, if it's meant to be a perfect storage medium for audio recordings, is it *actually perfect* or just close?

My point is just that there seems to be certain edge case signals that humans can hear, that can't be represented in a 16/44.1 format. So, it's not a "perfect" format as such.

Again, I fully acknowledge the uselessness of this question in real life scenarios.
of course you can find situations where people can notice a difference.
some very young child who's able to hear above 20khz, if tested with high ultrasonic content will probably tell the resolutions apart.
we've all had a few experiences with DACs that roll off gently but too soon, with a track that has a good deal of top end signal you notice the roll off.
and there are probably many other occurrences where the gear used manifests an audible difference for many reasons. many people have reported getting better sound from oversampling their file on some computer and DAC (not a lot of controlled data though).

now actual cases showing that insufficient sample rate is the cause for audible change like missing part of a transient response on 16/44. I don't know of any.
we have the people who look at the signal, see that 16/44 rounds up some transients in audacity because of the band limiting, and they go:
-we can hear transient
-transients are changed with 16/44
therefore we can hear 16/44 ruining the transient.

of course it's a fallacy because those guys never tested which part of the transient response they were really hearing in the first place. and surprise surprise, proper blind tests aren't exactly piling up with positive discrimination of 16/44 vs highres.

the other famous argument is about the smallest time cue we can notice, and they will bring up interaural timing and how humans can notice as low as 7 to even 5µs events. and they go:
-we can hear 5µs
-44.1khz has 22.7µs between samples
therefore 44.1 isn't enough to be audibly transparent for music.

of course it's yet again a fallacy because as mentioned in some posts above, the interval between samples does not define the time resolution of the signal. also, go have a look at the test signal needed to obtain those 5 or 7µs ^_^. spoiler, you won't find it in your favorite albums. there is also the question of when 5µs accuracy between channels is relevant in albums? we're mainly dealing with mono tracks panned by hand. a technique that cares so much about how a sound source on the left will hit the right ear with a delay, and about making that delay accurate within 5µs, that it doesn't do anything to the delay and just changes the loudness.^_^
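to make the "sample interval is not the time resolution" point concrete, here is a minimal sketch. it samples a 1 kHz tone delayed by 5µs at 44.1 kHz, quantizes it to 16 bits, and recovers the delay from the tone's phase. the tone frequency, the block length and the function names are assumptions picked for the example, not anything taken from a published test.

```python
import math

FS = 44_100   # sample rate in Hz
F0 = 1_000    # test tone in Hz, well inside the audio band
N = 441       # exactly 10 cycles of 1 kHz at 44.1 kHz


def sampled_tone(delay_s):
    """Sample a delayed sine and quantize it to 16 bits."""
    return [round(32767 * math.sin(2 * math.pi * F0 * (n / FS - delay_s))) / 32767
            for n in range(N)]


def estimate_delay(samples):
    """Recover the delay from the tone's phase (single-frequency correlation)."""
    i = sum(s * math.sin(2 * math.pi * F0 * n / FS) for n, s in enumerate(samples))
    q = sum(s * math.cos(2 * math.pi * F0 * n / FS) for n, s in enumerate(samples))
    phase = math.atan2(-q, i)
    return phase / (2 * math.pi * F0)


sample_period_us = 1e6 / FS
estimated_us = 1e6 * estimate_delay(sampled_tone(5e-6))
print(f"sample period: {sample_period_us:.2f} us, recovered delay: {estimated_us:.3f} us")
```

the delay that comes back is far smaller than the gap between two samples, because the timing of a band limited signal is carried by the sample values and not by the sample positions.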

ultimately I gladly accept that a rationale can be bollocks like those, and still support a true idea. but testing consistently showed how all those ultimate numbers explode when the human is confronted with musical content. and the common failure in 16/44 vs highres blind tests is not really indicative that we should care. it does not mean we've really reached a limit, as maybe there is one song somewhere that even I could pass. and I understand the paranoia of elite audiophiles who only wish for the best. but I'm tempted to adopt my usual behavior on that matter, falsifiable or it didn't happen.
 
Sep 20, 2017 at 3:41 AM Post #41 of 121
Not really. My concern is more like, if 16/44.1 is settled on as "more than enough" as a format for storing recordings, (forget any question about reproduction),
...but....if you're going to forget about reproduction then it doesn't matter how it's recorded at all. You must consider reproduction or the recording process just makes no sense at all.
then my opinion is that it ought to be able to store just a bit more audible information than humans could ever possibly perceive. This way, it's really "future proofed" against some imaginary sci-fi perfect transducer, or whatever.
The big problems in sound reproduction have entirely to do with the transducers and the acoustic environment. It's not a matter of a transducer not being able to produce ultrasonic signals, there are some that can. It's getting those signals to your ears that is the problem. They become highly directional unless diaphragms become very tiny, at which point they also become inefficient. With today's tweeters that go above 20kHz, their dispersion patterns look like narrow flashlight beams that rapidly narrow as frequency goes up. Aiming is critical, and listener head position is critical. As to headphones, the acoustic environment above 20kHz is radically different for each listener because of their ear shape, so getting response that is in any way consistent is problematic at best. Even with IEMs, the wavelengths involved are strongly affected by the shape of the cochlea, which isn't the same for everyone. It becomes clear when looking at the ear design that it wasn't intended to work well, if at all, above 20kHz.

I still don't quite know why you'd want to store more than anyone could ever possibly hear, but that can be done now if you really want to, just not at 44.1.
Like, if it's meant to be a perfect storage medium for audio recordings, is it *actually perfect* or just close?
Nothing made by man is perfect. Nothing. It's just close enough to be acceptable and useful. It's a powerful shame the initial CD marketing used such superlatives as "perfect" and "forever", but those of us inside the industry who experienced that marketing the first time around blew it all off as marketing anyway, knowing full well none of it was actually true.
My point is just that there seems to be certain edge case signals that humans can hear, that can't be represented in a 16/44.1 format. So, it's not a "perfect" format as such.

Again, I fully acknowledge the uselessness of this question in real life scenarios.
Those edge cases are few, specific, and non-musical.

Since we've had 192kHz for quite a few years, yet there is still not clear and definitive data about that making an audible difference over 44.1, it seems your quest for "perfect" is just one of semantics, and focused on an area of imperfection with possibly the least audible impact. Now, redirect that quest to the transducer, make that better, and you'll have spent time on something worthwhile. Budget the extra bits on capturing a 3D sound field, not the inaudible octave. That's the old 5.1 vs stereo argument. For a given recording channel bandwidth every listener can hear the difference between stereo and 5.1; few, possibly none, can hear that same bandwidth used to carry two channels with bandwidth beyond ultrasonic.

Preference and artistic usage is a different problem entirely. Some like to say that two channels is somehow pure, which couldn't be further from the truth. Others take exception with surround perspectives. But these are just tools, and the goal is to produce an entertaining sonic experience that somehow is believable, either by simulating the original acoustic event or by producing one that is completely artificial. Ultrasonics play no role in any of that at all.

The other problem is that research has shown humans don't like perfectly flat frequency response very often. Headphone response follows a radically non-flat target curve. Speakers a much less radical curve, but usually not perfectly flat in every room. What do we do about a target curve for 20+kHz? It can't be perceived, so we don't know if it should roll off, tilt up, be flat to some arbitrary frequency or not. Does the curve change with volume (like below 20kHz does)? If so, how? The problem remains, if you can't hear it, how do you even start? So the reaction goes to capturing 100kHz of bandwidth. But the mics we want to use don't do that, and the ones that do sound lousy in the audible range. The instruments being recorded don't do that. Even the synthesizers we now have are digital with a fixed sampling frequency and filter, so they don't do that either. Then we circle back to the artificially produced impulse, or test signal like a square wave, and complain because our recording system can't perfectly produce it.

In the entirety of audio there has not been a technically perfect recording system yet. But, there also is no perfect reproducing system. Of the two, the recording channel at 16/44.1 can record and reproduce the signal at its input to an extremely high degree of accuracy, but the reproducing system of speakers/room or headphones is miles away from presenting anything like the original acoustic wave to our ears.

Pick the thing you want to work on. I choose the thing that will make the most difference.
 
Sep 20, 2017 at 5:20 AM Post #42 of 121
Not really. My concern is more like, if 16/44.1 is settled on as "more than enough" as a format for storing recordings, (forget any question about reproduction), then my opinion is that it ought to be able to store just a bit more audible information than humans could ever possibly perceive. This way, it's really "future proofed" against some imaginary sci-fi perfect transducer, or whatever. Like, if it's meant to be a perfect storage medium for audio recordings, is it *actually perfect* or just close?

You can plow through the confusing details of the previous responses, or you can just take my response "Yes, it is audibly perfect." Chase down rabbit holes or don't. It's your choice.
 
Sep 20, 2017 at 6:36 AM Post #43 of 121
Not really. My concern is more like, if 16/44.1 is settled on as "more than enough" as a format for storing recordings, (forget any question about reproduction), then my opinion is that it ought to be able to store just a bit more audible information than humans could ever possibly perceive. This way, it's really "future proofed" against some imaginary sci-fi perfect transducer, or whatever. Like, if it's meant to be a perfect storage medium for audio recordings, is it *actually perfect* or just close?

16 bit / 44.1 kHz is actually more like "just enough". The Nyquist frequency of 22.05 kHz is so close to the upper end of human hearing (flat response to 20 kHz is desired) that anti-aliasing and reconstruction filters are a bit tricky (steep as hell), but manageable (oversampling allows more relaxed reconstruction filters).
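A rough way to put numbers on "steep" versus "relaxed" is the room the analog reconstruction filter has between the top of the audio band and the first spectral image. The sketch below assumes content band-limited to 20 kHz and an ideal digital interpolation stage in the oversampled cases; the function name and the choice of 2x and 8x are assumptions for illustration only.

```python
import math


def analog_transition_octaves(output_rate_hz, band_edge_hz=20_000):
    """Octaves between the audio band edge and the first spectral image."""
    first_image_hz = output_rate_hz - band_edge_hz
    return math.log2(first_image_hz / band_edge_hz)


for factor in (1, 2, 8):
    rate = 44_100 * factor
    print(f"{factor}x ({rate} Hz): {analog_transition_octaves(rate):.2f} octaves")
```

Without oversampling the filter has only a fraction of an octave to go from flat to full attenuation. With 8x oversampling it has several octaves, which is why a gentle analog filter becomes sufficient.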

My point is just that there seems to be certain edge case signals that humans can hear, that can't be represented in a 16/44.1 format. So, it's not a "perfect" format as such.

Human hearing filters ultrasonic signals strongly, just as anti-aliasing filters do for 44.1 kHz digital audio. Also, only children can hear 20 kHz frequencies, so one could say 44.1 kHz digital audio is able to represent signals adults can't hear. Even in the case of children the sound pressure level of a 20 kHz tone must be very high to be heard. So even if you can hear 20 kHz frequencies, it is unlikely the music contains strong enough signals at such high frequencies to ever be heard.

Sharp impulses and "staircase" signals are mathematical constructs that assume infinite bandwidth. Real life is another story. Even the most impulse-like signals spread in time, having a finite rise time (physical system stores energy) and decay time (physical system releases stored energy). Even our ears do that. Real life impulses resemble a sinc function.

Again, I fully acknowledge the uselessness of this question in real life scenarios.

Please don't be so modest. These are good questions and someone who doesn't know that much about digital audio may learn something reading this thread.
 
Sep 20, 2017 at 7:09 AM Post #44 of 121
For example, if you actually wanted to repeat this two-click threshold test done in the 70s (I found the link) http://asa.scitation.org/doi/abs/10.1121/1.1912374 you would not be able to do so with digital equipment running at 44.1 kHz. The shortest click you can represent at that sampling rate is... 22 us.

If there is some audible difference between clicks lasting 20 and 10 microseconds, then we must conclude that people CAN hear (under specific circumstances) more than 44.1 can reproduce. I won't argue that this really matters for music, but if we want a standard that theoretically exhausts human ability, we may want to consider that 16/44.1 isn't it.

My point is just that there seems to be certain edge case signals that humans can hear, that can't be represented in a 16/44.1 format. So, it's not a "perfect" format as such.

I may be wrong, but I'm not sure you can draw such conclusions from those experiments. Even that abstract says:

"It appears that discrimination of slight changes in the energy spectrum of the two transient signals, especially in the high‐frequency region (8000 Hz and above), underlies the ear's sensitivity to a temporal discontinuity."

8000 Hz and above

8 kHz is obviously within capabilities of 44.1 kHz sampling rate. My understanding is that they didn't hear those 10 usec clicks "directly", but only the effects that they produced in the audible range. So, yes, it is not possible to "encode" 10 usec clicks in a single channel using 44.1 kHz, but it should be perfectly fine to "encode" the effects of those clicks which allow humans to distinguish them from 20 usec clicks.

Or in other words, if you have those hires files with 20 and 10 usec clicks that you can distinguish, then if you low-pass them and downsample to 44.1, then they will still sound the same and you will still be able to distinguish them.
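To get a feel for how two short clicks could differ inside the audible band, here is a minimal sketch comparing the spectral shapes of 10 usec and 20 usec pulses. It assumes ideal rectangular pulses and normalizes each spectrum to its own DC level, so an overall loudness difference is left out. This is not the stimulus from the cited paper, and the function name is made up for the example.

```python
import math


def pulse_shape_db(freq_hz, width_s):
    """Spectrum of an ideal rectangular pulse, in dB relative to its own DC level."""
    x = math.pi * freq_hz * width_s
    return 20 * math.log10(abs(math.sin(x) / x))


for f in (1_000, 8_000, 16_000, 20_000):
    diff = pulse_shape_db(f, 10e-6) - pulse_shape_db(f, 20e-6)
    print(f"{f:>6} Hz: 10 us click droops {diff:.2f} dB less than 20 us click")
```

Under these assumptions the two spectra separate gradually as frequency rises, and that separation is present below 22.05 kHz, so a 44.1 kHz file can carry it. Whether a difference of that size is audible is a separate question this sketch does not answer.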
 
Sep 20, 2017 at 7:40 AM Post #45 of 121
At home I have a fair number of synthetic drum samples that just use a click with a rise time of one sample as their "snap". It's kind of common among old crappy samples from old crappy digital drum machines. I will see if I can find some good examples and post them here. It's not hard for them to be this way since they were artificially created in the first place.
Yes, of course you can have drum samples of this sort*, but the drum sample is just "data" that could be turned into ASCII text, a control signal for a robot or a picture file. Or sound in this case. When you play back that sample, the DAC of your system uses a reconstruction filter to create an analog signal from the sample. The analog signal has limited bandwidth and, depending on the phase response of your DAC, has more or less pre-ringing + decay ringing. "Digital snaps" never reach your ear and it's not intended to happen. Sample snaps are digital information, not real sounds. Real analog sounds are created from that information and are always band limited.

* such artificial sample might be "illegal" in the sense, that it's not a digital representation of any band limited signal. Digital audio deals with band limited signals. If you violate that principle, things go wrong.
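As a sketch of what a lone one-sample "snap" turns into, the code below evaluates the ideal band-limited reconstruction of a single unit sample at points between the sample instants. It assumes an ideal sinc (linear phase, brick-wall) reconstruction filter; real DACs use finite filters, and a minimum phase design would move the ringing to after the peak. The function name is hypothetical.

```python
import math


def reconstruct_impulse(t_in_samples):
    """Ideal band-limited reconstruction of a lone unit sample located at n = 0."""
    if t_in_samples == 0:
        return 1.0
    x = math.pi * t_in_samples
    return math.sin(x) / x


for t in (-2.5, -1.5, -0.5, 0.0, 0.5, 1.5, 2.5):
    print(f"t = {t:+.1f} samples: {reconstruct_impulse(t):+.3f}")
```

The output is zero at every other sample instant but swings positive and negative in between, which is the ringing described above. The analog result is a smooth band-limited pulse, not a one-sample spike.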
 
