Here is my presentation from CanJam:
So here I am talking about how important the timing of transients is to perception - the implication being that if transient timing has errors (transients being too early or too late), then this will degrade lateral imagery, or left to right instrument placement.

This illustrates that the brain relies on the transients of bass notes in order to infer the pitch. Again, the implication is that if there are timing errors, then the brain will be confused and won't be able to follow the pitch of bass notes.

Again, psychoacoustic studies have shown that transients carry important information for the perception of timbre. The implication is that if transients have timing errors, then timbre perception will be degraded.

So I have talked about the fact that transients are important, and now it's about the specific timing problem that DACs introduce.
So this is a simple illustration of the problem of reproducing transients, and why conventional interpolation filters create very large timing errors. We can see here a sine wave tone burst, and the digital data actually loses the initial burst, as it samples it at the start of the transient.
So the top waveform is the output from the "audiophile" filters, which actually create timing errors - in this case we have a peak error of full amplitude, so the reconstruction gets the sign right, but nothing else. In this particular circumstance, the output is only 1 bit accurate. But the second waveform is perfectly reconstructed - if we use an ideal sinc function filter (known as a Whittaker-Shannon interpolation filter) then it will reconstruct the signal perfectly, with no errors, and in particular no transient timing errors.
You may ask: if the audiophile filters have such huge problems, then why do people like them? The issue is simple. When you get these huge transient errors, the brain can't make sense of the audio; when it can't make sense of something, it draws a blank, and you can't actually perceive the transients. And when you can't perceive transients properly, things sound softer - it's the equivalent of making the image go out of focus, as you can no longer perceive sharp details. You can't follow the pitch of the bass, and it sounds big and fat. But of course the image goes big and flat too, as imagery is degraded; and instruments sound similar, with poor timbre variations.
So this covers the theoretical ideal - if we use a sinc function filter, it will perfectly reconstruct the bandwidth limited signal, with no transient timing errors.
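As a minimal sketch of the Whittaker-Shannon formula, here is the sinc sum written out directly. The test tone, its frequency and the function names are illustrative assumptions, not taken from the slides, and a steady tone is used rather than a tone burst to keep the sketch short. The true formula sums over every sample, past and future; the code can only sum over the samples it has been given.

```python
import math

def sinc(x):
    """Normalised sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def reconstruct(samples, t):
    """Whittaker-Shannon sum: x(t) = sum over n of x[n] * sinc(t - n).

    t is measured in sample periods. The ideal formula sums over all n;
    here the sum is limited to the samples actually supplied.
    """
    return sum(s * sinc(t - n) for n, s in enumerate(samples))

FREQ = 0.2  # assumed test tone, in cycles per sample (Nyquist is 0.5)

def tone(t, phase):
    return math.sin(2 * math.pi * FREQ * t + phase)

def worst_case_error(n_samples):
    """Error halfway between the two centre samples of an even-length record.

    Reconstruction is linear, so combining the errors for a sine and a
    cosine tone with hypot gives the worst case over every tone phase.
    """
    t = (n_samples - 1) / 2
    errors = []
    for phase in (0.0, math.pi / 2):
        samples = [tone(n, phase) for n in range(n_samples)]
        errors.append(reconstruct(samples, t) - tone(t, phase))
    return math.hypot(*errors)

for n in (20, 200, 2000):
    print(n, "samples -> worst-case error", worst_case_error(n))
```

With this test tone the error falls as the record gets longer, which is the practical side of the sinc function needing samples from the whole signal.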
But to implement an ideal sinc function interpolation filter, we need an infinite amount of processing... Something not possible!
So we can improve on conventional algorithms' ability to reconstruct transients correctly - and this led to my WTA interpolation algorithm. It has been optimised to reduce the error with a finite amount of processing (or taps). Actually, the WTA algorithm has evolved as tap length has increased, and today it is very similar to an ideal sinc function - over half the coefficients are identical to the ideal sinc function. A windowing function is what is used to tailor the sinc function from an infinite response into a small finite one, of a size we can actually use.
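To illustrate what a windowing function does to a finite-length sinc filter, here is a rough sketch. The WTA window itself is not described here, so the sketch uses a textbook Hann window purely as a stand-in; the test tone frequency, the tap counts and all function names are assumptions for illustration. It compares simply cutting the sinc off against tapering it with the window, for the same number of taps.

```python
import cmath
import math

def sinc(x):
    """Normalised sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def rectangular(x, half_width):
    """No tailoring: the sinc is simply cut off at +/- half_width."""
    return 1.0 if abs(x) <= half_width else 0.0

def hann(x, half_width):
    """Hann window, a textbook stand-in (NOT the WTA window)."""
    if abs(x) > half_width:
        return 0.0
    return 0.5 * (1.0 + math.cos(math.pi * x / half_width))

def coefficients(offset, half_width, window):
    """Taps that interpolate a point `offset` (0..1) past sample 0.

    Uses input samples -half_width+1 .. half_width, i.e. 2*half_width taps.
    """
    return [(k, sinc(k - offset) * window(k - offset, half_width))
            for k in range(-half_width + 1, half_width + 1)]

def peak_error(freq, half_width, window, steps=10):
    """Worst-case error for a unit tone at `freq` cycles per sample,
    over a grid of inter-sample offsets and all tone phases."""
    worst = 0.0
    for i in range(1, steps):
        offset = i / steps
        # Filter output for a complex tone, divided by the ideal value.
        response = sum(c * cmath.exp(2j * math.pi * freq * (k - offset))
                       for k, c in coefficients(offset, half_width, window))
        worst = max(worst, abs(response - 1.0))
    return worst

for half_width in (10, 20, 40, 80):
    print(2 * half_width, "taps:",
          "truncated", peak_error(0.2, half_width, rectangular),
          "windowed", peak_error(0.2, half_width, hann))
```

For this test tone the windowed filter lands closer to the ideal than the plainly truncated one at the same tap count, and both improve as taps are added. A tone near the top of the band would be a harsher test, and a window optimised for the job would behave differently from Hann.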
So this links history, and shows the relationship between sound quality and tap length with the WTA filter.
Some personal history - I have been waiting for this for a long time!
The importance of 1M taps is that the coefficients are identical to the ideal sinc coefficients to a 16.6 bit accuracy. Since it is only a sinc function that will perfectly reconstruct, this means we are guaranteed to reconstruct the bandwidth limited analogue signal to better than 16 bit accuracy, with all signals, under all conditions. So your 16 bit file is now being perfectly reconstructed - at least to better than 16 bits accuracy and to 705.6 kHz (1.42 µs sample period).
I needed to create a new filter architecture in order to do the M scaler - 528 DSPs operating together, with half a million lines of code. This gives you a flavour of the complexity. Fortunately, I can verify exactly that the filter works perfectly, although it took some months to test and debug.
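One hedged way to see where a figure like 16.6 bits could come from (the rule used here is an assumption for illustration, not something stated in the slides): the envelope of the sinc function falls off as 1/(πx), so the largest coefficient that a finite filter leaves out gives a rough limit on how closely it can match the ideal. The sketch assumes exactly 1,000,000 taps, 16x oversampling (44.1 kHz to 705.6 kHz) and a filter centred on the output sample; `envelope_bits` is a hypothetical helper name.

```python
import math

def envelope_bits(taps, oversampling=16):
    """Rough accuracy, in bits, set by the sinc envelope at the filter edge.

    Assumption for illustration only: the filter is centred, so it reaches
    taps / 2 output samples each way, which is taps / (2 * oversampling)
    input sample periods. The sinc envelope there is 1 / (pi * reach).
    """
    reach = taps / (2 * oversampling)   # in input sample periods
    envelope = 1.0 / (math.pi * reach)  # bound on |sinc| at the edge
    return -math.log2(envelope)

for taps in (250_000, 500_000, 1_000_000, 2_000_000):
    print(taps, "taps ->", round(envelope_bits(taps), 2), "bits")
```

Under these assumptions 1M taps comes out at about 16.6 bits, and each doubling of the tap length adds exactly one bit, in line with the 1 bit per doubling relationship.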
Features of the Hugo M scaler. I added the pass through mode so that it is easy to hear the difference.
So just to summarise. This whole concept is actually very very simple in reality:
1. We know that transients are essential from a perception POV.
2. Only an ideal sinc function interpolation filter will reconstruct the bandwidth limited analogue signal perfectly.
3. All other filters will create differences from the original analogue signal.
4. These differences will make transients come a little early or a little late, creating audible problems. The differences in timing depend upon when the signal was sampled, and on the past and future signals; so transients are randomly moving backwards or forwards. It's this constant change in the timing of transients that confuses the brain, as it can't process the audio to extract placement, timbre, pitch and tempo information - all things that are essential for musicality.
5. Every time I have doubled the tap length, and so increased the accuracy to an ideal sinc function by 1 bit, I have noticed improved sound quality, as the transients are being reproduced more closely to the original. And the sound quality changes are consistent with what we know about transients and perception.
6. The M scaler is the same as an ideal sinc function filter to better than 16 bits; this means the bandwidth limited signal is reconstructed to a better than 16 bit accuracy.
To me, the M scaler has made a huge difference to sound quality when assessed objectively; much more importantly it has transformed my own enjoyment of music.
Rob