Last summer and autumn were an incredibly exciting and busy journey for me, and I couldn't talk about it. Now that the Blu mk2 has been launched, I am at liberty to do so. It has been somewhat frustrating not being able to talk about it, as the improvements have been very exciting for me.
Here are the slides from the technical presentation I put together for the press, together with some extra notes explaining things in more detail. Firstly, apologies - this is complicated stuff, and not easy to get across.
OK, this slide should be familiar to anybody who is aware of the WTA filter. What I am trying to say here is that a DAC's job is not to reproduce the sampled digital data, but to recover the signal before it was sampled - i.e. the original analogue signal in the ADC. The interpolation filter has the job of recovering that original analogue signal, as it is at the heart of the process of converting a sampled signal back into a continuous one. I am also talking about how essential the timing of transients is to the ear and brain. If there are errors in the timing of transients (a signal's transient edge may be too early or too late compared to the rest of the signal), this has massive consequences for the brain's ability to make sense of the audio: it affects our ability to perceive the starting and stopping of notes, the perception of sound-stage, timbre and even bass pitch (transients are used as a cue for perceiving bass pitch, for example - timing errors on transients mean you cannot follow or perceive the bass tune).
So here I am talking about sampling theory. Basically, if you use an interpolation filter with a sinc impulse response, then you would perfectly reconstruct the bandwidth-limited waveform as it was before sampling - all the missing bits in between the sampled data would be perfectly represented, with absolutely no difference whatsoever from the original. So it would not matter whether you were sampling every 22 µs or every 22 fs; the bandwidth-limited signal would be absolutely identical in both cases.
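This reconstruction can be sketched numerically. Below is just the textbook Whittaker-Shannon sum with a (necessarily) truncated sinc kernel - the tone frequencies and function names are my own illustration, nothing from the slides:

```python
import numpy as np

fs = 44100.0                       # CD-style sample rate, for illustration
n = np.arange(-2000, 2000)         # a long (but finite) run of sample indices

# A band-limited test signal: two tones well below the 22.05 kHz Nyquist limit
def signal(t):
    return np.sin(2 * np.pi * 1000.0 * t) + 0.5 * np.sin(2 * np.pi * 7333.0 * t)

samples = signal(n / fs)

# Whittaker-Shannon reconstruction: a sinc kernel centred on every sample
def reconstruct(t):
    return np.sum(samples * np.sinc(t * fs - n))

# Evaluate at an instant that falls *between* samples: even this truncated
# sum lands very close to the original continuous waveform
t0 = 123.4 / fs
err = abs(reconstruct(t0) - signal(t0))
```

With an infinite run of samples the error would be exactly zero; the small residual here is purely down to truncating the sum.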
But to do this we of course need the analogue signal to be perfectly bandwidth limited (that's the job of the ADC - and this actually is not technically difficult to do; Davina for example will have > 200 dB of bandwidth limiting). From the DAC's point of view, though, we need an interpolation filter that infinitely oversamples and has infinite ringing. To do this we would need a sinc FIR filter with an infinite number of taps - something clearly impossible. So to cope with a limited number of taps, I found I had to change the algorithm (the algorithm determines the values of the coefficients, and that sets the shape of the impulse response - how the ringing is defined). Hence how and why the WTA filter was created.
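To give a flavour of what "changing the algorithm" means in practice, here is the textbook starting point: a truncated sinc shaped by a window function. This is purely illustrative (a stock Blackman window and an arbitrary 2,048 taps) - it is not the WTA algorithm:

```python
import numpy as np

ratio = 16          # oversampling ratio, e.g. 44.1 kHz -> 705.6 kHz
taps = 2048         # finite tap count; the ideal filter needs infinitely many

n = np.arange(taps) - (taps - 1) / 2.0
ideal = np.sinc(n / ratio)            # the infinite sinc, truncated
window = np.blackman(taps)            # the "algorithm": how truncation is shaped
coeffs = ideal * window / ratio       # normalised so the DC gain is near unity

# A different windowing algorithm gives different coefficients, and hence a
# different impulse response - which is exactly where filters diverge
dc_gain = coeffs.sum()
```

The window is what makes the filter realisable at all; the art is in choosing coefficients that minimise the damage done by not having infinite taps.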
So I can see that maybe you are asking the question: the ideal filter rings and rings and rings for an infinite period of time, and it looks nothing like the original impulse, which is zero, a pulse, then zero again. So how can one possibly say that this filter returns the original signal completely unchanged? After all, everybody says ringing is bad and unnatural.
So how do we answer this paradox? It's easy - the original impulse is NOT a legal signal, as it is not bandwidth limited. An impulse for CD has exactly the same level at 22.05 kHz as at DC - but sampling theory absolutely demands that the signal be bandwidth limited, and this means that at exactly 22.05 kHz and above the signal has exactly zero output, not the full output that an impulse supplies. So all this talk about ringing is fundamentally mistaken, as it bases a prejudice on a signal which a competent ADC would never be capable of supplying.
Here is the key idea - take an ideal impulse, then bandwidth limit it, so that for CD the level at 22.05 kHz and above is exactly zero, and use this as your test signal. What would happen here is that the ideal filter, which rings and rings and rings with the illegal impulse, would return absolutely no difference from the un-sampled bandwidth-limited impulse. And cruder filters with short ringing will actually produce more ringing, and more changes to the original signal! So the idea that filter ringing is bad is based on a false premise, and on people simply not understanding the theory properly.
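You can check this argument numerically. A band-limited impulse is simply a sinc in continuous time; sample it, push it through the ideal (sinc) reconstruction filter, and out comes the original - "ringing" and all - unchanged. A small numpy sketch of my point, not anything from the slides:

```python
import numpy as np

fs = 44100.0
n = np.arange(-1000, 1001)

# The *legal* impulse: an ideal impulse band-limited to below 22.05 kHz.
# In continuous time it is a sinc - the "ringing" is part of the signal itself.
def bl_impulse(t):
    return np.sinc(t * fs)

samples = bl_impulse(n / fs)     # on the sample grid: ... 0, 0, 1, 0, 0 ...

# The ideal reconstruction filter, which "rings and rings and rings"
def reconstruct(t):
    return np.sum(samples * np.sinc(t * fs - n))

# In between samples the filter output matches the band-limited impulse
# to numerical precision: the ringing filter changes nothing
t0 = 0.25 / fs
err = abs(reconstruct(t0) - bl_impulse(t0))
```

The "ringing" people object to was already in the legal signal before it was ever sampled; the ideal filter merely hands it back.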
The next slide covers the WTA history - I have been talking about long tap length filters for a very long time:
So this shows the history of the WTA filter, and I will be coming back to it later.
So this talks about the new FPGA that only recently became readily available in production quantities. The design using this device actually came from the Davina ADC project, so it was easy for me to drop it in. So of course the secret of tap length is now out, but:
The next slide actually talks about the relevance of this:
Now what do I mean by 16 bit accuracy? When you look back at the sinc function, to absolutely guarantee reconstruction to 16 bit accuracy you need the ideal sinc function's coefficients to have decayed below the 16 bit level before you truncate them - and to do this for a 16FS filter you need getting on for a million taps. Now I have history with this idea. Roll back to the early 1980s, when digital recordings started to appear on vinyl - and they sounded awful: hard and flat. I was at university studying Electronics as a science, so we did much more theory than usual, and sampling theory was a big chunk of it. My private studies included trawling through the psychology library and reading up on the science of hearing and perception. A common strand was the importance of transients for perceptual cues - I learnt that they affect everything to do with enjoying music, so they are crucially important.
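That tap count is easy to sanity-check from the sinc function's envelope alone - a back-of-envelope calculation of my own, using nothing but the tail decay:

```python
import math

ratio = 16                  # 16FS oversampling
lsb = 2.0 ** -16            # the 16 bit level

# |sinc(x)| <= 1/(pi*|x|), and at 16FS the n-th tap sits at x = n/16, so the
# tap envelope is 16/(pi*n).  Solve 16/(pi*n) < 2**-16 for n:
taps_per_side = ratio / (math.pi * lsb)

# Both sides of the impulse centre: roughly 667,000 taps in total -
# "getting on for" a million
total_taps = 2 * taps_per_side
```

Note this is the point at which the ideal coefficients merely drop below one 16 bit LSB; it is a floor for guaranteed 16 bit reconstruction, not a precise filter specification.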
But understanding sampling theory, I immediately recognised that the interpolation filters of that time (initially NOS passive filters, then simple FIR filters) would have massive problems reconstructing the timing of transients. I thought of it as an audio uncertainty principle (like the Heisenberg uncertainty principle from quantum mechanics): when passed through a crude filter, you would have uncertainty in the timing of transients that occur in between the samples. I later learnt there is a Fourier uncertainty due to windowing functions - and a window function is exactly what you apply to make a finite FIR filter. By having longer tap lengths you reduce the Fourier uncertainty - exactly as my intuition told me.
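The Fourier uncertainty is easy to see numerically: a window's spectral mainlobe narrows in direct proportion to the window's length, so a longer filter pins things down more tightly. A minimal sketch (the Blackman window and tap counts are arbitrary choices of mine):

```python
import numpy as np

# Width of a window's spectral mainlobe, as a fraction of the sample rate
def mainlobe_width(n_taps, n_fft=1 << 18):
    w = np.blackman(n_taps)
    spectrum = np.abs(np.fft.rfft(w, n_fft))
    spectrum /= spectrum[0]                # normalise to the DC peak
    edge = np.argmax(spectrum < 0.5)       # first bin below half amplitude
    return edge / n_fft

# Doubling the window (tap) length roughly halves the spectral uncertainty
short, long_ = mainlobe_width(256), mainlobe_width(512)
```

The same scaling applies to the window applied to a truncated sinc: more taps means a tighter, more certain reconstruction.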
Anyway, it was easy for me to calculate what you needed from an FIR filter to guarantee 16 bit performance under all conditions, and you needed getting on for 1M taps. Now in the early 1980s DSP and FPGA devices didn't exist; the microprocessor had only been invented a few years earlier. So the idea that 1M taps would ever be available or doable was clearly insane; it seemed like inventing a ship to take you to Alpha Centauri in a day would be more likely. So my reasoning was that it was never going to happen, and so digital was fundamentally flawed....
Now getting back to designing the filter. I have had immense problems with designing long tap length WTA filters before, and 1M taps seemed too much for the A200, so I settled on 512,000 taps, as it could be easily done. You don't want to spend months coding and designing a filter only to find that the FPGA won't work with that design.
So I got it designed and verified, and got the design through the FPGA tools with timing closure met.
Then, after spending a few months over the summer on it, I got to listen to it. Now my expectations were not high; I knew it would sound better, but I was expecting just an incremental improvement.
I was very wrong.
It was massive - one of those transformational things one hears rarely.
Oh dear. This actually was a problem: had it been just a bit better, then more taps would not offer much benefit - but a transformational change meant more taps were worth pursuing. And we had penciled in October to launch Blu mk2; increasing the tap length would mean forgetting that, as more taps meant completely designing a brand new filter with a new architecture.
I really wanted to go for 1 million taps too, as this has been my 35 year dream. To get to 1M, I had to solve two problems: meeting timing, and actually using the memory. Now there was (just) about enough memory on the device, but when you actually use it, not all of it is available - so I had to figure out a way to improve memory efficiency. But it was also risky, as spending 3 more months might end with something that never worked.
What to do? Delay the project with something that may never be achievable? And what about all the other things I had to do (Davina, Hugo2, dig amp...) - they would get delayed.
I decided to bite the bullet and do what I wanted to do - which is always about performance.
And in December I finally got it finished - and was frankly richly rewarded. Now I do not want this to sound like some advert - that's not my intention, and forgive me if it comes across that way - but the change in sound is not small. Everything benefits: clarity is the first thing that hits you, then focus and depth, then more refinement and timbre variation. And I can't get it out of my head as to why it's so massive - but relating technical errors to sound is very interesting, and the ear/brain is amazingly sensitive.
So the next slide is the internal architecture of the new design:
So this gives you a flavor of the well over half a million lines of code that go into this, and of its complexity.
Finally is a discussion about Dave, and why I can't easily drop this FPGA into a DAC:
Now depth perception is a major thing with Dave - and that is down to resolving small signals absolutely perfectly, with no changes in amplitude. Having a chip next to the pulse array elements injecting 10A peak of weakly signal correlated current would destroy that ability, so using a separate M scaler would be easily the best sounding option.
The USB M scaler is actually Davina. I knew at the start of the Davina project that M scaler mode would be an important mode of operation for it; indeed, the Blu mk2 project could only be justified because of the future release of Davina. After all, a USB input is what I really need personally - but Blu has an SPDIF BNC input, so that is how I currently listen to my newly discovered music collection - yes, it is that profound a change.
What is also odd is that older CD recordings sound very much better - my high res downloads are also better, but not to the same degree as 16 bit. Indeed, my feeling is that sound quality is now almost completely dependent upon the recording process and not the actual sample rate used. And the 1960s Decca and Mercury recordings - well, some of them have been transformed. I am really amazed at the quality of these recordings - yes, they sound obviously a bit distorted, and they are noisy - but they do things that modern 192 kHz recordings fail to do: reproduce the speed and impact of dynamics, and variation in timbre. In short, these old recordings have life and vitality - and this has become more apparent to me with the M scaler.
Why would that be so? Well, simply put, this is the first time that 16 bit has been reproduced with guaranteed 16 bit accuracy in the timing of transients. So for the first time we can actually perceive what 16 bit is truly capable of. But where does that leave us? How many more taps do we need? How close is 1M to ideal?
That is the question I aim to answer with the Davina project - I will make 768 kHz recordings, decimate them down to 48 kHz, then M scale them back up to 768 kHz to create a second file. By comparing the two files, we will know for sure what these losses actually represent subjectively.
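The shape of that experiment can be sketched in a few lines of numpy - here a generic windowed-sinc low-pass stands in for both the decimation filter and the M scaler (the real filters are enormously longer and better, and the test signal is a simple tone rather than a recording):

```python
import numpy as np

ratio, taps = 16, 2047                   # 768 kHz <-> 48 kHz
n = np.arange(taps) - (taps - 1) / 2
lp = np.sinc(n / ratio) * np.blackman(taps) / ratio   # low-pass, cutoff 24 kHz

fs_hi = 768000
t = np.arange(fs_hi // 8) / fs_hi        # 0.125 s at 768 kHz
original = np.sin(2 * np.pi * 1000 * t)  # a 1 kHz tone, safely below 24 kHz

down = np.convolve(original, lp, mode="same")[::ratio]  # decimate: 768k -> 48k
up = np.zeros(down.size * ratio)
up[::ratio] = down                       # zero-stuff back to 768 kHz...
back = np.convolve(up, lp, mode="same") * ratio         # ...and re-filter

mid = slice(len(back) // 4, 3 * len(back) // 4)         # ignore filter edges
err = np.max(np.abs(back[mid] - original[mid]))         # round-trip error
```

For a steady tone well inside the band the round trip is nearly transparent even with this crude filter; the interesting listening question is what happens to real transients, which is exactly what the two-file comparison will expose.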
2017 should be very exciting.....