Watts Up...? | Page 134 | Headphone Reviews and Discussion - Head-Fi.org

Rob Watts · Nov 11, 2020

My advice over many years has been not to use up-samplers, but to always feed my DACs with bit perfect data, as the WTA algorithm would always do a much better job than conventional algorithms in terms of recovering transient timing more accurately and hence better sound quality and musicality (musicality being defined as being able to get emotional with the music). In the past, that advice was based on two factors - much longer tap lengths of the FIR filter, and a better algorithm (the WTA). Today of course there are solutions that claim long tap lengths too, so the recommendation on not using up-samplers is losing out on one benefit.

Moreover, as the tap length increases, the SQ should converge - an infinite tap length filter would sound identical to pure sinc, to rectangular sinc (tap length infinite -1 for setting window to 1) and Kaiser filter, and the WTA. This assumes that the filter is designed appropriately - scaled properly - I once saw a 1M tap length filter where the designer had just taken a normal filter and increased the taps so it had 99% of coefficients that were near zero. He wondered why it didn't sound any different... So a key question is are we getting close to convergence at 1M taps?

So I thought I would do two things - explain in more detail what the WTA algorithm is all about, and design a couple of filters that use conventional algorithms but have the same 1M tap length, so I can listen to WTA against other algorithms, and measure their performance too. Measurements do provide one indication as to how bad they will sound (but not necessarily how good - more on that later).

But I need to explain first how an FIR interpolation filter is constructed. FIR means finite impulse response, and the other type of filter is the IIR or infinite impulse response. IIR filters are like analogue filters, as they can't see into the future; FIR filters delay the incoming data into a sample buffer, and hence can "see" into the future. This means that an FIR filter can be made linear phase with a symmetric impulse response. A FIR filter that is infinitely long and has a sinc (sin(x)/x) impulse response will perfectly reconstruct a bandwidth limited signal - so if you are interested in transparency, then a sinc function FIR filter is the only way to go. But a pure sinc function needs an infinite amount of taps (one tap is one sample multiplied by a coefficient and fed into an accumulator to sum the result) and so we need to modify the sinc function to make it smaller so it can be processed. The most popular way of doing this modification is via something known as a windowing function. The WTA algorithm is a unique windowing function designed to maximise the accuracy of recovering transient timing from the original analogue signal.
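The construction described above (sample the ideal sinc, then multiply by a window so it fits in a finite number of taps) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`windowed_sinc`, `rectangular`), the 33-tap length and the 4x ratio are all made up for the example, and nothing here is the WTA or any Chord implementation.

```python
import math

def sinc(x):
    """Normalised sinc, sin(pi*x) / (pi*x), with sinc(0) defined as 1."""
    if x == 0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)

def rectangular(position):
    """Rectangular window: weight 1 everywhere inside the filter."""
    return 1.0

def windowed_sinc(num_taps, upsample, window):
    """Coefficients for a symmetric (linear phase) FIR interpolation filter.

    num_taps: odd number of taps, so there is a single centre tap
    upsample: integer oversampling ratio (hypothetical parameter name)
    window:   function taking a position in [-1, 1] and returning a weight
    """
    half = (num_taps - 1) // 2
    coeffs = []
    for n in range(-half, half + 1):
        ideal = sinc(n / upsample)   # ideal sinc, sampled at the output rate
        weight = window(n / half)    # window weight; outside the filter it is implicitly 0
        coeffs.append(ideal * weight)
    return coeffs

# Tiny example: 33 taps, 4x oversampling, rectangular window.
h = windowed_sinc(33, 4, rectangular)
print("centre tap:", h[16])
print("symmetric:", all(abs(a - b) < 1e-12 for a, b in zip(h, reversed(h))))
```

Swapping `rectangular` for another window function changes only the `weight` term; the sinc part stays the same, which is why the choice of window is the whole design question.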

There are many types of windowing functions, and there is an excellent discussion of them on Wikipedia. The simplest windowing function is the rectangular window; you simply multiply the sinc values with 1, and when you want to stop the filter you set it to 0. This graphic from Wikipedia shows it:

[Figure: rectangular window function and its frequency response, from Wikipedia]

The benefit of the rectangular window is that all of the filter values are sinc - but the downside is we have a rapid transition from 0 to a sinc coefficient - and this abrupt change (or discontinuity) causes massive problems in the final filter. No competent audio filter designer would ever use a rectangular window because of this.

A much better windowing function, and in my experience the best sounding of the conventional windows, is the Kaiser filter. This has a coefficient that you can tune, to optimise stop band attenuation against the transition bandwidth. Transition bandwidth (how fast the filter rolls off) is important subjectively - but so too is stop band attenuation (how well sampling images are suppressed), and there exists a subjectively optimum value for these two factors.

[Figure: Kaiser window function (alpha = 3) and its frequency response, from Wikipedia]

From the Fourier transform of the window we can see that at bin 40 we have much better attenuation than the rectangular window. Also, there is no discontinuity - it is a visibly smooth function. But the number of coefficient values that are close to sinc is now very small. This will mean that the filter will not perfectly reconstruct transients, as only sinc will do this.
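To put rough numbers on the difference between the two windows, here is a small Python sketch that compares how much each one leaks away from the centre of its frequency response. The 65-point length, the 4-bin starting offset and the helper names are arbitrary choices for illustration, and it assumes the convention beta = pi * alpha for the Kaiser parameter (alpha = 3, as in the figure title). It says nothing about the WTA window.

```python
import cmath
import math

def bessel_i0(x):
    """Zeroth-order modified Bessel function of the first kind (power series)."""
    total, term, k = 1.0, 1.0, 0
    while term > 1e-16 * total:
        k += 1
        term *= (x / (2.0 * k)) ** 2
        total += term
    return total

def kaiser_window(length, alpha):
    """Symmetric Kaiser window, assuming beta = pi * alpha."""
    beta = math.pi * alpha
    mid = (length - 1) / 2.0
    window = []
    for n in range(length):
        inside = max(0.0, 1.0 - ((n - mid) / mid) ** 2)
        window.append(bessel_i0(beta * math.sqrt(inside)) / bessel_i0(beta))
    return window

def leakage_db(window, first_bin):
    """Worst response, in dB relative to DC, from first_bin out to Nyquist."""
    length = len(window)
    dc = sum(window)
    worst = 0.0
    steps = int((length / 2 - first_bin) / 0.05)
    for i in range(steps + 1):
        freq = (first_bin + 0.05 * i) / length  # cycles per sample
        resp = sum(w * cmath.exp(-2j * math.pi * freq * n)
                   for n, w in enumerate(window))
        worst = max(worst, abs(resp) / dc)
    return 20.0 * math.log10(worst)

rect_db = leakage_db([1.0] * 65, 4.0)
kaiser_db = leakage_db(kaiser_window(65, 3.0), 4.0)
print(f"rectangular: {rect_db:.1f} dB, Kaiser (alpha = 3): {kaiser_db:.1f} dB")
```

The Kaiser figure comes out tens of dB lower than the rectangular one, which is the "much better attenuation" visible in the plots. The price is that most window weights are well below 1, so most coefficients are no longer pure sinc.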

So I designed an M scaler filter and used a rectangular window and Kaiser window to create the coefficients - and then a filter was created in the Xilinx tools. Using my M scaler test rig I could program three different filters - rectangular sinc, Kaiser or WTA, then measure and listen to the alternatives.

Firstly I needed to check the measurements against the filter performance to check the implementation - and they agreed. The AP measurements of the three are shown, using random noise at 0dBFS as the stimulus:

The red is 1M rectangular sinc, green plot is WTA, and brown the Kaiser.

The rectangular sinc was quite a surprise, as the transition band (I use the frequency from corner to -100dB) is appallingly bad - some 6kHz - even though all the coefficients are sinc. The WTA is some 12Hz - that's 500 times more effective. So to get this filter to have the same transition performance as the 1M tap WTA, we would have to use 500 million taps, as doubling the tap length halves the transition band. Transition band is very important subjectively, so the outlook for this sounding good is not promising. Remember also that the objective is to have a sinc function filter - this needs to be the same (or close as possible) as sinc in the time domain, and same as sinc (or close as possible) in the frequency domain - that is a brick wall filter. Both aspects are vitally important. This clearly fails as a brick wall filter, and the poor suppression of sampling images will have a big subjective consequence.
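The 500 million figure follows directly from the stated rule that doubling the tap length halves the transition band. A short Python check of that arithmetic, treating "1M" as exactly 1,000,000 taps (an assumption) and using a hypothetical helper name:

```python
import math

def taps_for_target(current_taps, current_band_hz, target_band_hz):
    """Taps needed if transition bandwidth is inversely proportional to tap length."""
    return current_taps * current_band_hz / target_band_hz

# Rectangular sinc: about 6 kHz transition band at 1M taps; target is the WTA's 12 Hz.
needed = taps_for_target(1_000_000, 6000.0, 12.0)
doublings = math.log2(needed / 1_000_000)
print(f"{needed:,.0f} taps, about {doublings:.1f} doublings of tap length")
```

That is 500,000,000 taps, or roughly nine successive doublings of the 1M filter.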

The Kaiser measures better than the WTA - its transition band is only 3Hz. But because almost all the coefficients are not sinc, the reconstruction of the timing of transients will not be as accurate as sinc. This will affect the sound quality. Kaiser has almost identical frequency domain performance to ideal sinc, but has a time domain performance that looks nothing like ideal sinc. The WTA filter uniquely is both very close to ideal sinc in the frequency domain and very close to ideal sinc in the time domain.

Before talking about my listening tests, I ought to give a quick re-cap as to why transients and the timing of transients are important. What we hear is not the output from the ears, but the brain's processing of that ear data; audio is a reconstructed processed illusion created by the brain. We do not understand how the brain separates individual instruments out into discrete entities and creates extremely accurate placement data on those entities. What we do know is that transients are part of the perceptual cues that the brain uses to construct the audio illusion - transients are used for the perception of bass pitch, locating instruments in space, and timbre perception. They also get used to separate individual instruments into discrete entities - I am sure correlated patterns from transients play a part in this. So if the transient timing is being constantly affected (too early or too late, constantly changing) then our perception of space, timbre, instrument separation, and perception of bass pitch will get affected. In the past, when coming up with the WTA filter, I optimised it by using specific tracks to evaluate all these factors, and these tracks were used for the listening tests.

The first test was 1M rectangular sinc against 1M WTA. My notes are not pleasant to read:

Depth: flat as a pancake, soundstage wide and out of focus
Timbre variation: hard and metallic, sibilance on vocals. Poor variation
Instrument separation: poor; sounds distorted, loudest instrument dominates - listening fatigue. Moby un-listenable.
Double bass: soft and diffuse, not easy to follow pitch

I didn't bother listening to it against Dave's WTA filter - it was that bad. The measurements indicate from transition band POV that it is 500 times worse than WTA - my listening impressions would agree with that. It's not fit for purpose, both from a measurement and SQ point of view.

As to the 1M Kaiser against 1M WTA - this was clearly better than 1M rectangular sinc but:

Depth: flatter and instruments out of focus in space
Timbre variation: suppressed, voice slight edge
Instrument separation: little congested, not good focus, loud instrument dominates.
Double bass: bit soft, not very well defined
Tonal balance: slightly bright, some edge to vocals

I then used pass through on the test M scaler to listen to the Kaiser 1M against Dave's 164k WTA and Dave clearly won out.

So with 1M sinc we can put a number based on the transition band measured performance, and subjectively the number rings true - it really does sound 500 times worse. As to Kaiser, it's difficult to put a number on coefficients that are almost the same as sinc against the WTA. That number would be around 50 times less than the WTA, and that seems about right - the 1M Kaiser sounds similar to what I expect a 20k tap WTA filter would sound like.

So a word now about the WTA filter and how that was created. WTA is a compromise between ensuring that as many of the coefficients as possible are sinc (so we tend towards perfect transient timing reconstruction) and avoiding a rapid change in coefficients going from zero (where you have run out of taps on the filter) up to full sinc. The WTA windowing function is unique and is designed to optimise this compromise. After many years of experimentation and trials (starting in 1999) I ended up with an equation that calculated the coefficients. This has 5 variables - the first being tap length, which of course is ultimately hardware determined. Of the other four variables, two are independent - listening tests confirm which value sounds best. One of these variables has to be set to 10 parts per million accuracy - that means a 10 PPM shift is clearly audible. The other two variables concern the shape of the function: one controls the beginning (starting at zero) and the other controls the end (hitting 1 for the window function). These variables are interdependent, so you have to start with a rough set of both to determine which sounds best, then move to the next variable, then repeat until you can no longer hear a change in performance. To give you an idea of the accuracy needed for this - a 1% shift in the curve over 10% of the function is audible. The process of optimising the WTA equation took many weeks with probably thousands of listening tests (sometimes when the difference is really subtle you have to listen as many as 10 times on all 5 tracks to get a consensus as to which is best). But this effort pays off - as a subtle change in values can give the subjective equivalent of doubling the tap length. And increasing tap length ad infinitum is not a practicable solution - more processing costs more in FPGAs and power, and has longer latency (delay). Tweaking a coefficient just costs me my time.

So with 1M taps are we getting close to subjective convergence? Absolutely not. Indeed, I ended up being more convinced that algorithm is vastly more important than tap length. Having said that, I am certain that more musical benefits are to be had by further increasing the tap length.

Happy listening, Rob

PS - the excellent news that a Pfizer vaccine with 90% effectiveness means that CanJams should be starting again sometime next year. I have sorely missed these shows, so I am looking forward to seeing and talking to enthusiasts again.

Reactcore · Nov 11, 2020

Rob Watts said:
I ended up being more convinced that algorithm is vastly more important than tap length.

Does this mean you will be redesigning the WTA algorithm as well as increasing tap lengths? And might an 'improved' WTA fit in the current generation of FPGA chips, making an even better Chord DAC in a small housing possible?

miketlse · Nov 11, 2020 at 12:31 PM

Reactcore said:
Does this mean you will be redesigning the WTA algorithm as well as increasing tap lengths? And might an 'improved' WTA fit in the current generation of FPGA chips, making an even better Chord DAC in a small housing possible?

I take it to mean that the WTA algorithm needs no change, but the values of the 5 variables are still under investigation.
Optimising those 5 variables can take Rob some time, but it could perhaps further improve the final sound signature without requiring millions of additional taps.

ecwl · Nov 12, 2020 at 9:21 AM

So given there are no advances in FPGA appropriate for Chord's DAC purposes, I was wondering if @Rob Watts would consider developing software upsampling solutions (e.g. for iPhone/iPad) given advancing power efficiencies and performance.
I presume most of the Hugo 2 & Mojo performance is limited by the thermal limits of the FPGA and the Pulse Array DAC. I wonder if it is possible to develop the upsampling software for, say, iOS/iPadOS to play back at 768/705kHz into a new generation of USB DACs whose FPGA would only do the second stage WTA upsampling and advanced noise shaping, with potentially more elements in the Pulse Array DAC, as the FPGA would not have to run as hot in the Hugo 2/Mojo chassis. I guess the downside to this concept is that the DAC would then underperform if not partnered with an optimal software upsampler. So I realized, the more I talk about it, how niche this product would be.
Anyway, I'm sure there are still many amazing products for @Rob Watts to work on such as pulse array amplifier and ADC.

jarnopp · Nov 12, 2020 at 9:28 AM

ecwl said:
So given there are no advances in FPGA appropriate for Chord's DAC purposes, I was wondering if @Rob Watts would consider developing software upsampling solutions (e.g. for iPhone/iPad) given advancing power efficiencies and performance.
I presume most of the Hugo 2 & Mojo performance is limited by the thermal limits of the FPGA and the Pulse Array DAC. I wonder if it is possible to develop the upsampling software for, say, iOS/iPadOS to play back at 768/705kHz into a new generation of USB DACs whose FPGA would only do the second stage WTA upsampling and advanced noise shaping, with potentially more elements in the Pulse Array DAC, as the FPGA would not have to run as hot in the Hugo 2/Mojo chassis. I guess the downside to this concept is that the DAC would then underperform if not partnered with an optimal software upsampler. So I realized, the more I talk about it, how niche this product would be.
Anyway, I'm sure there are still many amazing products for @Rob Watts to work on such as pulse array amplifier and ADC.

Given the success of Mojo, would it be possible and profitable to make a custom chip for Mojo2, so that you didn’t need an FPGA? I don’t know what the minimum sales would need to be for such development, but if you could get TT2 performance out of a custom chip, it could drive Mojo2, Qutest2 and Hugo3!

miketlse · Nov 12, 2020 at 10:58 AM

jarnopp said:
Given the success of Mojo, would it be possible and profitable to make a custom chip for Mojo2, so that you didn’t need an FPGA? I don’t know what the minimum sales would need to be for such development, but if you could get TT2 performance out of a custom chip, it could drive Mojo2, Qutest2 and Hugo3!

'Given the success of Mojo, would it be possible and profitable to make a custom chip for Mojo2, so that you didn’t need an FPGA?'
I suspect the answer is No.
From Rob's posts over the past few years, I think he has mentioned that custom chip fab plants cost billions of dollars to build, and then need sales in the tens or hundreds of millions of chips to become economic.
The benefit of an FPGA is that, although the individual chip costs more, you avoid issues like:

one issue that plagued 'DAC on chip' designs, which was that the ground plane was common, so once RFI entered the ground plane it reduced the accuracy of all the analogue processing
'DAC on chip' allows the use of 'industry standard' approaches to DA conversion, but Rob's approach is not 'industry standard', so the bespoke coding approach using an FPGA probably represents the cheapest option to date.

The most interesting question probably becomes - given that the roadmap to cramming all the code into one larger, more efficient, cheaper FPGA has stalled:

is it now time to explore making the code more modular, and distributing it across two or more FPGAs:
- one smaller FPGA for inputs handling, screen displays, etc
- one or two FPGA to perform the upscaling, pulse arrays etc

Rob Watts · Nov 12, 2020 at 11:54 PM

Given that I turned 60 recently, I am in the twilight years of my career. In many ways I am extremely fortunate, being fitter today than when I was 30; no age related issues at all; and experience and new knowledge (which is still growing daily) is more than compensating for cognitive decline. But this won't last for ever - so I now focus only on things that interest me, and projects that will have a genuine improvement in performance. And I have a huge number of these in the pipeline!

As to putting the WTA onto a PC - that process would degrade SQ, as an FPGA solution will always be less powerful, and being hardware one can mitigate the noise issues too. I would need to invest a large amount of time into such a project too - so that doesn't interest me.

A custom chip would also consume a huge amount of time, and it's difficult to see how a business case could be made for the audiophile market.

@miketlse is correct - there would be huge performance losses involved in certainly doing the analogue parts.

As to FPGAs for high end solutions - I have no problems in simply using more of them (like the DAC64 did - the design was partitioned into four FPGAs) or simply using a huge FPGA at silly costs. So there are no gate count restrictions at all in still using FPGAs. My desire for more performance from FPGAs is so that I can do a Hugo x that has a built in M scaler - and from a power and cost POV we are currently an order of magnitude away from that, with nothing on the horizon at the moment. FPGA companies seem to focus now on expensive data centre or SOC solutions, not commercial sector FPGAs. Currently, I have the capability to do anything I could possibly dream of doing from existing FPGAs if you are not too concerned about costs or power dissipation.

Triode User · Nov 13, 2020 at 2:24 AM

Rob Watts said:
Given that I turned 60 recently

Congratulations but you are a young whippersnapper, I was 65 this year but likewise am as fit as ever I was. I see friends retiring all the time but they do not seem any happier. I heard about the Japanese island of Okinawa where they do not have any words relating to retirement but they have the word ikigai which basically means reasons to get up in the morning and that is my own philosophy. It sounds as if it is yours as well and long may it continue.

alxw0w · Nov 13, 2020 at 2:51 AM

Rob is it safe to use speakers with minimal impedance of 4 Ω (nominal 6 Ω ) directly with the Dave ?
I was using them with Hugo TT2 with good results now it's time to test it with the Dave.

ZappaMan · Nov 13, 2020 at 3:19 AM

Neil Young, in "Falling from Above", sings:

"I won't retire, but I might retread".

Check out Greendale (Live at Vicar Street) if you want to hear an acoustic set: just Neil, his guitar and mouth organ, and a warm Irish audience...

Congratulations Rob, you've made it to the off-ramp lol.

Rob Watts · Nov 13, 2020 at 5:27 AM

alxw0w said:
Rob is it safe to use speakers with minimal impedance of 4 Ω (nominal 6 Ω ) directly with the Dave ?
I was using them with Hugo TT2 with good results now it's time to test it with the Dave.

Not really. 4 ohms is too much for Dave - unless your volume is at -6dB or below.

alxw0w · Nov 13, 2020 at 5:35 AM

So we are waiting (impatiently) for power pulse array amps

Reactcore · Nov 13, 2020

Hello Rob,

I guess I'm speaking for many fellow lovers of your work when I say I can only partly follow you when you explain how the WTA filter works.
Talking about sinc functions requires a deep understanding of mathematics, and while I am not a beginner, I often lose the picture while you explain things.

So I tried to visualize what's happening to the digital audio signal when you 'reconstruct' it using algorithms, and why this works better than other upsampling methods or analog filtering.
To do this I used a simple 15 second audio file of a 44.1K sampled drum ruffle ending with a cymbal hit.

As you have often pointed out, the timing of transients at the starting and stopping of notes is crucial for perceiving and distinguishing instruments.
So the hit on the cymbal should be a good example to visualize it.

I loaded the file in the program Audacity (freeware), where I can zoom in to digital sample level on the transient of the hit.
The program draws a straight line between the samples. If I'm stating it correctly, this would represent a signal without filtering.

It looks like the picture below:

I can easily understand that the infinitely varying signal level in between the samples (dots) is lost when digitizing the sound, but this information is extremely important.
It is where the transients (the edge at the start of a rising or falling slope of the signal level) mostly happen. It is rare for one to be timed precisely on a sample.

OK, so I took this last picture over into AutoCAD as a drawing.

If this file is simply upsampled to a higher frequency, it would make no change to its sound because the timing of the transients isn't recovered.
I know many of these 'straight' upsampled files exist; it can also be done with the Audacity program.

Below I visualized an upsampling to 176.4K of this part of the signal:

The squares are simply added samples on a straight line between the rising or falling edges of the original samples. This makes the file size bigger but it will give no benefit to the sound quality.
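The 'straight' upsampling described above can be written out in a few lines of Python. This is just a sketch of the straight-line idea with made-up sample values and a hypothetical function name; it is not how Audacity or any particular resampler actually works internally.

```python
def linear_upsample(samples, ratio):
    """Insert (ratio - 1) points on a straight line between each pair of samples."""
    out = []
    for a, b in zip(samples, samples[1:]):
        for k in range(ratio):
            out.append(a + (b - a) * k / ratio)
    out.append(samples[-1])  # keep the final original sample
    return out

original = [0.0, 0.8, -0.4, 0.2]           # made-up sample values
upsampled = linear_upsample(original, 4)    # e.g. 44.1K -> 176.4K
print(len(original), "->", len(upsampled), "samples")
print(upsampled[::4] == original)           # every 4th point is an original sample
```

Every new point lies exactly on the line between two neighbouring originals, so each one is computed from only those two samples.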

Now with the WTA filter algorithm you managed to 'look' into the past and future of the trajectory of the signal, taking multiple original samples into account while processing it.
In this way it can be 'predicted' where the lost edges of the rising or falling slopes are. Between the original samples on this trajectory you project taps, which in the end form the output samples of the DAC.

So I tried to visualize this on the drawing as shown below:

Between 2 samples only a straight line can be drawn in my case.
Now it is created where the transient should occur. I can also see the timing error (in red).

In this simple picture I still see timing errors (between the apparent transient and the nearest tap or 'new sample'), but they are smaller. With more taps the output will be closer to the ideal.

Well, this is how I interpreted, and think I understand, the basic working of the WTA filter, and why it is so different from other filtering or upsampling methods.
Perhaps you can tell me if I'm somewhere in the ballpark or completely missed it. If the latter is the case I might remove this post.

Thanks for reading,
Rick

Rob Watts · Nov 14, 2020 at 10:40 AM

Thanks for that Rick.

Eyeballing it and using adaptive linear interpolation (drawing straight lines) may be the correct approach in this instance; or it may be more accurate if you rounded the corners more! We simply don't know. The beauty of a sinc filter with infinite taps is that this will guarantee the correct intermediate values - so long as the signal was perfectly bandwidth limited before sampling.
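As a rough illustration of that guarantee, the Python sketch below estimates the value halfway between two samples of a bandwidth limited test tone, once by drawing a straight line and once with a truncated sinc sum. It is a plain rectangular truncation of the sinc series, not the WTA, and the tone frequency, tap counts and function names are arbitrary assumptions for the example.

```python
import math

def sinc(x):
    """Normalised sinc, sin(pi*x) / (pi*x), with sinc(0) defined as 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def signal(t):
    """Bandwidth limited test tone at 0.2 cycles per sample (Nyquist is 0.5)."""
    return math.sin(2.0 * math.pi * 0.2 * t)

def sinc_interpolate(t, taps_each_side):
    """Estimate signal(t) from integer-time samples with a truncated sinc sum."""
    first = math.floor(t) - taps_each_side + 1
    last = math.floor(t) + taps_each_side
    return sum(signal(n) * sinc(t - n) for n in range(first, last + 1))

def linear_interpolate(t):
    """Estimate signal(t) from the two nearest samples with a straight line."""
    n = math.floor(t)
    frac = t - n
    return (1.0 - frac) * signal(n) + frac * signal(n + 1)

t = 0.5  # halfway between sample 0 and sample 1
true_value = signal(t)
print("linear error:", abs(linear_interpolate(t) - true_value))
for taps in (4, 32, 1000):
    error = abs(sinc_interpolate(t, taps) - true_value)
    print(taps, "samples each side, sinc error:", error)
```

With enough samples either side, the sinc estimate lands far closer to the true value than the straight line does, although with a bare truncation like this the error does not necessarily fall smoothly as samples are added.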

The WTA isn't really any different to other filters - except that for a given tap length it will create the intermediate samples to orders of magnitude more accuracy, and doubling the tap length will double the accuracy. One day I hope I will get it so that it is accurate enough, in that doubling will result in no or insignificant improvement in sound quality - but we are not there yet.

dakanao · Nov 14, 2020 at 12:16 PM

Rob Watts said:
Thanks for that Rick.

Eyeballing it and using adaptive linear interpolation (drawing straight lines) may be the correct approach in this instance; or it may be more accurate if you rounded the corners more! We simply don't know. The beauty of a sinc filter with infinite taps is that this will guarantee the correct intermediate values - so long as the signal was perfectly bandwidth limited before sampling.

The WTA isn't really any different to other filters - except that for a given tap length it will create the intermediate samples to orders of magnitude more accuracy, and doubling the tap length will double the accuracy. One day I hope I will get it so that it is accurate enough, in that doubling will result in no or insignificant improvement in sound quality - but we are not there yet.

Hi Rob, I've sent you a PM with a question about using IEMs with the Chord Mojo that I'm trying to figure out.
