To crossfeed or not to crossfeed? That is the question...
Dec 10, 2017 at 8:42 AM Post #421 of 2,146
[1] Processing in a higher sample/bit rate format helps to minimize (a) aliasing, (b) quantization noise and rounding-off errors, and (c) phase misalignment issues, etc.
[2] The more data points an algorithm has to work with, the more precise and accurate the result of calculation will be.
[2a] That's why high-quality plugins offer oversampling options to increase the accuracy of processing. [2b] And that's why DAWs process signals at higher sample and bit rates: they need more resolution to achieve their best.
[3] Processing audio means running thousands of mathematical calculations, where the results of one calculation depend on the results of a previous one.
A higher sample/bit rate is like having a calculator with more figures after the decimal point. The more figures there are after the decimal point, the less the rounding off error is. And if there are thousands of math calculations to do, these rounding off errors add to each other, resulting in a less precise result.
Consider this:
If a calculator (A) has only two figures after the decimal point, it cannot multiply 1.25343 x 1.54789 as precisely as a calculator (B) with four figures after the decimal point:
Reality: 1.25343 x 1.54789 = 1.9401717627
Calculator A: 1.25 x 1.55 = 1.94 (the rounding-off error is 1.9401717627 - 1.94 = 0.0001717627)
Calculator B: 1.2534 x 1.5479 = 1.9401 (the rounding-off error is 1.9401717627 - 1.9401 = 0.0000717627)
The rounding-off error of calculator A is about 2.4 times that of calculator B.
The more math operations we do, the further we drift from the precise result, because rounding-off errors accumulate.
Now imagine doing thousands of such operations in one plugin, then passing the result to another plugin which runs thousands of other math calculations.
How about a chain of 5-6 plugins?
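To make the analogy concrete, here is a small Python sketch (the factors and the number of operations are arbitrary, chosen only for illustration): rounding the running result to fewer decimal places after every step accumulates more error over a long chain of operations.

```python
# Toy model of the "calculator" analogy: multiply a long chain of factors,
# optionally rounding the running result to a fixed number of decimal places
# after every step, and compare against the unrounded result.
import random

def chained_product(factors, decimals=None):
    result = 1.0
    for f in factors:
        result *= f
        if decimals is not None:
            result = round(result, decimals)  # the "calculator's" precision
    return result

random.seed(0)
factors = [1.0 + random.uniform(-0.05, 0.05) for _ in range(1000)]

exact = chained_product(factors)
coarse = chained_product(factors, decimals=2)   # "calculator A"
fine = chained_product(factors, decimals=4)     # "calculator B"

print(f"unrounded : {exact:.10f}")
print(f"2 decimals: {coarse:.10f}  error = {abs(exact - coarse):.10f}")
print(f"4 decimals: {fine:.10f}  error = {abs(exact - fine):.10f}")
```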
[4] That's why I prefer to upsample the signal to as high a format as my DAC will accept (44.1/16 → 176.4/24).
[5] Please refer to Chapter 4, "Wordlengths and Dither" (page 49).
Also, Chapter 18 "High Sample Rates" (page 221).

1a. Aliasing of what? There is nothing above 22.05kHz if you're feeding 16/44.1, upsampling does NOT magically recreate those frequencies ALREADY removed above the Nyquist point. Same with bit depth: If we've got a 16bit file and convert it to 24bits it does NOT magically generate data for those extra 8 bits, all it does is fill/pad those 8 bits with zeros!
1b. The quantisation/round-off error is ALWAYS in the LSB of the plugin format, which is 64bit float in many cases, 32bit float in others. (A short sketch of this follows after 1c.)
1c. Sample rate has nothing to do with phase.
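Regarding 1b, a quick illustration (assuming numpy is available): the round-off error sits at the LSB step of the processing format a plugin actually computes in, which for 64-bit float is far below anything a 16-bit or 24-bit input wordlength contributes.

```python
# Size of one LSB step near full scale, for the formats a plugin actually
# computes in, versus one step of a 16-bit PCM file for comparison.
import numpy as np

print(np.spacing(np.float64(1.0)))   # ~2.2e-16  (64-bit float processing)
print(np.spacing(np.float32(1.0)))   # ~1.2e-07  (32-bit float processing)
print(1 / 2**15)                     # ~3.1e-05  (one 16-bit PCM step at full scale)
```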

2. That's of course nonsense! How does 1.25343 x 1.54789 give a less accurate result than 1.2534300000000 x 1.54789?
2a. Due to the above, your assertion obviously has nothing to do with why plugins upsample! There are 3 potential reasons plugins upsample: 1. The plugin is using some non-linear process which generates content above 22.05kHz; typically something like an analogue-modelled compressor will do this, and without oversampling that content would alias back down as IMD in the audible band (sketched after 2b below). 2. It might be more practical for a plugin to operate at a single sample rate and up/down sample its input to match; some convolution reverbs do this, for example. 3. It could be purely marketing, to fool newbs who are gullible enough to believe that a higher sample rate must be better because it's a bigger number!
2b. DAWs do not operate at a higher sample rate! If you record in 44.1kHz, they operate at 44.1kHz. Their internal mix environment is commonly 64bit float, some are 32bit float or in some older DAWs it's 48bit fixed.
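Here is the sketch promised under 2a, reason 1. It is my own illustration with assumed parameters (drive level, tone frequency, oversampling factor), not code from any real plugin: a tanh waveshaper run at 44.1kHz generates harmonics above Nyquist that fold back as inharmonic products, while running the same nonlinearity 4x oversampled and band-limiting before decimation largely avoids that.

```python
# Sketch: why a non-linear plugin might oversample internally.
# A 6 kHz tone through tanh() produces odd harmonics at 18, 30, 42 kHz, ...
# At 44.1 kHz the 30 kHz harmonic aliases down to 14.1 kHz; at 4x oversampling
# it is filtered out before decimating back to 44.1 kHz.
import numpy as np
from scipy.signal import resample_poly

fs = 44100
t = np.arange(fs) / fs
x = 0.9 * np.sin(2 * np.pi * 6000 * t)

y_base = np.tanh(3 * x)                              # distorted at the base rate
x_os = resample_poly(x, 4, 1)                        # 4x oversample
y_os = resample_poly(np.tanh(3 * x_os), 1, 4)        # distort, band-limit, decimate

def level_db(sig, freq):
    """Level of the bin nearest `freq`, relative to the spectrum peak, in dB."""
    spec = np.abs(np.fft.rfft(sig * np.hanning(len(sig))))
    bins = np.fft.rfftfreq(len(sig), 1 / fs)
    return 20 * np.log10(spec[np.argmin(np.abs(bins - freq))] / spec.max() + 1e-12)

print("14.1 kHz alias, base rate  :", round(level_db(y_base, 14100), 1), "dB")
print("14.1 kHz alias, oversampled:", round(level_db(y_os, 14100), 1), "dB")
```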

3. All of this is irrelevant nonsense!! Let's take your 1.25343 as our 16bit value, and let's convert it to 24bit, so now we have something like 1.2534300000000. What happens if we were to feed those two values into a 64bit plugin? Our 16bit value gets padded with a whole bunch of zeros to create a 64bit word so that our 64bit plugin can actually process it, so now we have:
1.253430000000000... On the other hand, our 24bit word gets padded with a whole bunch of zeros to create a 64bit word so that our 64bit plugin can actually process it, so now we have:
1.253430000000000... In both cases we've got 1.25343 followed by exactly the same number of zeros, so WHAT'S THE DIFFERENCE??
The result of all the internal calculations of the plugin is also a 64bit float (because it's a 64bit plugin!). The quantisation error is in the LSB of that 64bit result (because it's a 64bit plugin!). The output of the plugin when it has finished all its calculations is also a 64bit float (because it's a 64bit plugin!), which either stays as a 64bit float if the data path between plugins is 64bit or gets truncated to 32bit if that's the width of the data path.
The difference between a 16bit word and a 16bit word padded to 24bits is LITERALLY zero (or 8 zeros if you want to be really precise about it) and once input into a 64bit plugin even the number of trailing zeros is the same!!! The only way your examples and statements would make any sense would be if feeding a 16bit word to a 64bit plugin somehow magically changed all the plugin's internal coding/processing and turned it into a 16bit plugin, while feeding it a 24bit word magically turned it into a 24bit plugin. That's of course nonsense, all that happens is that those 16 or 24 bit words are padded with zeros to 64bit floats and that 64bit plugin is always a 64bit plugin!
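A small sketch of that padding argument (assuming numpy; the sample value is arbitrary): a 16-bit sample, and the same sample padded to 24 bits with zero LSBs, produce bit-for-bit identical 64-bit floats once normalised into a float processing format.

```python
# One 16-bit PCM sample, the "same" sample converted to 24 bits by appending
# 8 zero LSBs, and both normalised to the 64-bit float a plugin computes with.
import numpy as np

s16 = np.int16(12345)                 # arbitrary 16-bit sample value
s24 = np.int32(s16) << 8              # 24-bit version: identical value, 8 zero LSBs

f_from_16 = np.float64(s16) / 2**15   # 16-bit full scale -> +/-1.0
f_from_24 = np.float64(s24) / 2**23   # 24-bit full scale -> +/-1.0

print(f_from_16 == f_from_24)                        # True
print(f_from_16.tobytes() == f_from_24.tobytes())    # True: identical 64-bit words
```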

4. What's your DAC got to do with it? You are talking about the precision of plugin processing, not whether or not your DAC is incompetently designed, which is a completely different issue!

5. Ah, it seems like the suggestion in my previous post was incorrect. Instead, try a book which explains the very basics of digital audio first, and then you might correctly understand what's in Bob's book!

G
 
Dec 10, 2017 at 12:16 PM Post #422 of 2,146
1. So now you're agreeing with what I posted several pages back, yes, we have to compare the pro and cons for ourselves!
2. This is the sort of statement I object to! Just because you personally can't hear the cons does not make them "theoretical", you've just completely made that up, based solely on your own personal inability to hear them! If I were an old man and couldn't hear anything beyond 8kHz does that mean all frequencies above 8kHz are only "theoretical" or that they are real but I simply can't "recognise" them? Again, going round in circles back to where I started with you: I do hear the "cons", crossfed HPs do not sound perfect to me, un-crossfed HPs are also not perfect, even HRTF compensated HPs are also often imperfect, as is binaural and so is playback on speakers in a consumer environment. There is no perfect consumer playback!

G
1. Yes, agreed.
2. Yes, there is no perfect consumer playback, but in respect of just enjoying music it's pretty easy to have a perfect enough system these days. Without crossfeed isn't perfect, with crossfeed isn't perfect and speakers aren't perfect. I can only compare these three options to each other. To my ears headphones without crossfeed lose to headphones with crossfeed and to speakers unless the spatial distortion of the recording is near zero or zero, but that happens rarely. If I had the chance to listen to the recording in the studio it was mixed in, perhaps I would hear what is wrong with the crossfed version, but I don't have that chance. Very few do. Most probably the studio doesn't even exist anymore as it was when the album was produced. Maybe they renovated it to improve the acoustics and upgraded the speakers to the newest Genelec models? So, I have those three options, and whether or not you like it, headphones without crossfeed lose hands down most of the time, no matter what cons crossfeed has for you. Maybe I should emphasize that my opinions about crossfeed do not necessarily apply to überhumans with the spatial hearing of elephants.
 
Dec 10, 2017 at 12:43 PM Post #423 of 2,146
I studied design in college and one of the things they taught me was to be true to your materials. Don't use wood grain formica or force a square peg in a round hole. Use your tools to their strengths. Trying to make headphones more like speakers is like that. If you are going to use headphones, think about what it is that headphones do better than speakers and play to that. If you want a sound that is more like speakers, use speakers. The same is true in reverse. A room without any acoustic character might sound closer to headphones, but as speakers, it would sound pretty lousy.

There is a lot of relevance in this comment. I like some crossfeed; I think it reduces fatigue by making things sound more natural acoustically. However, when people start talking about Smyth Realisers, room size algorithms, HRTF, etc., I have to admit that there is a part of me that becomes reactionary. It reminds me of VR, and I see a direct effect from gaming technology making its way into music listening through devices such as the Smyth Realiser. In gaming, the emphasis on spatial perception is important for either immersion or competition. Many of the headphone advancements like HRTF functions or 3D tracking came from that field. I think they are innovative solutions, but I do not understand why it is being assumed that an ideal headphone experience is a mimic of speakers, and that in order to experience "reference" sound on headphones first we must artificially simulate the interaction of sound in space that was never really there. It is precisely this lack of space that gives headphones their own character, for both its positives and negatives. You get a presentation isolated from ambient/acoustic effects, which can be beneficial. It's an intimate and detailed presentation. When that gets traded for a sense of synthetic space, like wearing a TrackIR in a 3D game, the whole experience seems less authentic to me. Neither headphones nor speakers, but a Frankenstein in between. For me, it not only fails to suspend my disbelief, it puts me right in the uncanny valley!

I still maintain that I like some crossfeed, but I don't know how I feel about replicating speakers to the full extent with HRTFs, head trackers, binaural mics, etc. That is my personal preference though, and if others wish to pursue this technology I encourage them and will stay informed of their progress. I do not, however, think that should become the standard of headphone listening.
 
Dec 10, 2017 at 12:47 PM Post #424 of 2,146
When making music I "render" the raw tracks and often first crossfeed and then add reverberation. The "direct sound" is strongly crossfed while the reverberation contains greater ILD. Summing these together ensures that the ILD levels stay low enough. Works nicely imo.

So let me get this straight. You crossfeed to correct the positional distortion but the spatial distortion, which you've been going on about for pages, you don't crossfeed in your own mixes, why is that? Why don't you apply crossfeed after you've added reverb and correct all that horrible, unacceptable spatial distortion? And, how do you know that in your opinion it "works nicely", are you an uberhuman with the spatial hearing of elephants?

Much of what you've said previously seems like nonsense and now that you're contradicting your own arguments against the "spatial information/distortion", I can't see how it even makes sense to you!

Maybe I should emphasize that my opinions about crossfeed do not necessarily apply to überhumans with spatial hearing of elephants.

G
 
Dec 10, 2017 at 1:38 PM Post #425 of 2,146
There is a lot of relevance in this comment. I like some crossfeed, I think it reduces fatigue by making things sound more natural acoustically. However, when people start talking about Smyth Realisers, room size algorithms, HRTF, etc., I have to admit that there is a part of me that becomes reactionary. (...), but I do not understand why it is being assumed that an ideal headphone experience is a mimic of speakers, and that in order to experience "reference" sound on headphones first we must artificially simulate the interaction of sound in space that was never really there. It is precisely this lack of space that gives headphones their own character, for both its positives and negatives. You get a presentation isolated from ambient/acoustic effects, which can be beneficial. It's an intimate and detailed presentation. (...)

I still maintain that I like some crossfeed, but I don't know how I feel about replicating speakers to the full extent with HRTFs, head trackers, binaural mics, etc. That is my personal preference though, and if others wish to pursue this technology I encourage them and will stay informed of their progress. I do not, however, think that should become the standard of headphone listening.

It is true that some prefer headphones over speakers.

I've never had a videogame and I don't use any regularly. But I find modeling the human auditory perception important for other uses that are not related to the entertainment industry (I could elaborate on this, but it is off topic).

There is a difference between convolving your own personal binaural room impulse response (PRIR, currently the state of the art) and convolving your own high resolution HRTF (the acquisition is far from trivial and is not mainstream).

If you distribute ambisonics you just need to downmix to stereo instead of downmixing to binaural with a generic HRTF. You still get what you prefer.

If you choose in the future the convolution of a high resolution HRTF you get what you desire plus elevation. You will maybe find elevation attractive.

I guess it is also true that the majority in the future will choose binaural through two loudspeakers or a beamforming phased array of transducers, because consumers find multiple surround speakers, or wearing headphones, inconvenient.

P.s.: the Realiser lets you choose the reverberation window of any PRIR if you want critical monitoring. I believe it was designed with the pro audio industry in mind much more than the gaming portion of the entertainment industry.
 
Dec 10, 2017 at 1:38 PM Post #426 of 2,146
So let me get this straight. You crossfeed to correct the positional distortion but the spatial distortion, which you've been going on about for pages, you don't crossfeed in your own mixes, why is that? Why don't you apply crossfeed after you've added reverb and correct all that horrible, unacceptable spatial distortion?
This is "rendering" individual tracks which can have spatial distortion, because only the whole mix containing all the tracks must be free of spatial disortion. I try to accomplish omnistereophonic sound which works for speakers and headphones without crossfeed. If I crossfeed all tracks individually to have no spatial distortion, the whole mix is going to be probably too monophonic. It must be kept as wide as possible for speakers. Reverberation after crossfeed (means a plugin written by me to use ILD/ITD processing) keeps width for individual tracks and also sounds good. The final mix can always be crossfed a little if some spatial distortion remains. I have written plugins to reduce ILD below 315 Hz and to increase it above 1600 Hz so it's easy to optimaze ILD.
 
Dec 10, 2017 at 1:49 PM Post #427 of 2,146
I thought it might be a nice idea to see who likes or dislikes crossfeed.

Please vote and share your opinion either way. I wanna hear what people here have to say about it one way or another.
I vote for crossfeed. My experience, from best to mildest: #1 Holographic Audio Ear One with Ohman X-FEED, Beyerdynamic Headzone with head tracker, McIntosh Headphone Crossfeed Director HXD, HeadRoom Home amp. The HD 800 needs crossfeed to sound natural.
 
Dec 10, 2017 at 2:26 PM Post #428 of 2,146
I strongly believe that digital signal processing is the best way to improve sound quality today. We've gotten to the point where purity theories are obsolete. We can go to Walmart and buy a player for under $50 that sounds perfect to human ears. More money and better specs don't improve sound any more. Perfect reproduction is perfect. That means that if we want to improve sound, the best way to do that is to be able to sculpt it. Cross feed is a very basic way of doing that. I think in the future, there will be much more sophisticated ways to solve the problems cross feed is intended to help correct.
 
Dec 10, 2017 at 3:27 PM Post #429 of 2,146
It is true that some prefer headphones over speakers.

Yes, it is. I'm not necessarily one of those people (I value bass and unencumbered movement too much) but I can see how it could be for some.

I guess it is also true that the majority in the future will choose binaural through two loudspeakers or a beamforming phased array of transducers, because consumers find multiple surround speakers, or wearing headphones, inconvenient.

P.s.: the Realiser lets you choose the reverberation window of any PRIR if you want critical monitoring. I believe it was designed with the pro audio industry in mind much more than the gaming portion of the entertainment industry.

I wish consumers would suck it up a little, and buy big, ugly speakers. More spending power in that market would be beneficial for all. The consumer resistance toward big speakers led to the creation of satellite systems and sound bars, and the proliferation of lifestyle systems by companies like Bose. I have seen virtualization technology used to fairly good effect on TV-embedded sound bars, but nothing even close to a decent set of speakers. Headphones are growing in popularity, and I see more opportunity to expand the hifi market there than anywhere else.

The reduction of PRIR for critical monitoring reminds me of the Dolby Headphone DH-1 setting for "reference room". There is a tacit admission that a critical monitoring environment has the least amount of acoustic response possible. It is a treated room. The question is how treated? How large? And deciding these factors will be a somewhat arbitrary process. Are you going for Abbey Road or Gateway? This also brings up the fact that making a studio is an art in itself. All critical monitoring environments are not exact replicas of each other. So the question becomes "which environment do you choose to mimic and why?". And that's not a simple question to answer.

I'm also a bit apprehensive about using simulation algorithms in a critical environment. You are strictly relying upon Smyth's proprietary coding to simulate phase, and I'm not convinced their algorithm is a direct replacement for reality. It reminds me of how auto companies design and test cars in simulated physics environments in CAD. Yes, those simulations are very accurate, and get better year after year, but they are not perfect reflections of real world Newtonian physics. At some point, prototypes need to hit the track or be crashed into walls to see how they react to a truly physical world. I feel the same way about sound waves in space.
 
Dec 10, 2017 at 3:35 PM Post #430 of 2,146
I strongly believe that digital signal processing is the best way to improve sound quality today. We've gotten to the point where purity theories are obsolete. We can go to Walmart and buy a player for under $50 that sounds perfect to human ears. More money and better specs don't improve sound any more. Perfect reproduction is perfect. That means that if we want to improve sound, the best way to do that is to be able to sculpt it. Cross feed is a very basic way of doing that. I think in the future, there will be much more sophisticated ways to solve the problems cross feed is intended to help correct.

I totally agree. Fidelity is so 1980s. We're way beyond that now. The question is how do we proceed? I think conversations like this are great. Real, pragmatic conversations about how to improve music. And not even spending that much money to do it! Terms like "DSP rolling" and "DSP chain" I hope will become a more typical feature of the language here, and we can encourage that greatly. It's also important that we realize that one DSP chain or setting is not objectively superior to another. This pragmatic conversation needs to take place openly, and we must value each other's tastes and perceptions as much as our own. There is no "correct way", there is just "a way". If we can do that, I see us helping a lot of people here to get better sound, and learning more to achieve better sound ourselves.
 
Dec 10, 2017 at 4:29 PM Post #432 of 2,146
The reduction of PRIR for critical monitoring reminds me of the Dolby Headphone DH-1 setting for "reference room". There is a tacit admission that a critical monitoring environment has the least amount of acoustic response possible. It is a treated room. The question is how treated? How large? And deciding these factors will be a somewhat arbitrary process. Are you going for Abbey Road or Gateway? This also brings up the fact that making a studio is an art in itself. All critical monitoring environments are not exact replicas of each other. So the question becomes "which environment do you choose to mimic and why?". And that's not a simple question to answer.

I agree completely and that's why I think future acquisition of HRTFs with biometrics has an edge over PRIRs. Anyway, Smyth Research Realiser PRIRs cannot be compared with Dolby Headphone DH-1 at all, because the former are literally that ("personal binaural room impulse responses") while the latter relied on a generic BRIR/HRIR. Chances are the lower adoption of Dolby Headphone DH-1 had more to do with the lack of personalization than with the choice of mastering rooms in which the generic BRIR/HRIR were acquired.

I'm also a bit apprehensive about using simulation algorithms in a critical environment. You are strictly relying upon Smyth's proprietary coding to simulate phase, and I'm not convinced their algorithm is a direct replacement for reality. It reminds me of how auto companies design and test cars in simulated physics environments in CAD. Yes, those simulations are very accurate, and get better year after year, but they are not perfect reflections of real world Newtonian physics. At some point, prototypes need to hit the track or be crashed into walls to see how they react to a truly physical world. I feel the same way about sound waves in space.

You've got me there, particularly having in mind the efficiency of the interpolation algorithm. You have to hear it for yourself. Nevertheless, here is what Professor Smyth says about your apprehension (a toy sketch of the interpolation idea follows after the excerpt):

SMYTH SVS
HEADPHONE SURROUND MONITORING FOR STUDIOS
PRIR look-angles
SVS simplifies the personalisation process by acquiring a sparse set of PRIR measurements for each active loudspeaker. Typically the system measures these responses for three different head positions, at approximately -30º, 0º and +30º azimuthal angle.
(...)
The three positions chosen allow simple rotational head-tracking to be accomplished by interpolation between the binaural data sets from each head position – for example, within the scope of the left and right speakers. This is a necessary but reasonable compromise. For critical listening the only viable monitoring position is looking straight ahead at the centre speaker, and thus is accurately virtualized using the SVS methodology. Head-tracking induced interpolation is only engaged when the user's head moves off centre.
(...)
PRIR data: are three positions enough?
The three positional PRIR data sets, typically used by the SVS system, allow restricted head movements around the central monitoring position, sufficient to maintain the authenticity of the virtualisation. Nevertheless, interpolating between PRIRs does introduce some degree of inaccuracy. However experimental evidence [7] has shown that interpolation between two individualised HRTFs with an azimuthal separation of up to 30º does not introduce perceptible errors. Where SVS is used to virtualise 5.1ch loudspeaker arrangements, the PRIR separation is typically 30º.
It should also be noted that any inaccuracy introduced is mitigated by two factors. First, the normal monitoring position is looking straight at the centre speaker, and here the interpolation distance is negligible. Therefore the PRIR data used for virtualisation during critical listening is almost identical to the measured data. Taking this a step further, the user can opt to temporarily disable the head-tracking, thereby completely removing inaccuracies introduced by PRIR interpolation.
(...)
[7] Martin, R. and McAnally, K. 2007. "Interpolation of Head-Related Transfer Functions", Australian Government, DSTO-RR-0323
http://www.smyth-research.com/articles_files/SVSAES.pdf
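As a toy sketch of the interpolation idea in the excerpt (my simplification: a straight linear crossfade between two measured binaural impulse responses weighted by the tracked head angle; Smyth's actual algorithm is proprietary, and a serious implementation would at least align interaural delays before blending):

```python
# Toy PRIR interpolation: crossfade two binaural impulse responses, measured at
# two look angles, according to the current head-tracker angle. Illustrative
# only; not Smyth's algorithm.
import numpy as np

def interpolate_prir(prir_a, prir_b, angle_a, angle_b, head_angle):
    """prir_a, prir_b: arrays of shape (n_samples, 2) measured at angle_a/angle_b."""
    w = np.clip((head_angle - angle_a) / (angle_b - angle_a), 0.0, 1.0)
    return (1.0 - w) * prir_a + w * prir_b

# e.g. PRIRs measured at 0 and +30 degrees, listener currently turned 10 degrees:
# ir_now = interpolate_prir(prir_0deg, prir_30deg, 0.0, 30.0, 10.0)
```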
 
Dec 10, 2017 at 4:35 PM Post #433 of 2,146
I'm betting that multichannel speaker systems will become more prevalent. New houses will most likely be built with media rooms that have built in sound systems and media servers that feed the whole house the same way that houses have electrical and plumbing. If you look at the layout of the typical house, it's changing. Separate dining rooms and living rooms are giving way to open floor plans that combine kitchen, dining and living room areas all into one. Home offices are also being designed into floor plans now. It's just a single step further to design those areas to incorporate networking and places designed specifically as a spot for the big screen TV along with multichannel audio and basic room treatment built right into the walls.
 
Dec 10, 2017 at 4:44 PM Post #434 of 2,146
Would be great to have a DSP Chain & DSP Rolling How-To thread

Ok, I think I'll start one up. I think I'd enjoy maintaining a thread like that, but I'll need help from different people on different platforms and players. I could update the original post with links to their posts or other threads. I will type up some kind of an intro to start with, and maybe a few basic links. It'd be nice to have a single repository of reviews, links, and help. Nobody can find all the DSPs out there on their own. Too many.

@catleofargh, is it possible to edit an original post indefinitely or do editing capabilities get locked out after a certain point?
 