Recording Impulse Responses for Speaker Virtualization

Discussion in 'Sound Science' started by jaakkopasanen, Oct 9, 2018 at 10:53 AM.
  1. jaakkopasanen
    Speakers can be virtualized (simulated) with headphones very convincingly with impulse responses and convolution software. This however requires the impulse responses to be measured for the individual listener and headphones. I'm trying to achieve this.
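    For reference, the convolution step itself is simple once the four impulse responses (each speaker to each ear) are in hand. A minimal offline sketch in Python/NumPy; the function name, dictionary layout, and array shapes are all hypothetical:

```python
import numpy as np
from scipy.signal import fftconvolve

def virtualize(stereo, ir):
    """Render a stereo speaker feed binaurally through measured IRs.

    stereo: (n, 2) array of left/right speaker signals.
    ir: dict mapping (speaker, ear) -> 1-D impulse response, e.g.
        ir[('L', 'right')] is the left speaker as heard at the right ear.
    Returns an (n + ir_length - 1, 2) array for headphone playback.
    """
    n_out = stereo.shape[0] + len(ir[('L', 'left')]) - 1
    out = np.zeros((n_out, 2))
    # Each ear hears the sum of both speakers through its own IR.
    for spk, ch in (('L', 0), ('R', 1)):
        for ear, e in (('left', 0), ('right', 1)):
            out[:, e] += fftconvolve(stereo[:, ch], ir[(spk, ear)])
    return out
```

    A real-time convolver would do the same thing with partitioned/block convolution to keep latency low, but offline this is enough to audition a measurement.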

    I made impulse response recordings by playing a sine sweep on the left and right speaker separately and recording it with two ear-canal-blocking microphones. I turned these sweep recordings into impulse responses with the Voxengo Deconvolver software. I also measured my headphones the same way and compensated their frequency response with an EQ, by inverting the frequency response as heard by the same microphones. The impulse responses are quite good, and certainly better than any other out-of-the-box impulse response I have ever heard. However, they suffer from a little coarseness: the sound signature is a bit bright and sound localization is a tad fuzzy.
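    The sweep-and-deconvolve step can also be done directly in Python instead of a separate deconvolver tool. A sketch of the common exponential-sweep method (Farina's approach), with all parameter values purely illustrative:

```python
import numpy as np
from scipy.signal import fftconvolve

def exp_sweep(f1, f2, duration, fs):
    """Exponential (log) sine sweep plus its matching inverse filter."""
    t = np.arange(int(duration * fs)) / fs
    r = np.log(f2 / f1)
    sweep = np.sin(2 * np.pi * f1 * duration / r
                   * (np.exp(t * r / duration) - 1.0))
    # Inverse filter: time-reversed sweep with a decaying amplitude
    # envelope to compensate for the log sweep's pink energy balance.
    inv = sweep[::-1] * np.exp(-t * r / duration)
    return sweep, inv

def impulse_response(recording, inv):
    """Deconvolve a recorded sweep into an impulse response by
    convolving with the inverse filter; the linear IR's peak lands
    near index len(sweep) - 1 of the full convolution."""
    return fftconvolve(recording, inv)
```

    Convolving the recorded sweep (rather than the dry one) with `inv` yields the speaker/room impulse response; with a log sweep, harmonic distortion products separate out ahead of the main peak rather than contaminating it.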

    When I listen on headphones to a music recording that was captured with the mics in my ears while the music played on the speakers, the result is practically indistinguishable from actually listening to the speakers. My impulse responses and convolution come close but still leave me wanting better. I think the main problem might be the noise introduced by my motherboard's mic input.

    I thought about using a digital voice recorder like the Zoom H1n for the job. This model can do overdub recordings with zero delay between playback and recording, which even makes it possible to record each speaker separately. I'm also assuming that the mic input on this thing is quite a bit better than my PC's motherboard input.

    Does it seem like a sensible idea to use a voice recorder, and are there better options? Can you think of other sources of error besides the noise from the mic input? Should I do some digital noise filtering on the sine sweep recording before running the deconvolution? Any other ideas for improving the impulse responses?
  2. 71 dB
    What about phase? Phase/time delay between ears? How long a sweep do you use? Doubling the sweep duration theoretically increases the signal-to-noise ratio by 3 dB. Is the system linear enough? Increasing the level of the measured signal increases the signal-to-noise ratio too, but easily introduces more distortion (loudspeakers!). Broadband noise can be filtered out of the response using a sweeping band-pass filter that follows the frequency of the sweep. If this filtering is done in "filtfilt" style (first normally and then again on the reversed response), no additional phase shifts are introduced. The filter should be asymmetrical, steep on the high-frequency side, because you don't expect frequencies higher than f when measuring frequency f, but you do expect lower frequencies, because they are still decaying away.
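    The zero-phase "filtfilt" trick is available directly in SciPy. A minimal fixed-band illustration of the idea (not the full sweeping, asymmetric filter described above); the signal, cutoffs, and noise level are made up:

```python
import numpy as np
from scipy.signal import butter, filtfilt

# Hypothetical noisy impulse: a spike at sample 500 in broadband noise.
rng = np.random.default_rng(0)
x = 0.05 * rng.standard_normal(1000)
x[500] += 1.0

# filtfilt runs the filter forward and then backward, so the forward
# pass's phase shift is undone by the backward pass (zero net phase).
b, a = butter(4, [0.05, 0.45], btype='bandpass')
y = filtfilt(b, a, x)

peak = int(np.argmax(np.abs(y)))  # the peak stays put: no group delay
```

    A true sweeping version would track the sweep's instantaneous frequency with a band that is steep above and shallow below, as described.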

    Tell that to bigshot. :beyersmile:
  3. gregorio
    There are a number of issues and potential issues with what you're trying to achieve, most of which are related to the fact that the impulse response you're recording largely isn't an impulse response of your speakers alone. It's an impulse response of your room (acoustics) and of your recording chain at least as much as it is of your speakers, and additionally, it's a response to your particular "impulse". In no order of preference/importance:

    1. Your room acoustics: Human perception works on the basic principle of actually discarding/ignoring most of the sensory input we receive and combining what's left, along with prior experience/knowledge, to create a "perception". There's simply too much sensory data for our brain to process in any reasonable amount of time and evolution has come up with this method as the most practical way of making sense of the world around us. In effect, our perception is an educated guess/judgement of what's really occurring, rather than an accurate representation of it and music (and pretty much all art) absolutely depends on this difference between reality and perception. In other words, certain aspects of what our senses are telling us can be (and usually are) changed somewhat by our brain, in order to make sense of it all. Hence why optical and aural illusions exist and why two different witnesses to an event can truthfully describe that event significantly differently. In the particular scenario you're talking about we have a sensory conflict, the brain will typically change (the perception of) some of that sensory input to remove the conflict and make sense of it. Let's take the example of a recording of a symphony; when you play that recording back on your speakers, what you're hearing is the acoustic space of a large concert hall but your knowledge and eyes are telling you that you're in (for example) your sitting room, we have a sensory conflict. The music producer and engineers compensate for this as much as possible (as they too are creating the recording/mix/master in acoustic spaces which are significantly different to a concert hall) but nevertheless, there's still somewhat of a conflict which the brain will try to resolve. 
So even with a theoretically perfect recording, perfect speakers and perfect home acoustics, the reproduced recording is never going to sound the same as the original performance in the concert hall, although it might be close enough to fool some/many/most people. In addition, what you're attempting to achieve is a faithful reproduction of your speakers/room in a different room/environment, an additional sensory conflict. In other words, even if it were possible to achieve a perfect impulse response and convolution, when listening to your symphony recording on your headphones, you're effectively hearing a concert hall in your sitting room while your knowledge and eyes are telling your brain that you're actually in (say) a bus! How convincing is that going to be? Maybe almost completely convincing to you personally, but who knows. It might be interesting to see if it's more convincing listening on your convolved headphones when you're actually in your sitting room (or whatever room your speakers are in) than when listening in a significantly different environment. I assume it would be more convincing but whether that makes enough of a difference to you personally I obviously can't say.

    2. Your recording chain: Microphones, being transducers, are relatively inaccurate. Measurement mics are the most accurate as far as frequency response is concerned but unless you buy very expensive ones, even measurement mics are still relatively inaccurate. A more favoured solution these days is to buy cheaper measurement mics, have a "correction file" created by a calibration lab for each mic, and use software which allows you to apply them. However, this is not a perfect solution and additionally, measurement mics typically gain their frequency accuracy at the expense of a lot more self-noise, which is why measurement mics are never used in studios for recording music. Music mics have far less self-noise but are typically far more inaccurate; each brand/model of mic has its own "colouration", which is desirable for commercial music/audio recording but not when what you're specifically trying to record is the "colouration" itself, of a different transducer (your speakers)! There's also the issue of "off-axis" mic response. Then there's the rest of the chain, the mic pre-amps and noise introduced by, say, your computer/motherboard. The Zoom H1n should have little/no motherboard noise but it does have rather poor mic pre-amps. It's effectively cheap consumer-grade electronics, which is OK for a quick and dirty recording of an event but a long way from higher-end pro units. Of course, all of this is relative: if your recordings suffer from a great deal of motherboard noise then the H1n could be a considerable improvement.

    3. Your impulse: Does a sine sweep fully characterise your speakers? How do your speakers respond to sharp, loud transients rather than a continuous sine wave?

    The things I've mentioned above can each be fairly insignificant on their own or quite noticeable, depending on what equipment you've got and your personal perception. Additionally, even if they are relatively insignificant on their own, the cumulation of them might not be.

  4. Speedskater
    To expand on the above great reply:
    a sine-sweep test will only tell you about the steady-state response of the system, which is mostly room response. This type of test tells you little about the transient or impulse behaviour of the speakers (or their direct response).
  5. jaakkopasanen
    Not sure what you mean by those phase and time-delay-between-ears questions. They should be captured correctly by having the mics in the ears, no? A sweeping bandpass filter is probably just the thing I was looking for. I think the Smyth Realizer A16 does this, because it sounds like the sweeps from different channels overlap somewhat. Controlling the steepness of the bandpass filter (the bass side of it) would allow control of the reverberation time, if I haven't misunderstood things. It might be possible to have better room acoustics in the impulse response than in the real room. Thanks for the hint!

    1. I've noticed this myself. Listening to a PRIR with speakers far away is quite a weird experience when sitting close to a computer monitor. The brain doesn't really know how to reconcile the auditory cue for distant sounds with the visual cue for near sounds. It works a lot better when both match. This could work the other way around too, making the impulse response sound better than it actually is if it's recorded in the exact same spot where the listener sits when listening on headphones. I recorded the impulse response from my own speakers sitting in my regular spot, so it's quite easy for my brain to believe that what it's hearing is actually the real deal, because my brain has been conditioned for some time now to this environment having that sound.

    2. If I'm not wrong, the frequency response of the microphones doesn't really matter. I'm recording the frequency response of my headphones with the same mics in my ears, and whatever that result is, it also contains the frequency response of the mics. So when I compensate for the headphone frequency response with EQ, I'm actually compensating for the mics' response too. The mic preamps on my motherboard are probably beyond abhorrent. The Zoom H1n isn't known for its mic pre-amps but should be significantly better than the motherboard. Anything is better than the motherboard, really. If the Zoom H1n's mic pre-amps are not sufficient, I will try separate pre-amps and feed the signal into the recorder via the line input.
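    This cancellation argument checks out algebraically: the mic response appears in both the speaker measurement and the headphone measurement, so inverting one against the other removes it. A toy numerical check, with entirely made-up magnitude responses:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 64
# Hypothetical per-frequency magnitude responses (all strictly positive);
# the same mic response appears in both measurement chains.
mic = 0.5 + rng.random(n)      # unknown microphone response
speaker = 0.5 + rng.random(n)  # speaker + room response we want to keep
hp = 0.5 + rng.random(n)       # headphone response

measured_speaker = speaker * mic  # sweep recorded through the mic
measured_hp = hp * mic            # headphones measured with the same mic
eq = 1.0 / measured_hp            # EQ inverts the *measured* hp response

# What reaches the eardrum: headphones * EQ * convolved speaker IR.
result = hp * eq * measured_speaker
assert np.allclose(result, speaker)  # the mic response has cancelled out
```

    This only holds for the magnitude response, and only as long as the mics stay in exactly the same position for both measurements.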

    3. Maybe it's good to make clear that I'm not actually trying to imitate speakers perfectly. The goal is to have realistic audio reproduction with headphones.

    If I understand correctly, this would actually be a good thing. I don't really want the speakers' transient response in there messing with my music experience. I think one could get significantly better transient-response performance with headphones than with speakers (at least considering price), so this speaker/room virtualization could sound better than the recorded speakers.

    I'm also thinking that it might be possible to do better room correction for the virtual room being simulated than for the actual physical room. Room correction can only go so far, because not all acoustic phenomena are easy to handle with DSP alone, but since headphones don't have those problems (standing waves etc.), it could well be that the impulse response can be edited to have better room acoustics than would normally be possible.
    Last edited: Oct 13, 2018 at 12:31 PM
  6. 71 dB
    1. Yes. I'm not sure what I meant when asking it… :face_palm:
    2. Hopefully it helps…
    3. Yes. A logarithmic sweep creates a response with a "naturally" shortened reverberation time, whereas a linear sweep creates an "unnaturally" decaying, shortened reverberation (faster initial decay and then a slower tail).
  7. bigshot
    Can you simulate 5.1, 7.1 or Atmos?
  8. RRod
    I use my Roland R-05 all the time for this task; it works just fine. As 71 dB noted, you can both extend the sweep and turn up your speaker volume to get better SNR. After deconvolution you will see miniature IRs before the main IR that correspond to the orders of harmonic distortion; these can be windowed off to get at the linear part of the decomposition. As for errors, one of the big ones can be binaural mic placement, so it's good to do several sweeps and pick one that seems reasonable. You might check out the Aurora plugins, made by a researcher who is big into this kind of thing. After it's all said and done you won't get something that sounds perfect. For me, just having something clamping on my head seems to prevent a real sense of sitting in front of speakers, and head movement won't be accounted for. But satisfactory results don't take a huge amount of effort.
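    The windowing step described here can be a couple of lines: with a log sweep the distortion products land before the main (linear) impulse, so keeping only a window around the main peak discards them. A sketch, with the window lengths chosen arbitrarily:

```python
import numpy as np

def extract_linear_ir(full, ir_len, pre=32):
    """Keep only the linear part of a deconvolved sweep response.

    full: full deconvolution output, harmonic pre-responses included.
    ir_len: samples to keep after the main peak.
    pre: samples to keep before the peak (hypothetical default).
    """
    peak = int(np.argmax(np.abs(full)))
    start = max(peak - pre, 0)
    return full[start:peak + ir_len]
```

    Choosing `ir_len` also sets how much of the room's decay tail survives, so it doubles as a crude reverberation control.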
  9. castleofargh Contributor
    On the SNR idea, it can be useful to have some notion of which loudness levels the mic can handle without distorting much. I've ruined a bunch of recordings/measurements myself trying to get the very best SNR possible without thinking to check for distortion.
