New stereo->"binaural" demo!

Apr 21, 2008 at 1:26 AM Thread Starter Post #1 of 18

Tabi

Please take a minute to listen to the real time stereo->"binaural" setup I created using VSThost 1.45, free VST plugins, stereo impulse responses and lots of experimentation. This effect could be used with Foobar2000 0.9, as there is a Stereo Convolver plugin available for it.
I hope you'll be amazed.


YouTube - Virtual room demo (Stereo - use headphones!)
NEW: Robert Schumann - Symphony No.2 In C Major, Op.61: Scherzo (Allegro Vivace) (56 MB FLAC)

Some info:
The room stereo impulse responses I use in the setup are from the Dolby Headphone reference/live settings (they're actually the same, with "live" having more of the room mixed in). I discard the initial speaker response but keep the room response. Impulse responses taken from reverb units or made with free VST reverb plugins can also be used; they just need to be copied for both left and right channels and panned slightly off-center for each channel.
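To illustrate the "copy and pan slightly off-center" step, here is a minimal sketch in Python. It is not the actual VSThost chain: the function names, the constant-power pan law and the 0.2 pan offset are all assumptions chosen for illustration.

```python
import math

def pan_gains(pan):
    """Constant-power pan gains for pan in [-1.0 (hard left), +1.0 (hard right)]."""
    angle = (pan + 1.0) * math.pi / 4.0
    return math.cos(angle), math.sin(angle)

def stereo_irs_from_mono(mono_ir, offset=0.2):
    """Copy one mono reverb IR for each input channel and pan each copy slightly.

    Returns (left_ir, right_ir); each is a list of (L, R) sample pairs.
    The 0.2 offset is an assumed value, not one taken from the thread.
    """
    l_to_l, l_to_r = pan_gains(-offset)   # copy for the left input leans left
    r_to_l, r_to_r = pan_gains(+offset)   # copy for the right input leans right
    left_ir = [(s * l_to_l, s * l_to_r) for s in mono_ir]
    right_ir = [(s * r_to_l, s * r_to_r) for s in mono_ir]
    return left_ir, right_ir

left_ir, right_ir = stereo_irs_from_mono([1.0, 0.5, 0.25])
```

Each returned IR could then be saved as a stereo file and loaded into a stereo convolver, one per input channel.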

The setup uses crossfeed instead of HRTFs, because I do not like how they affect the sound quality. I compensated for the bass/low-mid-heaviness of the crossfeed with some EQ, so it sounds pretty good.
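For readers unfamiliar with crossfeed, the general idea can be sketched as below. This is a generic illustration, not the plugin chain used in the demo: the 0.3 ms delay, 700 Hz cutoff and -6 dB level are assumed values, and the function names are made up.

```python
import math

def one_pole_lowpass(samples, cutoff_hz, sample_rate):
    """Simple one-pole lowpass filter."""
    a = math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    out, y = [], 0.0
    for x in samples:
        y = (1.0 - a) * x + a * y
        out.append(y)
    return out

def crossfeed(left, right, sample_rate=44100,
              delay_s=0.0003, cutoff_hz=700.0, gain_db=-6.0):
    """Mix a delayed, lowpassed, attenuated copy of each channel into the other."""
    delay = int(round(delay_s * sample_rate))
    gain = 10.0 ** (gain_db / 20.0)

    def feed(src):
        filtered = one_pole_lowpass(src, cutoff_hz, sample_rate)
        delayed = [0.0] * delay + filtered      # shift later in time
        return [gain * s for s in delayed[:len(src)]]

    to_right = feed(left)    # what the right ear hears of the left channel
    to_left = feed(right)    # what the left ear hears of the right channel
    out_l = [l + c for l, c in zip(left, to_left)]
    out_r = [r + c for r, c in zip(right, to_right)]
    return out_l, out_r
```

Because the fed-across signal is lowpassed, low frequencies common to both channels add up, which is one plausible reason a crossfeed sounds bass-heavy and benefits from corrective EQ afterwards.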
 
Apr 21, 2008 at 1:58 AM Post #2 of 18
Hi,

I am not trying to pick on people's music tastes here, but in my opinion there is only one way to test the "quality" of a DSP chain that converts stereo to pseudo binaural (glad you put binaural in inverted commas!). And to me that means converting some high quality recordings of classical music - mainly orchestra.

The reason is that there are far more "absolutes" with a classical orchestra in terms of positioning within the 3D space - the precise positions of each player and section are known (so you would have to avoid recordings where the positions are changed from the accepted standard - easy to tell just by first listening on a high quality conventional speaker setup). On the other hand, popular electronic music is usually done with multitracking in recording studios and the spatial / stereo effects are often derived during the production process rather than - say - in an unamplified live concert where the musicians are part of a "proper", natural acoustic space which is picked up by minimalist micing.

Secondly, acoustic instruments such as violins and woodwinds are much tougher customers for accurate timbral reproduction than electronic based musical instruments. If you can run the sound of an orchestra violin section through a processing chain and still end up with that violin section sounding like it did before - but instead properly placed within the synthesised "binaural" acoustic, then you are 90% the way there.

With the demo you have posted, there is no real way to tell the overall quality of the process in terms of spatial and timbral accuracy. The timbres are not challenging enough to reproduce, as well as there being no natural 3D acoustic to begin with.
 
Apr 21, 2008 at 3:35 AM Post #3 of 18
Quote:

Originally Posted by ADD
Hi,

I am not trying to pick on people's music tastes here, but in my opinion there is only one way to test the "quality" of a DSP chain that converts stereo to pseudo binaural (glad you put binaural in inverted commas!). And to me that means converting some high quality recordings of classical music - mainly orchestra.



Ok.
I'd hoped you would've said something about the track that I did use, though.

Quote:

Originally Posted by ADD
On the other hand, popular electronic music is usually done with multitracking in recording studios and the spatial / stereo effects are often derived during the production process rather than - say - in an unamplified live concert where the musicians are part of a "proper", natural acoustic space which is picked up by minimalist micing.


My setup shines in recordings like these. An instrument dead-panned to the right, for example, will sound a lot more natural. If an instrument is already recorded in a natural acoustic space, there is no need to artificially place it in another. For this kind of music, I would use pure crossfeed.

Quote:

Originally Posted by ADD
Secondly, acoustic instruments such as violins and woodwinds are much tougher customers for accurate timbral reproduction than electronic based musical instruments. If you can run the sound of an orchestra violin section through a processing chain and still end up with that violin section sounding like it did before - but instead properly placed within the synthesised "binaural" acoustic, then you are 90% the way there.


I agree. Even with companies spending lots of money on research, everyone's ears are different and that makes it a challenge.
I, for one, wish that I could have my HRTF measured so I wouldn't have to approximate anything...

Quote:

Originally Posted by ADD
With the demo you have posted, there is no real way to tell the overall quality of the process in terms of spatial and timbral accuracy. The timbres are not challenging enough to reproduce, as well as there being no natural 3D acoustic to begin with.


I've uploaded the original and processed version of a classical recording (FLAC)
Robert Schumann - Symphony No.2 In C Major, Op.61: Scherzo (Allegro Vivace) (56 MB)
 
Apr 21, 2008 at 4:46 AM Post #4 of 18
Hi,

In the second example I much prefer the original to the processed one. In the processed version, the violin sound has lost that delicate edge and complex texture to it. They sound tubbier, flatter and the top end feels like it's missing (though the original recording is missing important high frequency information to begin with - but my point is that what is there is not preserved in the processed one). For example, the subtle nuances of the high speed spiccato bowing in the first violins are lost in murky cloudiness compared to the original.

Additionally, the sound stage is far too exaggerated in the processed one to make me feel like the musicians are spread out on a stage in front of me. For example, the cellos are too far off to the right and the first violins too far off to the left.

Finally, there seems to be a very artificial acoustic in the processed version. It sounds synthetic and un-natural - almost as if the orchestra has been placed into three separate, overly-reverberant halls - one way off to my left, one in the middle and one on the right. And the hall echo and reverberation itself sounds synthetic in the processed version but natural in the original.

The reason I did not comment on the first example was because - as I said before - it did not challenge your processing chain like this example has and had no absolute points of reference like the second example does - so any comments would have been misleading.

I agree about the potential usefulness of custom HRTFs. I have tried lots of them and none of them sound anything like the way I hear - in almost all cases as soon as I whack one in a processing chain, it's like the entire last octave of frequencies is lopped off. At best they still damage the timbre of acoustic instruments though.
 
Apr 21, 2008 at 6:40 AM Post #5 of 18
Quote:

Originally Posted by ADD
Hi,

In the second example I much prefer the original to the processed one. In the processed version, the violin sound has lost that delicate edge and complex texture to it. They sound tubbier, flatter and the top end feels like it's missing (though the original recording is missing important high frequency information to begin with - but my point is that what is there is not preserved in the processed one). For example, the subtle nuances of the high speed spiccato bowing in the first violins are lost in murky cloudiness compared to the original.

Additionally, the sound stage is far too exaggerated in the processed one to make me feel like the musicians are spread out on a stage in front of me. For example, the cellos are too far off to the right and the first violins too far off to the left.

Finally, there seems to be a very artificial acoustic in the processed version. It sounds synthetic and un-natural - almost as if the orchestra has been placed into three separate, overly-reverberant halls - one way off to my left, one in the middle and one on the right. And the hall echo and reverberation itself sounds synthetic in the processed version but natural in the original.



I would argue that the sound stage of headphones is naturally too limited, and through my setup, I am able to extend it. When listening to the unprocessed recording, I feel like the sounds are near my head. Soundwaves are arriving at an unnatural angle, 90 degrees, which leaves the center channel inside my head. Bass notes near dead-left/right cause something in my ears to vibrate making me instinctively know that something is very near to my ear. Crossfeed eliminates those problems, corrects the unnatural 90-degree angle to 30 degrees and the room gives the "air". Ears don't feel so fatigued anymore.
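As a rough illustration of the 90-degree versus 30-degree point, here is a sketch using Woodworth's spherical-head approximation for the interaural time difference, ITD = (a/c)(theta + sin theta). It is not part of the setup described in this thread; the head radius (0.0875 m) and speed of sound (343 m/s) are assumed typical values, and the function name is made up for illustration.

```python
import math

def woodworth_itd(azimuth_deg, head_radius_m=0.0875, speed_of_sound=343.0):
    """Approximate interaural time difference in seconds for a source
    at the given azimuth (0 = straight ahead, 90 = directly to one side)."""
    theta = math.radians(azimuth_deg)
    return (head_radius_m / speed_of_sound) * (theta + math.sin(theta))

for angle in (30, 90):
    print(angle, "degrees:", round(woodworth_itd(angle) * 1000, 3), "ms")
```

With these assumed values the delay comes out at roughly a quarter of a millisecond for 30 degrees and roughly two thirds of a millisecond for 90 degrees, which is the order of delay a crossfeed applies to the opposite-channel signal.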

My opinion is that there should be no limit to how much sound stage there should be in a headphone. We've heard that in real binaural recordings. I think the effect of performers being on a stage in front of the listener is a limitation of headphones, not a feature. I doubt classical recordings are made with headphones as priority, either. In this case, crossfeed would suit your tastes better, though you would still argue the sound gets muddy etc.

Of course there is an "artificial acoustic". The original reverb is placed inside another room reverb. That is, however, exactly what happens when classical recordings are listened to in a room using real speakers. Sound becomes more muddy and distant. The problem then is that one is not able to compare the speakers+room with headphones of the exact same sound signature, which you are doing when comparing the processed with the original recording. Or maybe the effect is more subtle in real life and I just like overblown room sound. I think a mild sound quality loss is acceptable if soundstage is unlocked. As a perfectionist or someone who fiddles with things for long periods of time, I do not accept heavy quality loss easily.

I think you seem to be missing the point. Rooms have flaws, and the rooms' reverb masks the details of the original recording. If I am emulating a room sound using a DSP chain, I can't expect it to be better than the original room in question, can I? My original aim was to make listening to music with headphones feel like you were in a flat-sounding room. That's what the Dolby people wanted to do, and did. I just combined an idea with theirs, and made it more realistic, avoiding frequency anomalies caused by the "average" HRTF.

Quote:

Originally Posted by ADD
The reason I did not comment on the first example was because - as I said before - it did not challenge your processing chain like this example has and had no absolute points of reference like the second example does - so any comments would have been misleading.


Then it must've been a challenge you expected me to tackle as a priority, which I don't. You seem to be in the pursuit of clinical audio quality, and dismiss any advantages by saying that the soundstage is too much for your perception of headphones. You expected something I couldn't give. My goal is natural sound, getting away from the clinical, unnatural soundstage of a headphone, to have air. What good would it do for both of us for you to criticize my setup heavily based on your criteria, if you expected something different than what I intended to give? So let's leave it at that.
 
Apr 21, 2008 at 7:18 AM Post #6 of 18
You have said your goal is natural sound but there is nothing at all natural about it. It sounds very "processed". As for the acoustic / room reverb issue, the same thing applies. Your point is correct that you are creating a virtual room, but the result does not sound anything like any room I have ever been in.

As for the loss of detail and clarity due to distance, that is not my experience. At a concert or with high quality speakers 12 feet away, absolutely none of the detail is lost at all. One should be able to hear a pin drop onto a wooden floor in a good acoustic from 2 feet away or 60 feet away. I can hear the bow across the strings of every violinist from 15 rows back at a concert or the same thing on a good recording played back over a decent speaker setup.

I don't at all understand why you would not be concerned about sound stage limits with headphones. An orchestra is not spread out across 500 feet in front of you like it is on your recording. Just because a binaural recording technique might arguably facilitate localisation of sounds 360 degrees around us does not mean we try to force a soundstage to be as wide as possible. An orchestra is positioned from my front left to my front right - not all around me - or hard left and hard right with a mess in the middle as in this recording.

The soundstage on your recording is not only completely implausible, but if you notice, there is no accurate localisation of any instruments between the second violins and the cello section - it's all just a vague tapestry. I should be able to point precisely to the oboe, flutes, horns and trumpets, but it is not possible on your recording and the impossible width of the stage is having a lot to do with that. I can do all of that much better on the unprocessed original.
 
Apr 21, 2008 at 10:38 AM Post #7 of 18
Quote:

Originally Posted by ADD
You have said your goal is natural sound but there is nothing at all natural about it. It sounds very "processed". As for the acoustic / room reverb issue, the same thing applies. Your point is correct that you are creating a virtual room, but the result does not sound anything like any room I have ever been in.
...



Ok, final chance.
If this sounds processed to you or doesn't impress you in any way, I don't know what else to do.
This is as real as I ever have got this setup to sound.
This time, only the left channel is used.
Casiopea - Sombrero (10 MB)
 
Apr 21, 2008 at 11:41 AM Post #8 of 18
Quote:

Originally Posted by Tabi
This is as real as I ever have got this setup to sound.


But therein lies the problem. Just because this is as good as you have been able to make it does not necessarily mean that it is objectively as good as the real thing. I have not listened to any of the tracks, so I am purely commenting on the semantics of that statement, not on the sound (none of my players can play FLAC and I don't want other software on my machine). Otherwise a very interesting exchange of views to read.
 
Apr 21, 2008 at 1:49 PM Post #9 of 18
To me, binaural is a lot more about the "out of head" experience. It makes you feel as if you are listening to something that's NOT coming from your headphones. It's simply about the experience.

I discovered binaural about 7 years ago, when someone in a high-end audio shop introduced me to the concept. He didn't tell me what I was going to be listening to when I first tried my Grado 125's.... As soon as the demo started, I took the headphones off, to see who was playing guitar in the room...

I then spent quite some time reading about the dummy head thing and bought a few CD's. I wish more recordings would be done this way.

Tabi, I would really enjoy listening to many many more "processed" files like the last one. I wish I could run some of my library through this process....

Great job, truly enjoyable!!!

Johnnydrz
 
Apr 22, 2008 at 8:54 AM Post #11 of 18
Quote:

Originally Posted by Tabi
Ok, final chance.
If this sounds processed to you or doesn't impress you in any way, I don't know what else to do.
This is as real as I ever have got this setup to sound.
This time, only the left channel is used.
Casiopea - Sombrero (10 MB)




Hi,

This is not so much a case of being impressed or not - I'm simply trying to make a point about imaging that many head-fi listeners have difficulty fully grasping and which is illustrated in your files. I notice that people who come from a background as working musicians or listening to high quality near field speaker setups have a much better understanding of what it means to attempt to recreate the full depth, width and cohesiveness of a 3D soundstage.

Getting the sound out of the head is one thing, but in my opinion it's not just about doing that - it's about trying to get as close as possible to what a real concert situation should sound like whilst wearing a pair of headphones. Obviously you can't get all the way there, but the closer you get there, the more convincing things will be.

I can speak from a lot of personal experience in this regard, because I have a musical background as a soloist, orchestra leader and concert goer. What I hear in those situations is an ability to precisely locate the musicians in the 3D space in front of me - and that is much, much more than the sound being nothing more than out of my head. It is - to put it more accurately - out of my head and the right distance away - both forward / back and left / right. Just getting things out of the head is actually less than half the battle.

If you go to a live concert these things are very easy to hear. Or you can listen to high quality stand mounts such as an LS3/5A in the middle of a room and get pretty much the same thing.

I'm not going to pretend I am some sort of expert in getting this to happen with headphones, because I honestly believe that after over 1000 hours of my own experiments, the technical issues are too great to overcome. Whilst certain trickery can compensate for lost timbres, frequency balance issues, directionality aberrations, etc, "binaural" algorithms and processing methods are still going to be at sea when it comes to creating a solid, convincing and cohesive soundstage - particularly with lower pitched instruments which have a significant omnidirectional sound component to begin with (eg double basses).

I have attached a file which might hopefully demonstrate a little better the points I am attempting to make about not only getting headphone sound out of the head, but trying to recreate all the specific localisation clues that one would get if one were at the live performance 12 - 15 rows back in the concert hall.

If one listens to the attached example using headphones but with absolutely no processing (no EQ, no crossfeed, no anything at all), then the orchestra should appear to be in an overall physical space commensurate with listening to the live performance from a best seat in the house so to speak.

The sound itself might not seem modern, but that is because the equipment used to make the original recording dates back to the very early 1950s. Nevertheless, the simplicity of the approach used in the original recording makes the potential destruction of (or serious damage to) a soundstage less likely.

This particular track is a very good example (opening of the slow movement of Beethoven Symphony No. 7), because each instrumental section starts up one after the other and the relative positions out in front of the listener - even with headphones - should be as easy to distinguish as if the headphones did not exist and the listener was at the actual performance instead. This all depends upon the person's hearing capabilities, resonance frequencies of their ear canal, quality of headphone and distance between ears, etc. In this example there is also no attempt to recreate a room effect, other than to preserve as much as possible (or damage as little as possible) the room acoustic of the original recording.

Below is a photo from the actual recording session showing the positioning of the players. The goal (in a perfect world) is to reproduce the exact relative positioning of those players to an imaginary listener who would be ideally positioned in the same plane as the conductor, but much further back:

recordingsessionhb5.jpg


From the second bar of the excerpt it is clear that the double basses are a lost cause; however, in real life the frequencies are often low enough that our ears cannot properly determine directionality. But a very good speaker setup will still cope better in this particular respect. Cellos, violas, second and first violins, woodwind and brass are OK and measure up fairly well to the photo. Even the staggered arrangement of second violins relative to first (one desk back to the left) can be heard on the headphones.

intro to slow movement - Beethoven Symphony No. 7 - TACET vinyl LP L149
 
Apr 22, 2008 at 11:25 AM Post #13 of 18
Hi Steve,

The recorder used was an ancient Telefunken M5 open reel, modified to run stereo (was originally mono!). Like the M49 microphones also used in the recording, it is early 1950s technology.
I always find that noise reduction takes something away - even back in my cassette days I used Dolby "B" "at worst" so to speak, but later went to open reel with no noise reduction at all.

The noise floor is probably a little higher than the analogue master tape owing to the vinyl LP medium I used in this transcription. TACET made a pure digital recording at the same session which made its way to a hybrid SACD. Needless to say the noise floor is not audible on the digital version.

Actually every LP recording I own bar this one and two other TACETS was recorded in the 50s or 60s, so it sounds very bizarre when I hear no hiss!
 
Apr 22, 2008 at 12:16 PM Post #14 of 18
Fascinating. The hiss doesn't bother me at all, just an observation. I love vinyl and a little noise adds to the ambiance.

If you have the full version available, digital or, preferably, vinyl, PM me!

Sorry Tabi, the processing does sound different but it seems like more of a gimmicky thing than an improvement. Nevertheless, A+ for effort! That's how things move forward. Just not in this case IMO.
 
Apr 22, 2008 at 1:17 PM Post #15 of 18
Hi Steve,

These TACET recordings are brilliant. It's just a pity that they don't put out as much vinyl as they do SACD - that is especially a little weird given the time and investment they put into using all tube vintage gear.

The Polish Chamber Orchestra is top notch, which is not surprising given the Poles have a bit of a history of being brilliant at anything they do. Would have preferred 5 desks of fiddles though to add the final bit of body for a bit less work, as they tend to be just a smidgeon thin and overpowered on the sautille passages.

I've got this recording on LP and the hybrid SACD, though I can't hear the SACD layer. I much prefer the LP to the CD - the CD sound is a bit lacking in textural resolution and a bit hard and glassy sounding, even though I think it all ran from the same mic setup and mixer (and all those extra mics you see in the photo were - I believe - only used for the surround version on the SACD layer). My only real criticism of the recording is that the mics and tape recorder don't have the most fantastic top end on the planet - they both really start to roll off very hard after 16 kHz - so the last little bits of subtle, delicate violin textures that I would hear at a live concert just aren't quite there. That said, the results with the M49 are very interesting, because back when RCA and Decca used them, in almost all cases the recordings tended to become a bit brittle sounding on strings when pushed to very high SPLs. I don't notice that problem at all here.

All in all these are incredibly interesting musical interpretations too - and probably more like what audiences would have heard back in the 19th century.
 
