24bit vs 16bit, the myth exploded!

castleofargh · Dec 9, 2014 at 3:20 PM

the walrus said:
rrod said:

Are you sure there's no on-the-fly down-conversion going on?

Actually I'm not sure. I listened to both with Media Go. Could it be that the software is down sampling it before playing? What player should I use for PC to listen to 24 bit flac?

Media Go has an ASIO output option for bit-perfect playback, but you need an ASIO driver, or at least something like ASIO4ALL, or a name like that (I'm a WASAPI fanboy for totally non-audio-related reasons, so I can't really tell).
Otherwise, a good start would be to check that your Windows sound options aren't set to 16/44.

But as RRod was saying, I doubt that adding or removing the last bits would change anything in the music. It's a pretty innocent process involving zero calculation. Maybe Media Go is the culprit; I never tried listening to music with it. I use foobar, others often recommend JRiver, and others seem to like MusicBee, though I would think that has more to do with the file management options than any special playback ability.

stv014 · Dec 10, 2014 at 2:52 PM

the walrus said:
How can 24 bit version sound worse? Could it be that they used different masters?

It is not impossible, sometimes if the 24-bit version is released later, it can be louder and more compressed and distorted than it was on an older CD. If this is the case, the tracks may be visibly different in an audio editor.

Greenears · Dec 12, 2014 at 4:30 AM

Alright everyone .... drumroll ... I've finally done some ABX testing. It was relatively easy, and you can try the same thing yourselves.

I downloaded HDtracks 2014 sampler (free), and played track 2, Vivaldi's Spring on headphones.

I used SoX to convert it to straight 16 (no dither), I did not convert the sample rate so it was 96/24 vs 96/16.

Ran the foobar2000 ABX plugin for 30 trials and got 19/30, i.e. a 10% probability of guessing.

I thought I was getting better and maybe with more trials I could get it lower, but decided to quit while I was ahead. I started well and then had a rough patch in the middle, but later on had a good streak. After 10-15 trials I honed the technique to just focus on a short section from 1:11-1:21 (use the blue slider to narrow). As the violin crescendos you listen for a certain "roughness" in B (16-bit). Also the strings sound more "beautiful" but maybe a bit flatter in A (24-bit). Flat sounding is characteristic of more even harmonics; some may say "pure" or "warm" or "subdued". There is a little distortion in both A and B at the crescendo peak, but it's just a little less grating in A. I also tried listening for "air" and "bass" earlier on but had more luck with the second, even though I was sure 24-bit had more "air" at the start.

After a while I just set the volume on medium (reasonably loud for that passage) and didn't change it. I didn't even bother re-listening to A and B, just jumped between X and Y a few times, and as soon as I heard "it" clearly I made the decision. Maybe 3 or 4 back-and-forths per decision. Every 5th pair or so I rechecked A and B to remind myself.

Note I am quite a skeptic that you can hear anything more than 16/44 (read my posts). This is my first ever ABX test. I should say that when you first hear the 24 bit you are quite impressed. I think it's a very good recording, and it reveals some technical brilliance of the players. It's quite startling in realism. However .... when you listen in 16 bit you realize it is not so much the bits as the recording and what feels like wider dynamic range than typical.

What did I prove? Hmmm. Not sure. 10% seems good but I was closer to 40% a lot of the time. I haven't used dither yet. I was quite impressed by my first 24 bit samples at first, but the more trials you fail, the more you realize how close they are and that many things you believe are "better" are actually identical. Still, listening to the whole thing while I type this, I could swear the 24 bit had a little more "emotional grab" or "immediacy" than the 16 bit. Or maybe I just want to hear that? hmmm.

stv014 · Dec 12, 2014 at 5:17 AM

greenears said:
I thought I was getting better and maybe with more trials I could get it lower, but decided to quit while I was ahead.

Note that doing many trials and looking for a run that happens to be below 5% (or whatever threshold is chosen) is a statistically biased method. X% chance of guessing means just that, and with enough trying it can eventually be achieved by luck. Do you still have the scores for all the trials you have ever done for that track ? Over a large number of trials with no selective inclusion of results, real audibility should produce a combined score that converges towards zero chance of guessing (for example, 69/100 is already less than 0.01% chance).
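For anyone who wants to check these figures, here is a minimal Python sketch of the one-sided binomial calculation. It assumes the ABX plugin's "probability of guessing" is the chance of scoring at least this well with a fair coin; the function name `abx_p_value` is made up for illustration and is not part of foobar2000.

```python
from math import comb


def abx_p_value(correct: int, trials: int) -> float:
    """Chance of getting at least `correct` answers right in `trials` by pure guessing."""
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials


# The two scores mentioned in this thread.
for correct, trials in [(19, 30), (69, 100)]:
    print(f"{correct}/{trials}: {abx_p_value(correct, trials):.4%}")
```

19/30 comes out at roughly 10%, and 69/100 lands just under 0.01%.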

castleofargh · Dec 12, 2014 at 5:54 AM

Greenears, it's OK to show the results when you're doing it as a way to try and learn how to succeed, so you know when you successfully identified a difference and can practice finding it again. I did that a lot to learn to identify MP3s.
But when you're doing the real ABX, you should mask the results, because that's a bias like any other and you shouldn't have it in the test.

Often when I look for something and leave the answer visible, I will see that I randomly got the first 3 or 4 right (not that hard statistically), and that in itself will bias me into thinking I heard something. Then I start making up cues in my head that sometimes were never real. It's just my playful brain that decides to give me what I ask for, without any regard to reality and what I'm really hearing.

greenears said:
.... when you listen in 16 bit you realize it is not so much the bits as the recording and what feels like wider dynamic range than typical.
....

Changing the bit depth will not change anything in the dynamics of the recorded music. There is very little chance that the 24-bit recording actually uses more than 70 dB of dynamic range.
And anyway, each sample will be at the exact same loudness in both encodings. You must imagine it as chopping off the lowest sounds of the recording, removing sounds from -96 dB to -144 dB, not as changing the dynamics. From 0 to -96 dB the sound is really 100% identical. So what you describe as wider dynamic range is 100% bias. In your head you went from "changing the maximum possible dynamic range of the medium" to "changing the dynamics of the music". ^_^
I guess it's the most common misconception one can have in audio, so there really is no shame in it; we all went down that road. Actually, some dudes with 30 years of very active audiophile life are still going at it with that very misunderstanding. It's just like how a bigger max power rating on an amp never meant the amp would actually deliver more into a given headphone.
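The -96 dB and -144 dB figures are just the ratio between full scale and one quantization step. A quick sketch, assuming plain linear PCM and ignoring dither and noise shaping (the function name is made up for illustration):

```python
import math


def max_dynamic_range_db(bits: int) -> float:
    """Ratio of full scale to one quantization step for linear PCM, in dB."""
    return 20 * math.log10(2 ** bits)


for bits in (16, 24):
    print(f"{bits}-bit: {max_dynamic_range_db(bits):.1f} dB")
```

That works out to about 6 dB per bit, so the extra 8 bits only extend the floor of the medium downward by roughly 48 dB.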

Greenears · Dec 12, 2014 at 10:33 AM

There is nothing wrong with looking at the results as you go. If you can discern the difference, then it is audible. If there really, truly is no difference, then no help you get is going to matter. I didn't practice either, just went straight in.

I don't know if foobar automatically logs the results, but for those that want the exact sequence I'd be happy to post it if you tell me where to look. I'll be doing some more tests anyway tomorrow. Generally I remember I started with 2/3 and then my % chance of guessing hovered in the 30-50% range. I don't think I went over 50 much, if at all. Then it dropped to 10%. Yes, 19/30 is hardly convincing; I think I'm pretty clear about that in my post. I can do it again tomorrow when I'm well rested and have time. Homing in on one passage definitely helps, and as I used the blue bars my score got better.

Regarding dynamic range, to be clear, my comment was with respect to the recording, not the bit depth. What I am saying is I listened to several tracks and chose the Vivaldi because I thought it had a wider dynamic range and more detail in loud and soft passages than most recordings. It was also recorded with a fair bit of echo and a "live" room/soundstage feel. I was hoping those soft echoes and slight distortions would give me something to latch onto.

16 and 24 bit has nothing to do with the recorded dynamic range.

For those that claim this track does not have the range, please measure it.

castleofargh · Dec 12, 2014 at 11:31 AM

ok I misunderstood your sentence about dynamic.

About looking at the results while doing the test: the only condition where it isn't a bias is if you determine how many trials you will do before starting and stand by it. I find that not knowing makes it more honest between me and myself (also I can just stop when I'm bored ^_^).

About the track having huge dynamic range: as long as it has less than 96 dB (and I would be very surprised if it didn't; the maximum I found in my library was somewhere around 65-70 dB), then 16-bit is enough by definition.

RRod · Dec 12, 2014 at 1:16 PM

Can't download it because HDtracks requires a downloader that doesn't work in either Linux or Wine. But I've heard enough Vivaldi to know that it doesn't have the dynamic range of Mahler, and Mahler works on CD just fine. The proper way to do the test is to pick a set number of trials, and do all the trials without looking at results. Also, here's what you should be doing in SoX, just in case:
sox -V4 24bit.flac -b 16 24to16bit.flac dither -s
sox -V4 24to16bit.flac -b 24 16to24.flac

Then you compare 24bit.flac to 16to24.flac, with your sound card / dac set to 24/96.

bigshot · Dec 12, 2014 at 1:31 PM

If you look at your results as you go, you aren't doing a blind test.

I don't think that there is a piece of recorded music with a dynamic range that gets close to needing 24 bit, so it really doesn't matter what music you use.

cjl · Dec 12, 2014 at 1:37 PM

bigshot said:
If you look at your results as you go, you aren't doing a blind test.

That isn't quite true. It's still a blind test because while you're trying to decide whether X is A or B, you do not have any knowledge (or way of knowing) which one it is, aside from the sound quality. That having been said, it is true that looking at the results while you're testing and then quitting "while you're ahead" does bias the test statistically. The more valid and rigorous method would be to decide the number of trials in advance, and do that number of trials regardless of intermediate outcomes.
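To put a rough number on that statistical bias, here is a small simulation sketch in Python. Every parameter is an assumption chosen for illustration (5000 simulated listeners who are purely guessing, up to 30 trials, peeking after every trial from the 5th onward, a 5% threshold); none of it comes from the ABX plugin itself.

```python
import random
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def guess_p_value(correct, trials):
    """Chance of at least `correct` right answers in `trials` by pure guessing."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials


def simulate_guessers(sessions=5000, max_trials=30, first_peek=5, alpha=0.05, seed=1):
    """Return (pass rate with a fixed trial count, pass rate when quitting while ahead)."""
    rng = random.Random(seed)
    fixed_passes = 0
    peeking_passes = 0
    for _ in range(sessions):
        correct = 0
        was_ever_ahead = False
        for trial in range(1, max_trials + 1):
            correct += rng.random() < 0.5  # a coin flip stands in for the listener
            # A peeking listener could stop here and report this score.
            if trial >= first_peek and guess_p_value(correct, trial) <= alpha:
                was_ever_ahead = True
        peeking_passes += was_ever_ahead
        # A disciplined listener only looks at the score after the last trial.
        fixed_passes += guess_p_value(correct, max_trials) <= alpha
    return fixed_passes / sessions, peeking_passes / sessions


fixed_rate, peeking_rate = simulate_guessers()
print(f"fixed 30 trials: {fixed_rate:.1%} of guessers 'pass'")
print(f"quit while ahead: {peeking_rate:.1%} of guessers 'pass'")
```

Both rates are computed on the same coin-flip sequences, so the quit-while-ahead rate can never be lower than the fixed one; the interesting part is how much higher it gets.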

bigshot · Dec 12, 2014 at 1:44 PM

If you consider them a bunch of different tests it may be considered blind. But if you are considering it all part of the same test, it isn't blind.

Greenears · Dec 12, 2014 at 8:12 PM

Thanks all for interesting responses. I'd like to get back to the meat of the testing, so let me address them collectively:

Maybe I should back up first and let you know the purpose of me posting as I am going along is to (a) get tips for improvement (b) encourage others to reproduce my results. Maybe with your ears and/or equipment you can do better. I will address a few objections as a courtesy, but I don't want to get too sidetracked from (a) and (b). I would encourage more ideas on whether to listen to short clips or long, what to look for, better tracks to try (as long as normally accessible in US over regular network), etc etc etc.

1. Number of trials and looking at the results

So I hear your input, and I can see where you are coming from. Actually I never put any thought into this; it was the default setting and off I went. Plus it was late and I wanted success quicker rather than slower, if it was possible to discern. However, I have taken a couple of 2nd year courses in probability, enough to know the binomial distribution and its application. It seems the ABX plugin is doing a straight binomial distribution calculation on the success/trials ratio. It doesn't care what you are thinking or doing, your intent, or whether you are looking at the result or not. That's the beauty of ABX testing. The only thing that would be invalid is to throw out failing trials, but the tool won't let you do that .... you can start again from the beginning or continue, but that's it. Note that 3/4 yields a very different % than 30/40 - that is baked into the binomial. The % is valid no matter what you do or when you quit. I promise you. However, note that 10% means exactly that - a 1 in 10 chance I was flipping coins and using that to decide.

To humor everyone if I get what I consider solid results (5% or better) I'll redo it both ways and post the logs. See how nice I am? But I did want to set the record straight for our dear readers.

2. Dynamic range, and whether this Vivaldi is a good track to get a positive result

Firstly, people often confuse available dynamic range (e.g. 96 dB for Redbook CD) with the actual dynamic range of a section of music (ratio of loudest to softest part). I was trying to select a piece of music with a high value for the second. I didn't measure it (someone provide me a SoX incantation and I'll gladly do it). Whether Spring can have a wide dynamic range depends on the music to an extent, yes I agree, but also on how it was recorded. With a sensitive mic really close to a violin, and if the player plays very softly and very loud, you could get a wide range. The rest is the mix and how much dynamic range compression is applied. My expectation is that at the most this is 50 dB, more likely less, compared to 15 dB for a lot of popular music, I'm told. I heard one engineer claim 60 dB on a big orchestra; that is the highest claim I've ever read. The reason I want high range is to look for softer passages, where the relative delta of each step of quantization is highest. This is where 24 bit might shine.

3. Difficulty downloading to reproduce

I'm really really sorry HDtracks doesn't support Linux. But if you find another track we can both download legitimately I'm game. I was suggested to use HD Tracks 24 bit Random Access Memories but I thought Vivaldi was a better start point. I can't get into Pono store yet, the European place only starts in January. Where else?

4. Dither and reconverting to 24 bit

I intentionally started without dither. I want the best chance for success first, then I'll add dither and see if it makes me fail.

>sox -V4 24to16bit.flac -b 24 16to24.flac -- interesting suggestion, but I don't think upconverting is legitimate, is it? The dither will be on the 8th LSB and not the 0th LSB when you get back to 24. The DAC natively handles 16 and 24 and converts to multi-segment sigma-delta anyway, so I think my method is legit. This incantation was suggested after some discussion.
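For reference, a toy Python sketch of what "no dither" versus "dither" means for a signal sitting below one 16-bit step. This is not SoX's code: it assumes plain TPDF dither (two uniform random values subtracted), not the noise-shaped variant that `dither -s` asks for, and the constant test level of 100 (in 24-bit units) is made up.

```python
import math
import random

LSB_16_IN_24 = 256  # one 16-bit step expressed in 24-bit units


def truncate(sample_24):
    """Drop the lowest 8 bits, as in a straight 24-to-16 conversion without dither."""
    return sample_24 >> 8


def tpdf_dither_quantize(sample_24, rng):
    """Add triangular noise spanning +/- one 16-bit step, then round to 16 bits."""
    dither = (rng.random() - rng.random()) * LSB_16_IN_24
    return math.floor((sample_24 + dither) / LSB_16_IN_24 + 0.5)


rng = random.Random(0)
level = 100  # hypothetical constant signal smaller than one 16-bit step
count = 100_000

truncated_mean = sum(truncate(level) for _ in range(count)) / count
dithered_mean = sum(tpdf_dither_quantize(level, rng) for _ in range(count)) / count

print("true level in 16-bit steps:", level / LSB_16_IN_24)
print("truncated average:", truncated_mean)
print("dithered average:", dithered_mean)
```

Truncation maps the sub-step level to exactly zero every time, while the dithered version keeps it on average, at the cost of added noise.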

One word about success: I am an admitted skeptic about 24 bit. My intellectual bias says that 16 bit redbook may be the be-all and end-all of music, all we need is better reproduction hardware. But, I'm not 100% sure of my position. So why do I want to succeed in telling them apart? The weakness I find in ABX testing is when someone does a whole series of comparison and they all come up negative. It is easily attacked, and hard to prove the negative (ie if you had just done X you would have passed). But if you pass one and gradually change parameters till you fail, it shows where the knee in the curve is for your ears and setup. IMHO.

sonitus mirus · Dec 12, 2014 at 8:40 PM

With respect to the number of trials and immediately seeing the results of each, the fallacy of the maturity of chances might create bias.

Roly1650 · Dec 12, 2014 at 10:31 PM

And of course you've checked that the HD Tracks file is genuine 24/96 and not upsampled 16/44 right? Being upsampled wouldn't be a first for that outfit.

RRod · Dec 12, 2014 at 10:51 PM

greenears said:
Thanks all for interesting responses. I'd like to get back to the meat of the testing, so let me address them collectively:

Maybe I should back up first and let you know the purpose of me posting as I am going along is to (a) get tips for improvement (b) encourage others to reproduce my results. Maybe with your ears and/or equipment you can do better. I will address a few objections as a courtesy, but I don't want to get too sidetracked from (a) and (b). I would encourage more ideas on whether to listen to short clips or long, what to look for, better tracks to try (as long as normally accessible in US over regular network), etc etc etc.

1. Number of trials and looking at the results

So I hear your input, and I can see where you are coming from. Actually I never put any thought into this; it was the default setting and off I went. Plus it was late and I wanted success quicker rather than slower, if it was possible to discern. However, I have taken a couple of 2nd year courses in probability, enough to know the binomial distribution and its application. It seems the ABX plugin is doing a straight binomial distribution calculation on the success/trials ratio. It doesn't care what you are thinking or doing, your intent, or whether you are looking at the result or not. That's the beauty of ABX testing. The only thing that would be invalid is to throw out failing trials, but the tool won't let you do that .... you can start again from the beginning or continue, but that's it. Note that 3/4 yields a very different % than 30/40 - that is baked into the binomial. The % is valid no matter what you do or when you quit. I promise you. However, note that 10% means exactly that - a 1 in 10 chance I was flipping coins and using that to decide.

To humor everyone if I get what I consider solid results (5% or better) I'll redo it both ways and post the logs. See how nice I am? But I did want to set the record straight for our dear readers.

RRod>> There are subtle differences in test statistics when using different stopping criteria. Getting 9/12 binomial trials right does not yield the same frequentist result as taking 12 trials to get 9 successes from a negative binomial. So it's valid for people to worry about things like stopping early and choosing best runs.
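A sketch of how a stopping rule can change the number for the same 9-right, 3-wrong record. The second rule below (keep going until the third wrong answer) is an assumption picked because it is the usual textbook illustration; nobody in this thread ran a test that way, and the function names are made up.

```python
from fractions import Fraction
from math import comb


def p_fixed_trials(correct, trials):
    """Rule: run exactly `trials` trials. Chance of at least `correct` right by guessing."""
    return Fraction(sum(comb(trials, k) for k in range(correct, trials + 1)), 2 ** trials)


def p_stop_at_misses(correct, misses):
    """Rule: keep going until the `misses`-th wrong answer.

    Chance of collecting at least `correct` right answers before that point by
    guessing, i.e. at most misses-1 wrong in the first correct+misses-1 trials.
    """
    n = correct + misses - 1
    return Fraction(sum(comb(n, j) for j in range(misses)), 2 ** n)


print("fixed 12 trials, 9 right:", float(p_fixed_trials(9, 12)))
print("stop at 3rd miss, 9 right:", float(p_stop_at_misses(9, 3)))
```

Under the fixed-trials rule the result is 299/4096 (about 7.3%); under the stop-at-third-miss rule it is 67/2048 (about 3.3%), even though the score sheet looks identical.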

2. Dynamic range, and whether this Vivaldi is a good track to get a positive result

Firstly, people often confuse available dynamic range (e.g. 96 dB for Redbook CD) with the actual dynamic range of a section of music (ratio of loudest to softest part). I was trying to select a piece of music with a high value for the second. I didn't measure it (someone provide me a SoX incantation and I'll gladly do it). Whether Spring can have a wide dynamic range depends on the music to an extent, yes I agree, but also on how it was recorded. With a sensitive mic really close to a violin, and if the player plays very softly and very loud, you could get a wide range. The rest is the mix and how much dynamic range compression is applied. My expectation is that at the most this is 50 dB, more likely less, compared to 15 dB for a lot of popular music, I'm told. I heard one engineer claim 60 dB on a big orchestra; that is the highest claim I've ever read. The reason I want high range is to look for softer passages, where the relative delta of each step of quantization is highest. This is where 24 bit might shine.

RRod>> The largest RMS range I've found in my CDs so far is about 65dB, and the quiet parts sound great.
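One possible way to define an "RMS range" like that, as a Python sketch: split the track into windows, take the RMS level of each, and subtract the quietest from the loudest. The window length, the skipping of fully silent windows, and the synthetic two-level sine used as input are all assumptions for illustration; this is not the measurement that produced the 65 dB figure.

```python
import math


def rms_db(samples):
    """RMS level of float samples (full scale = 1.0) in dB relative to full scale."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(mean_square) if mean_square > 0 else float("-inf")


def rms_range_db(samples, window):
    """Loudest minus quietest window RMS in dB, ignoring fully silent windows."""
    levels = [
        rms_db(samples[start:start + window])
        for start in range(0, len(samples) - window + 1, window)
    ]
    levels = [level for level in levels if level != float("-inf")]
    return max(levels) - min(levels)


# Synthetic stand-in for a track: one loud second followed by one quiet second.
rate = 8000
loud = [0.5 * math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]
quiet = [0.0005 * math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]

print(f"{rms_range_db(loud + quiet, window=800):.1f} dB")
```

The two sections differ by a factor of 1000 in amplitude, so this toy signal has a 60 dB range, still well inside what 16 bits can hold.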

3. Difficulty downloading to reproduce

I'm really really sorry HDtracks doesn't support Linux. But if you find another track we can both download legitimately I'm game. I was suggested to use HD Tracks 24 bit Random Access Memories but I thought Vivaldi was a better start point. I can't get into Pono store yet, the European place only starts in January. Where else?

RRod>> See here: http://www.linn.co.uk/christmas?day=12

4. Dither and reconverting to 24 bit

I intentionally started without dither. I want the best chance for success first, then I'll add dither and see if it makes me fail.

>sox -V4 24to16bit.flac -b 24 16to24.flac -- interesting suggestion, but I don't think upconverting is legitimate, is it? The dither will be on the 8th LSB and not the 0th LSB when you get back to 24. The DAC natively handles 16 and 24 and converts to multi-segment sigma-delta anyway, so I think my method is legit. This incantation was suggested after some discussion.

RRod>> Upconverting is totally legitimate; you're just padding with 0s to embed the 16bit content into 24, so the process is completely reversible with no content change. It is very possible that your DAC gives no subtle cues when bit-depth switching, but it's always best to remove variables.
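A small sketch of the padding claim in Python, treating samples as signed integers. The function names are made up, and real converters work on whole files rather than single values, but the arithmetic is the same.

```python
def pad_16_to_24(sample_16: int) -> int:
    """Embed a signed 16-bit sample in 24 bits by appending eight zero bits."""
    return sample_16 << 8


def strip_24_to_16(sample_24: int) -> int:
    """Drop the lowest eight bits again (arithmetic shift keeps the sign)."""
    return sample_24 >> 8


# Check every possible signed 16-bit value.
all_16_bit = range(-32768, 32768)
round_trip_ok = all(strip_24_to_16(pad_16_to_24(s)) == s for s in all_16_bit)
low_bits_zero = all(pad_16_to_24(s) & 0xFF == 0 for s in all_16_bit)

print("round trip lossless:", round_trip_ok)
print("padded low byte always zero:", low_bits_zero)
```

Since the padded file carries exactly the 16-bit content, any audible difference against the original 24-bit file comes from the bit-depth reduction itself and not from the DAC switching modes.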

One word about success: I am an admitted skeptic about 24 bit. My intellectual bias says that 16 bit redbook may be the be-all and end-all of music, all we need is better reproduction hardware. But, I'm not 100% sure of my position. So why do I want to succeed in telling them apart? The weakness I find in ABX testing is when someone does a whole series of comparison and they all come up negative. It is easily attacked, and hard to prove the negative (ie if you had just done X you would have passed). But if you pass one and gradually change parameters till you fail, it shows where the knee in the curve is for your ears and setup. IMHO.

RRod>> You are right in that it's a bit weird to do only 1-sided confidence intervals for the test, as it would seem someone who can consistently only get 10% right has something special going on (in fact, this would happen with someone who can get 90% correct but was told to choose the option that *didn't* match). Be that as it may, I haven't really seen such a result come up in the examples I've seen. It is worth mulling over the theory a bit, though.

Responses above.
