Is there research showing that ABX/DBT testing reflects real-world listening?

Nov 9, 2013 at 4:19 PM Post #31 of 62
The thing most people don't understand is that flat response isn't a "flavor" of sound. Flat response can have huge bass or sparkling treble. The midrange can be punchy or recessed. "Flat" is not a sound. It's a CALIBRATION.
 
Good recording studios calibrate their monitors to a flat response. If you calibrate your speakers or headphones the same way, you'll hear the same sound the engineers heard in the studio. Whenever I hear someone talking about how flat response has weak bass or doesn't sound punchy enough, I know that they don't really understand what flat response is. They're just reacting subjectively to the word "flat".
 
Flat response is ACCURATE response. Nothing more, nothing less. Whether it's "warm" or "detailed" or "punchy" all depends on the decisions made by the engineers who mixed and mastered the music.
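
(To make the calibration arithmetic concrete, here's a minimal sketch in Python. It assumes you already have a smoothed per-band measurement from a calibrated mic; the band levels, clamp limits and everything else below are made-up illustrations, not anyone's actual data.)

```python
import numpy as np

# Hypothetical smoothed in-room measurement: deviation from reference
# level, in dB, per octave band (these numbers are made up).
band_hz = np.array([63, 125, 250, 500, 1000, 2000, 4000, 8000, 16000])
measured_db = np.array([6.0, 3.5, 1.0, 0.0, -0.5, 1.5, -2.0, -3.0, -4.5])

target_db = 0.0                       # "flat" = same level in every band
correction_db = target_db - measured_db

# Huge boosts/cuts cause more problems than they solve, so clamp the
# correction (a common rule of thumb, not a hard spec).
correction_db = np.clip(correction_db, -6.0, 6.0)

for f, g in zip(band_hz, correction_db):
    print(f"{f:>5} Hz: {g:+.1f} dB")
```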
 
Nov 9, 2013 at 4:44 PM Post #32 of 62
Harman engineers are the ones making the claims that people prefer a flat frequency response based on ABX testing. My point is that the ultimate goal of all this research is to find out what speaker attributes result in a pleasurable listening experience, and that's exactly what I'm questioning: whether ABX testing can work for that. That's what I was asking about in my first post.

 
Reading the linked article, I don't believe from their description that their test can be described as ABX. A blind test, certainly, but not an ABX: it was literally not comparing A with B. The test sought preferences rather than whether a difference could be detected, and it also included comfort determinations. As I mentioned earlier, the senses are all linked; how can we be sure that the more comfortable headphones were not, at least in part, judged as sounding better? Now that we're talking about preference rather than just identifying whether something is detectably different, we've got to contend with how the individual experience of the listeners affected the results: in what way were they experienced listeners? Also, how can we be sure that none of the listeners could identify, or at least make a good guess at, which model of headphones they were testing just from wearing them? If they were experienced listeners, there must be a fair chance they've tried some of those headphones in the past.
 
While it's possible to tear substantial holes in the methodology of this test, we've got to consider what is possible. For example, I can't see how in practice one could ABX the sound signature of two different headphones without actually putting them on, and therefore without eliminating any potential bias caused by the level of comfort. The authors of the article appear to have gone to decent lengths to eliminate some biases, and arguably they eliminated as many as were realistically feasible. For that reason the test is laudable and I would certainly take its conclusions seriously. However, this type of preference-based blind test does not carry nearly as much scientific weight for me personally as a straightforward "is there a detectable difference" ABX test.
 
The loudspeaker test is even more troubling. As Tyll mentioned, put even a hypothetically perfectly flat speaker in an average room and what you'll hear will be nowhere near flat. While in general there will usually be an overall boost in the low frequencies, the exact response is entirely dependent on the room dimensions, construction materials and furnishings. Forget the hundredths or thousandths of a dB difference in cables; the phase cancellation and summing in an average room are going to cause several peaks and dips which can differ by as much as 30dB, as well as a considerable number of smaller peaks and dips. In fact, high quality commercial recording studios with considerable acoustic treatment consider it quite an achievement to have only a few of these peaks and dips, and then only if they are limited to 6dB or so! Exactly where these peaks and dips occur in the audible frequency spectrum is completely variable from one room to another, so constructing a speaker which even sounds roughly the same from one untreated room to another is impossible, let alone one which produces a flat response in different rooms. I get the distinct impression that at least some of these tests are marketing disguised as serious scientific research, which isn't entirely unheard of in the audiophile world!
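
(To illustrate just how room-dependent those peaks and dips are, here's a minimal sketch that computes axial room-mode frequencies from the standard f = (c/2)·n/L formula; the room dimensions are hypothetical examples.)

```python
# Axial room modes: f = (c / 2) * n / L for each room dimension L.
# The dimensions below are hypothetical, not from the article.
C = 343.0  # speed of sound in air, m/s, at roughly 20 degrees C

def axial_modes(length_m, max_hz=300.0):
    """Axial mode frequencies below max_hz for one room dimension."""
    modes, n = [], 1
    while (f := C / 2 * n / length_m) <= max_hz:
        modes.append(round(f, 1))
        n += 1
    return modes

room_m = {"length": 5.0, "width": 3.8, "height": 2.4}  # metres
for name, dim in room_m.items():
    print(f"{name:>6} ({dim} m): {axial_modes(dim)} Hz")
# Every one of these frequencies is a candidate peak or dip, and a
# different room gives a completely different set.
```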
 
G
 
Nov 9, 2013 at 5:05 PM Post #33 of 62
Too many people extrapolate from the fact that it is difficult to achieve flat response with speakers in the average living room to the conclusion that it is IMPOSSIBLE. It certainly is possible. It just takes the proper application of room acoustics and equalization.
 
Nov 9, 2013 at 5:08 PM Post #34 of 62
I always think of bird-watching as a useful parallel. Those who carefully study the distinctive markings of species and sexes, from illustrations like Sibley's that highlight them, can actually "see" those distinctive markings better and more quickly than those with an untrained eye.

Not to add a subjectivist element to the conversation; just to note how difficult it is to isolate valid distinctions in a sensory test. Not impossible, but it requires careful science and precise measurement.
 
Nov 9, 2013 at 5:17 PM Post #35 of 62
I think that audiophilia often reflects a certain amount of hubris in being able to perceive things other people can't. I don't feel that way. That's why I call myself a hifi nut, not an audiophile. I'm more interested in having a system with good sound than I am in investing energy into propping up my absolutely average human hearing with outlandish claims of goldenness.
 
Nov 9, 2013 at 6:08 PM Post #36 of 62
Reading the linked article, I don't believe from their description that their test can be described as ABX. A blind test, certainly, but not an ABX: it was literally not comparing A with B. The test sought preferences rather than whether a difference could be detected, and it also included comfort determinations.


Yeah. My bad. :o It's double-blind testing. Is that the right term?


The loudspeaker test is even more troubling. As Tyll mentioned, put even a hypothetically perfectly flat speaker in an average room and what you'll hear will be nowhere near flat. While in general there will usually be an overall boost in the low frequencies, the exact response is entirely dependent on the room dimensions, construction materials and furnishings. Forget the hundredths or thousandths of a dB difference in cables; the phase cancellation and summing in an average room are going to cause several peaks and dips which can differ by as much as 30dB, as well as a considerable number of smaller peaks and dips. In fact, high quality commercial recording studios with considerable acoustic treatment consider it quite an achievement to have only a few of these peaks and dips, and then only if they are limited to 6dB or so! Exactly where these peaks and dips occur in the audible frequency spectrum is completely variable from one room to another, so constructing a speaker which even sounds roughly the same from one untreated room to another is impossible, let alone one which produces a flat response in different rooms. I get the distinct impression that at least some of these tests are marketing disguised as serious scientific research, which isn't entirely unheard of in the audiophile world!

G


For the loudspeaker test, though, you could compare how people perceived the speakers in that room, then look at how the speakers measured in that space at the listening position, and make your judgments based on those in-room frequency response measurements instead of the anechoic results. So I could see how that could work within the test itself.

But I don't disagree. There are a lot of factors. I think they need to work with much larger speaker sample sets. And (this could maybe work for headphones, too) use some sophisticated DSP filtering that can impose hundreds of PEQ filters to EQ sets of speakers so that all are flat within a very discriminating tolerance. Then do lots of double-blind testing between them, and with the EQ adjusted for different emphasis curves. That could also help to tell us more. My personal "guess" is that people prefer speakers with a smoother frequency response over ones with a rougher one (so +/-1dB variations in the curve are better than +/-3dB), but some might like varying types of emphasis where the response is not neutral, something that Tyll mentioned above.
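
(As a concrete illustration of one PEQ band, here's a minimal sketch using the well-known RBJ Audio EQ Cookbook peaking biquad; the sample rate, centre frequency, Q and gain are made-up examples, and a real correction system would fit and cascade many such bands from a measurement.)

```python
import numpy as np
from scipy.signal import lfilter

def peaking_biquad(fs, f0, gain_db, q):
    """RBJ cookbook peaking-EQ biquad; returns normalized (b, a)."""
    A = 10 ** (gain_db / 40.0)
    w0 = 2 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2 * q)
    b = np.array([1 + alpha * A, -2 * np.cos(w0), 1 - alpha * A])
    a = np.array([1 + alpha / A, -2 * np.cos(w0), 1 - alpha / A])
    return b / a[0], a / a[0]

fs = 48000
# Hypothetical correction: pull down a 6dB room peak at 120Hz, narrow Q.
b, a = peaking_biquad(fs, f0=120.0, gain_db=-6.0, q=4.0)

x = np.random.randn(fs)   # one second of noise as a stand-in signal
y = lfilter(b, a, x)      # one band applied; a full correction cascades many
```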
 
Nov 10, 2013 at 1:32 AM Post #37 of 62
Have we already left the question of whether ABX testing reflects real-world listening?
 
 
If we leave ABX for a while and instead consider the whole gamut of double-blind testing, wouldn't it be possible to perform tests that are deliberately made to simulate real-world listening?
All it would take would be to cover up the various components with cloth, leave the room, and let someone else decide which components should be hooked up.
It would even be possible to stage a test such that the test subject would be blissfully unaware that he's being blind-tested. I don't have the sources at my fingertips as others here might, but I seem to remember reports from such tests. Amateur or professional, I can't remember.
 
Nov 10, 2013 at 2:05 AM Post #38 of 62
Have we already left the question of whether ABX testing reflects real-world listening?


If we leave ABX for a while and instead consider the whole gamut of double-blind testing, wouldn't it be possible to perform tests that are deliberately made to simulate real-world listening?
All it would take would be to cover up the various components with cloth, leave the room, and let someone else decide which components should be hooked up.
It would even be possible to stage a test such that the test subject would be blissfully unaware that he's being blind-tested. I don't have the sources at my fingertips as others here might, but I seem to remember reports from such tests. Amateur or professional, I can't remember.


What led me to ask that question is the idea of palate cleansing. When you compare a mellow merlot with an equal quality bold cab sav that you might otherwise like the same, the cab sav will overwhelm the taste of the merlot if you don't cleanse your palate. This will bias you to prefer one over the other, because the cab tastes too strong or the merlot too weak.

I think the same thing can happen with speakers and headphones. I have Ascend Acoustics CBM-170 SEs crossed over with a good sub at 80Hz that I listen to in a nearfield setup. The Ascends measure very neutral in anechoic measurements down to around 100Hz:

[anechoic frequency response graph]

Because I listen to them nearfield, I get a fairly neutral response out of them. When I listen to the Ascends for a while and turn to my Grado SR225i, at first the Grados sound too colored (and a little bass shy), definitely overly bright. But after ten minutes or so, I adjust to the colored frequency response, and the particular emphasis they bring to certain types of music and their soundstage make them sound wonderful.

In an ABX/DBT setting with short programs switching back and forth, I would imagine I would pick the CBM-170s with the more neutral response, as is consistent with what some of the studies Harman is doing predict. Like the wine example, I would expect that the more neutral character of the CBM-170s would bias me toward them. But in my general home listening, where I take time to settle in with each and immerse myself in the music, I like both equally because they both offer something quite different and, indeed, have some characteristics that lend themselves more to one music genre than another. Whether I choose one over the other is less about the superiority of one setup over the other in terms of my personal assessment of audio quality and more dependent on my mood or what I would like to listen to. One day the aesthetic experience of one is better; another day, the other's.

So this is my "real world listening" situation where I can imagine that ABX/DBT testing might fail to produce the same result. I know I'm comparing headphones to speakers, but that makes me ask if possibly there aren't situations where two sets of speakers might feel the same way. Or two sets of headphones?
 
Nov 10, 2013 at 2:17 AM Post #39 of 62
If we leave ABX for a while and instead consider the whole gamut of double-blind testing, wouldn't it be possible to perform tests that are deliberately made to simulate real-world listening?
All it would take would be to cover up the various components with cloth, leave the room, and let someone else decide which components should be hooked up.
It would even be possible to stage a test such that the test subject would be blissfully unaware that he's being blind-tested. I don't have the sources at my fingertips as others here might, but I seem to remember reports from such tests. Amateur or professional, I can't remember.

 
The one objection regularly made is that a typical ABX test is too short to get a "feeling" for the differences, but nothing prevents someone from doing an ABX test over days, weeks, maybe even months in his home.
 
Nov 10, 2013 at 3:12 AM Post #40 of 62
So this is my "real world listening" situation where I can imagine that ABX/DBT testing might fail to produce the same result. I know I'm comparing headphones to speakers, but that makes me ask if possibly there aren't situations where two sets of speakers might feel the same way. Or two sets of headphones?

 
For the sake of simplicity, let's consider two sets of speakers, where one is your Ascends and the other approaches a more Grado-like sound character.
It's obvious that they sound different. You can perform an ABX test just to establish this with rigour, but we'll assume it's passed with high confidence.
With this question out of the way we can start thinking about testing for preference, but you wouldn't use an ABX test for this. Ideally you'd devise a test procedure as conducive to your purpose as possible, making the results as true and relevant as possible. What methods you employ are up to you, but a few prerequisites should be met: A controlled environment to minimize confounding variables (blinding would be a part of this), consistency between repeated trials, and thorough documentation for reproducibility. This last point is of course not crucial if you're just looking for a personal preference, but it would be wise to keep it in mind.
How to design a test is often one of the most difficult parts of research, and an easy place to misstep. Often a well-proven protocol is used, maybe with some modifications, but you are completely free to design as you wish. If you have a distinct way of listening, you'd accommodate your test to that distinction.
Often you'll have to make trade-offs. Especially if you are a big business like Harman, you can't devise a test to suit every customer. I can imagine they might have done some survey work beforehand to map people's listening habits, and then devised their tests to fit the greatest number of people possible.
As such, the answer to your original question is that a test can reflect real-world listening, but often, for practical reasons, it won't do so completely; hopefully it will do so sufficiently to be relevant.
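
(If you wanted to run such a preference test yourself, here's a minimal sketch of the bookkeeping: it generates a randomized presentation order so the listener never knows which setup is playing; the setup names and trial count are made up. True double-blinding would also require the switching to be automated or done by someone outside the room.)

```python
import random

# Hypothetical blind preference test: each trial plays both setups in a
# random order, and the listener only reports "first" or "second".
SETUPS = ("Ascend CBM-170 SE", "Grado-like speaker")  # made-up pairing
N_TRIALS = 20

def make_schedule(seed=42):
    """Operator's key: which setup plays first in each trial.
    The listener never sees this mapping; that's the blinding."""
    rng = random.Random(seed)
    key = []
    for trial in range(1, N_TRIALS + 1):
        order = list(SETUPS)
        rng.shuffle(order)
        key.append((trial, order[0], order[1]))
    return key

for trial, first, second in make_schedule():
    print(f"trial {trial:2d}: first = {first!r}, second = {second!r}")
```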
 
Nov 10, 2013 at 5:07 AM Post #41 of 62
So this is my "real world listening" situation where I can imagine that ABX/DBT testing might fail to produce the same result. I know I'm comparing headphones to speakers, but that makes me ask if possibly there aren't situations where two sets of speakers might feel the same way. Or two sets of headphones?

 
In your example, ABX would work perfectly for its intended purpose of determining whether there is an audible difference. ABX only gives a "yes" or "no" answer (or, more precisely, a p-value that translates to the statistical chances of "yes" and "no") to the question "is the difference audible?". To test the order of preference as well, you would want to use something like ABC/HR instead. Also, neither of these tests has any inherent limit on the duration of the listening (short durations are preferred for ABX simply because they are more effective for finding differences); you can perform one trial per day (with up to hours of listening) for weeks, and still get a statistically valid result.
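
(For what that p-value actually is: under the null hypothesis that you can't hear a difference, every ABX trial is a fair coin flip, so the p-value is just a one-sided binomial tail. A minimal sketch in plain Python, with a made-up 12-of-16 score:)

```python
from math import comb

def abx_p_value(correct, trials):
    """One-sided exact binomial p-value: the probability of getting at
    least this many right by pure guessing (p = 0.5 per trial)."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

# Hypothetical session: 12 correct answers out of 16 trials.
print(f"p = {abx_p_value(12, 16):.4f}")  # ~0.0384, below the usual 0.05
```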
 
Nov 10, 2013 at 7:09 AM Post #42 of 62
Anyone even reading my posts? Thought I'd mentioned ABC/HR already on page 1.
 
Btw, the keyword in testing shouldn't be ABX but blind.
 
Nov 10, 2013 at 9:13 AM Post #43 of 62
You can google that. It deals with the ability of people to hear subtle differences in samples separated by time. Human auditory memory lasts no more than a couple of seconds.

 
