Head-Fi.org › Forums › Equipment Forums › Sound Science › Research that ABX/DBT testing reflects real world listening?

Research that ABX/DBT testing reflects real world listening? - Page 4

post #46 of 62
Thread Starter 
limpidglitch and stv014:

One way to test whether people prefer flat frequency responses outside of artificial testing environments would be to draw on what Tyll described about the Virtual Headphone Listening Test Methodology study. If someone could create profiles for popular headphones by building on Equalizer APO, we could do something similar to Foobar's ABX feature that could be used for DBT. The user could select from various profiles where the type of frequency response curve for the particular headphones was unnamed, and then people could run their own tests of extended duration. For each headphone model, the goal wouldn't be to emulate another headphone. Rather, there would be a flat profile, a v-shaped one, a bass-emphasis only, a treble-emphasis only, etc., and each headphone model would be EQ'd to the same set of profiles. The program could, at some point, record ratings of the profiles before revealing the results to the listener. The ratings could be contributed automatically to an online database, much like the way CPU PassMark software works. That would provide some very interesting data.
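A minimal sketch of how such a blinded rating session could work (the profile names, the rating callback, and the function name here are all hypothetical illustrations, not part of Equalizer APO or any real tool):

```python
import random

# Hypothetical sketch of the blinded EQ-profile rating test described above.
# Profile names and the rating mechanism are assumptions, not a real API.
PROFILES = ["flat", "v-shaped", "bass-emphasis", "treble-emphasis"]

def run_blind_session(get_rating, profiles=PROFILES, seed=None):
    """Present profiles under anonymous labels, collect a rating for each,
    and only reveal the label-to-profile mapping after all ratings are in."""
    rng = random.Random(seed)
    order = profiles[:]
    rng.shuffle(order)                      # listener only ever sees "Profile 1..N"
    labels = ["Profile {}".format(i + 1) for i in range(len(order))]
    ratings = {label: get_rating(label) for label in labels}
    # Reveal step: map anonymous labels back to the real profile names
    return {real: ratings[label] for label, real in zip(labels, order)}

# Usage: a scripted stand-in listener that rates every profile 7 out of 10
result = run_blind_session(lambda label: 7, seed=42)
```

The point of deferring the reveal until all ratings are recorded is the same as in Foobar's ABX feature: the listener can't, even unconsciously, adjust a rating to match an expectation about which profile they're hearing.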
Edited by cel4145 - 11/10/13 at 8:18am
post #47 of 62
Thread Starter 
Quote:
Originally Posted by Tyll Hertsens View Post

His text is VERY readable for the layman. I think you'd be fine.

Ok. You convinced me. I added it to my Amazon cart "Saved for later" reading list--which is quite big :)

(does anyone else use their Amazon cart that way?)
Quote:
Originally Posted by xnor View Post

Anyone even reading my posts? Thought I'd mentioned ABC/HR already on page 1.

Btw, the keyword in testing shouldn't be ABX but blind.

I'm trying to remember. I'll edit my main post and the thread title. Maybe that would help :)
post #48 of 62
Quote:
Originally Posted by cel4145 View Post

With the loudspeaker test, though, you could compare how people perceived the speakers in that room, then look at how the speakers measured in that space at the listening position, and make your judgments based on those measurements of frequency response instead of the anechoic results.

But this would tell you nothing except the preference of the people taking part in the tests, in that particular room. If we assume those people have perceptions and preferences similar to our own, and if we assume that similar rooms have a similar acoustic response, and if we assume those speakers will perform similarly in a similar room, then yes, a test like this could be useful. In practice though, most of these assumptions, particularly the assumption that rooms which appear similar have a similar acoustic response, are going to cause so much error in our results as to make them worthless. There are so many acoustic variables, and changing any of them by a small amount can have very significant acoustic results: the exact dimensions of the room, how true the walls, ceiling and floor are, what materials they are made of, how thick they are, what furniture is in the room and where it is placed, and what the reflection/absorption characteristics of that furniture are. Then there are the questions related to the speakers themselves: where the speakers are placed relative to the walls, floor and ceiling, what the dispersion characteristics and frequency response of the speakers are, and what our listening position is relative to the speakers. In many cases, changing any of these variables by an inch or two could have a significant impact on what we hear at the listening position. In other words, two similar-appearing rooms are likely in practice to have significantly different acoustic responses. And that's without taking into consideration the fact that different people's home listening environments don't even appear that similar in the first place!

 

Rather than having measurements which relate to a completely arbitrary acoustic response, it's much more useful to have a measurement of a speaker's own response and eliminate all the huge acoustic room variables from obscuring those measurements. Hence anechoic chamber measurements, which are a baseline response of the speaker without any individual room's acoustic colouration.

 

Quote:

Originally Posted by cel4145 View Post

Because I listen to them nearfield, I get a fairly neutral response out of them. When I listen to the Ascends for a while and turn to my Grado SR225i, at first the Grados sound too colored (and a little bass shy)--definitely overly bright. But after ten minutes or so, I adjust to the colored frequency response, and the particular emphasis they bring to certain types of music and their soundstage make them sound wonderful.

In an ABX/DBT setting with short programs switching back and forth, I would imagine I would pick the CBM-170s with the more neutral response, as is consistent with what some of the studies Harman is doing predict. Like the wine example, I would expect that the more neutral aspect of the CBM-170s would bias me toward them.

 

As is so often the case in audio: assumptions are made, those assumptions are connected and supported with other information (which may or may not itself be correct), and all of a sudden we have a set of conclusions and expectations which appear entirely logical but are in fact entirely false. And just to make the whole thing worse still, those conclusions and expectations can heavily influence the results of any tests we carry out. This accounts for probably 90%+ of all the arguments/disagreements which occur on Head-Fi and many other audio forums. In this particular case you start out with the assumption "because I listen to them nearfield, I get a fairly neutral response out of them", but what makes you think that you are getting a neutral response from your nearfields? Are you listening to them in an anechoic chamber?

 

The point about ABX tests, or indeed any valid testing, is that they eliminate assumption, expectation (and other biases) and show us what is actually happening. Once we go beyond simple ABX, creating a test which actually tests what we are trying to test, by successfully eliminating everything else, becomes increasingly hard to achieve, and therefore the results tend to become more unreliable.

 

G


Edited by gregorio - 11/13/13 at 3:57am
post #49 of 62
The main thing is, blind testing is a way of isolating things that can be known, scientifically, for the purpose of directing further research towards the productive improvement of audio reproduction. It's proving quite useful, if my purchase of more recent equipment is any indication. :)
post #50 of 62
Thread Starter 
Quote:
Originally Posted by gregorio View Post

But this would tell you nothing except the preference of the people taking part in the tests, in that particular room. If we assume those people have perceptions and preferences similar to our own, and if we assume that similar rooms have a similar acoustic response, and if we assume those speakers will perform similarly in a similar room, then yes, a test like this could be useful. In practice though, most of these assumptions, particularly the assumption that rooms which appear similar have a similar acoustic response, are going to cause so much error in our results as to make them worthless. There are so many acoustic variables, and changing any of them by a small amount can have very significant acoustic results: the exact dimensions of the room, how true the walls, ceiling and floor are, what materials they are made of, how thick they are, what furniture is in the room and where it is placed, and what the reflection/absorption characteristics of that furniture are. Then there are the questions related to the speakers themselves: where the speakers are placed relative to the walls, floor and ceiling, what the dispersion characteristics and frequency response of the speakers are, and what our listening position is relative to the speakers. In many cases, changing any of these variables by an inch or two could have a significant impact on what we hear at the listening position. In other words, two similar-appearing rooms are likely in practice to have significantly different acoustic responses. And that's without taking into consideration the fact that different people's home listening environments don't even appear that similar in the first place!

Rather than having measurements which relate to a completely arbitrary acoustic response, it's much more useful to have a measurement of a speaker's own response and eliminate all the huge acoustic room variables from obscuring those measurements. Hence anechoic chamber measurements, which are a baseline response of the speaker without any individual room's acoustic colouration.

Sorry for the confusion. I was talking about the idea that they are interested in seeing how people respond to frequency response, not in picking the better speakers. While speakers in a different room will sound different because of the room acoustics, one can still test to see which type of measured response people like best if the goal is to see whether people prefer flatter vs. more colored, although certainly room influence might make all of the speakers so non-neutral that any such test would be inconclusive for finding a correlation.
Quote:
Originally Posted by gregorio View Post

As is so often the case in audio: assumptions are made, those assumptions are connected and supported with other information (which may or may not itself be correct), and all of a sudden we have a set of conclusions and expectations which appear entirely logical but are in fact entirely false. And just to make the whole thing worse still, those conclusions and expectations can heavily influence the results of any tests we carry out. This accounts for probably 90%+ of all the arguments/disagreements which occur on Head-Fi and many other audio forums. In this particular case you start out with the assumption "because I listen to them nearfield, I get a fairly neutral response out of them", but what makes you think that you are getting a neutral response from your nearfields? Are you listening to them in an anechoic chamber?

The point about ABX tests, or indeed any valid testing, is that they eliminate assumption, expectation (and other biases) and show us what is actually happening. Once we go beyond simple ABX, creating a test which actually tests what we are trying to test, by successfully eliminating everything else, becomes increasingly hard to achieve, and therefore the results tend to become more unreliable.

True about my assumptions. Although nearfield listening reduces the impact of (what's the term?) second-order reflections because they are lower in SPL, as opposed to sitting in a living room far away from the speakers, where more types of reflections are likely to become more of a problem. So the odds are better that a speaker can sound more like its anechoic response nearfield.

Still, even most DBT testing itself is based on an assumption that short-term comparisons indicate how people would respond to long-term listening with a speaker. I would argue that the assumption that short-term comparisons are necessary because of the unreliability of auditory memory is based on a cultural bias among audio science and engineering types that this unreliability is something to be removed from the speaker evaluation situation. It's possible that psychological adjustment from listening to a speaker long term makes some of the finer distinctions about neutral frequency response moot when it comes to how pleasurable the listening experience is when comparing one speaker to another. But perhaps this has already been tested and ruled out (I'm not familiar with the literature).

So you can't escape assumptions and bias :)
post #51 of 62
Quote:
Originally Posted by cel4145 View Post

Sorry for the confusion. I was talking about the idea that they are interested in seeing how people respond to frequency response, not in picking the better speakers. While speakers in a different room will sound different because of the room acoustics, one can still test to see which type of measured response people like best if the goal is to see whether people prefer flatter vs. more colored, although certainly room influence might make all of the speakers so non-neutral that any such test would be inconclusive for finding a correlation.

The problem is that there are just so many variables. It's almost impossible to get a flat frequency response from speakers, as I've mentioned. Then of course there's the fact that frequency response is only one of the factors which affects perception. For example, we have the curious effect that 85dBSPL at the listening position in a large room, like a cinema, sounds almost half as loud as 85dBSPL at the listening position in a small room, like a sitting room. This appears to be due to the timing relationship of the initial early reflections to the direct sound, rather than to any frequency response issues. We've got problems with room acoustics affecting the decay time of some frequencies. In other words, a speaker can measure flat on a spectrum analyser but still have significant frequency imbalances, which can only be seen on a waterfall graph. We've also got different preferences as we get older, due to the ageing of the ears' physiology and changing perception of volume and relative balance. You've probably noticed that older people are far less tolerant of loud music. It's not because they are old, boring and out of touch, it's because their ears cannot physically handle high volumes and it causes discomfort/pain! And, putting aside all these acoustic and physiological issues, we've got the fact that there is no standard frequency response for any recording we are listening to. The perceived frequency response of each recording is created according to the personal tastes of the individual producer and mastering engineer, and to further complicate matters every genre has its own rough guidelines for frequency response; for example, EDM would be expected to have a great deal more perceived bass content than, say, acoustic jazz. So how do we decide what is a flat response from a recording? Maybe we can avoid this whole issue by avoiding different recordings and different biases towards different genres and just test with, say, pink noise.
But then as far as creating a test for preference is concerned we'd essentially be asking "do you like the reproduction of pink noise better on this speaker than on that one or with this or that colouration"? I'm not sure if that is going to tell us anything other than the fact that almost no one likes listening to pink noise!

 

Quote:
Originally Posted by cel4145 View Post

Although nearfield listening reduces the impact of (what's the term?) second-order reflections because they are lower in SPL, as opposed to sitting in a living room far away from the speakers, where more types of reflections are likely to become more of a problem. So the odds are better that a speaker can sound more like its anechoic response nearfield.

Well, that's essentially the theory. Sitting much closer to the direct sound source will make the indirect sound (reflections) appear much quieter and therefore reduce the impact of the room's acoustics. All well and good in theory, but as with so much in the audio world, we can't just take an isolated theory/fact and apply it regardless; there are virtually always conditions attached. We can't, for example, just say that cables don't make any perceivable difference; we have to attach certain conditions to this statement in order for it to be true, such as basic standards of cable construction and the right type/gauge of cable for the job. Same with nearfield speakers, but unfortunately the conditions required for nearfield monitoring to fulfil the theory are frequently not met. Many people put nearfield monitors on a desk, and that desk is often pushed against a wall. By the time your ears are a couple of feet or more away from the nearfields, much/most of what you are hearing is reflections from the desk and back wall. We are certain to get some significant phase cancellations from the interaction of these reflections, phase cancellations which we wouldn't get from ordinary (not nearfield) speakers placed on appropriate speaker stands a little way from the wall behind them, although of course you'd get other reflections and phase interactions affecting the frequency response. I've measured phase cancellations of over 30dB at certain frequencies at the listening position from nearfields placed on desks, and the professionals using them hadn't realised and were totally shocked at the measurement results!
If you were to place nearfields on appropriate stands, away from the wall, without a desk (or other highly reflective surface) between you and the speakers, and obviously sitting close to them (nearfield is generally considered to be around 3-4ft), then yes, you will usually get a much flatter response than sitting further away from mid-field speakers in an untreated room. It's still unlikely to be anywhere near as flat as its anechoic response, though, without some acoustic treatment.
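The desk-bounce cancellations described above follow from simple path-length arithmetic: a single reflection that arrives half a wavelength (or an odd multiple of it) late cancels the direct sound. A rough sketch of that arithmetic (the 0.3 m path difference is an illustrative assumption, not one of the measurements mentioned in the post):

```python
# Sketch of the comb-filter notch arithmetic behind desk reflections.
# The 0.3 m path difference below is an illustrative assumption.
C = 343.0  # approximate speed of sound in air at ~20 degrees C, in m/s

def notch_frequencies(path_difference_m, count=3):
    """Frequencies where a single reflection arrives half a wavelength
    late and cancels the direct sound: f_n = (2n - 1) * c / (2 * delta_d)."""
    return [(2 * n - 1) * C / (2 * path_difference_m)
            for n in range(1, count + 1)]

notches = notch_frequencies(0.3)  # reflected path assumed 0.3 m longer
# First notch lands around 570 Hz, with further notches at odd multiples
```

The depth of each notch depends on how strong the reflection is relative to the direct sound, which is why a hard desk surface right under the tweeters can produce the severe dips gregorio describes.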

 

G

post #52 of 62

There should be a setting called "flat enough for government work".

post #53 of 62
Quote:
Originally Posted by cel4145 View Post

Still, even most DBT testing itself is based on an assumption that short-term comparisons indicate how people would respond to long-term listening with a speaker. I would argue that the assumption that short-term comparisons are necessary because of the unreliability of auditory memory is based on a cultural bias among audio science and engineering types that this unreliability is something to be removed from the speaker evaluation situation. It's possible that psychological adjustment from listening to a speaker long term makes some of the finer distinctions about neutral frequency response moot when it comes to how pleasurable the listening experience is when comparing one speaker to another. But perhaps this has already been tested and ruled out (I'm not familiar with the literature).

DBT doesn't make any such assumption.

As has been pointed out before, ABX is primarily designed to figure out whether the person under test can hear a difference. Whether A or B sounds better is irrelevant to the test itself, but it is usually easy to figure out (thinking of lossy codecs here) once there is an audible difference to begin with.

I also think I already wrote that you can do a blind test over weeks if you want; there is no upper time limit.

 

For detecting statistically significant preference ABC/HR is more fitting.
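On the statistics side, the significance of a plain ABX run reduces to an exact binomial calculation under the "pure guessing" null hypothesis. A small sketch (the function name and the 14-of-16 example run are illustrative, not from any post above):

```python
from math import comb

# Sketch: exact one-sided binomial p-value for an ABX run, assuming the
# null hypothesis of guessing gives a 0.5 chance per trial.
def abx_p_value(correct, trials):
    """P(getting at least `correct` right out of `trials` by pure guessing)."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

p = abx_p_value(14, 16)  # hypothetical run: 14 of 16 trials correct
# p comes out near 0.002, well under the usual 0.05 threshold
```

ABC/HR tooling layers a hidden reference and quality ratings on top of this, but the underlying "could this result have been guessing?" arithmetic stays the same.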


Edited by xnor - 11/14/13 at 11:20am
post #54 of 62

Generally, ABX is most useful in determining if two sounds that are close to being indistinguishable are actually different. For real world application in putting together a home sound system to listen to music on, it isn't very useful. If the difference is so small you need to do an ABX test to find out if it even exists, odds are it really doesn't matter at all. For me, direct line level matched A/B comparison is much more useful. I'm not splitting atoms. I'm listening to music.
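The level-matching step mentioned above can be sketched as a simple RMS gain calculation (the function name and the example values are illustrative; in practice the matching would be done with a voltmeter or a DAW meter, not Python):

```python
import math

# Sketch of the level-matching step for an A/B comparison: the gain, in dB,
# that makes device B's output RMS match device A's. Example values are made up.
def match_gain_db(rms_a, rms_b):
    """Gain to apply to B so it plays at the same RMS level as A."""
    return 20 * math.log10(rms_a / rms_b)

gain = match_gain_db(1.0, 0.5)  # B measures half of A's level: boost by ~6 dB
```

Matching to within a fraction of a dB matters because even a small level difference reliably reads as "better" rather than "louder", which is exactly the bias a matched A/B comparison is meant to remove.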

post #55 of 62
Quote:
Originally Posted by bigshot View Post
 

I'm not splitting atoms.

I leave that to nuclear reactors as well.

 

But seriously, when you want to show someone who is convinced expensive component A sounds different (better) than cheap component B that it doesn't or that the difference is a lot smaller than asserted you have to have a bullet-proof test, else you will always get biased results.


Edited by xnor - 11/14/13 at 11:36am
post #56 of 62
Quote:
Originally Posted by xnor View Post
[...]

As has been pointed out before, ABX is primarily designed to figure out whether the person under test can hear a difference. Whether A or B sounds better is irrelevant to the test itself, but it is usually easy to figure out (thinking of lossy codecs here) once there is an audible difference to begin with.

[...]

I have to disagree on this point. One complaint (by pros who use ABX tests) is that the test is way too sensitive. A skilled listener can identify extremely small differences, way too small for them to say why or how they are different.

post #57 of 62
Quote:
Originally Posted by xnor View Post
 

But seriously, when you want to show someone who is convinced expensive component A sounds different (better) than cheap component B that it doesn't or that the difference is a lot smaller than asserted you have to have a bullet-proof test, else you will always get biased results.

 

From my experience around here, the only result you get is more quibbling that the test isn't bullet proof enough. Some folks just can't handle the truth! (to quote a movie)

post #58 of 62

Some can change their mind, that's what counts. ;) 

post #59 of 62

Diogenes! Holding up his lamp and looking for the last honest man!

post #60 of 62

Happy 31st post, bigshot!
