An interesting take on DBT
Apr 18, 2009 at 9:06 PM Post #31 of 45
Quote:

Originally Posted by scompton
IMO, that you can hear a sound is no different from whether you can feel something, or hear a difference between two samples versus feel a difference. If you think the senses of hearing and sight are subjective and imagined, so is the sense of touch.


Of course - any sense mechanism is experienced subjectively.

Quote:

Originally Posted by scompton
The ABX tests I took were to determine whether I could hear a difference between different bit rates. I've yet to get a positive result from any test I've taken. Even if I had gotten a positive result in a past test, it would have been a false positive, since I've never heard a difference and have always guessed. I'm not measuring qualitative differences but quantitative differences. I'm not trying to determine which I like better, but whether I can hear a difference.


Right, but say you have a FLAC and an mp3 in a sighted test, and you imagine an increase in, say, bass due to placebo effect or expectation bias. My question is: in what way does this imagined increase in bass qualitatively differ from a real increase in bass? I don't accept the postulation that the two are, indeed, equivalent, but it does pose a philosophical problem that I find pretty challenging.

Quote:

Originally Posted by scompton
BTW, you're wrong about colors. There are objective measurements of colors based on wavelengths of light. An object has color based on reflection and absorption of different wavelengths of light.


An object doesn't have color based on reflection and absorption of different wavelengths of light - my dog will never know what color shirt I'm wearing because dogs can't see in color. The frequency absorption is still there, but the color isn't, because the color is created in our brains as part of our perceptual faculties.

The same concept applies to color blind people - to someone with protanopia, there's no such thing as a red object. The concept simply does not apply to them. You could argue that the object is, in "reality," red, but that doesn't get you very far, because I could just as easily argue that the "real" color of the object isn't in fact red but some other color that only bees (being able to see ultraviolet wavelengths) can see, and we're "UV colorblind."

Quote:

Originally Posted by scompton
Perception of color is also measurable and has to do with the interaction of different wavelengths with cone cells in your retina. So, perception of a color is quantitative; whether or not you like a color is qualitative.


Color - Wikipedia, the free encyclopedia

"Because perception of color stems from the varying sensitivity of different types of cone cells in the retina to different parts of the spectrum, colors may be defined and quantified by the degree to which they stimulate these cells. These physical or physiological quantifications of color, however, do not fully explain the psychophysical perception of color appearance."
 
Apr 19, 2009 at 3:34 PM Post #32 of 45
Quote:

Originally Posted by scompton
IMO, that you can hear a sound is no different from whether you can feel something, or hear a difference between two samples versus feel a difference.


Essentially what you're asserting here is a 1:1 relationship between stimuli and your conscious awareness of those stimuli. That viewpoint has been thoroughly refuted over decades of research, which has shown, in summary, that many forms of stimuli affect humans without their being aware of it.

Quote:

Originally Posted by scompton
BTW, you're wrong about colors. There are objective measurements of colors based on wavelengths of light. An object has color based on reflection and absorption of different wavelengths of light. Perception of color is also measurable and has to do with the interaction of different wavelengths with cone cells in your retina. So, perception of a color is quantitative; whether or not you like a color is qualitative.


Objective measurements of "color" are never made - only objective measurements of light, electrochemical reactions, or other neural correlates of optical or visual processing that are physically correlated with the observation of colour.

This is made perfectly apparent to anyone with any imagination who can close their eyes and still perceive colours even though no light is hitting their eyes.
 
Apr 19, 2009 at 4:31 PM Post #33 of 45
Quote:

Originally Posted by Hirsch
A positive control in pharmacology might be an already approved drug that has a similar effect to the one that you're testing. You'd be testing both a known drug and a new compound trying to become a drug against a known inactive compound. This is not commonly done, as there really isn't an appropriate positive control available for the majority of drug trials.


Thanks for the informative reply, Hirsch; the use of positive controls as you describe makes sense to me. As for the positive (or active) control in drug trials, I think it is becoming a bit more common due to certain ethical concerns and the increasing number of drugs available (and pharma's decision to focus on tweaks rather than novel compounds, but that's a different story). Of course, the FDA prefers a placebo control. As I recall, using an active control in a drug trial (for approval) is referred to as a non-inferiority analysis. I always liked that terminology.
But I am wandering way off...

I guess I also come back to sample size and, importantly, I think, the sample population. To the extent that rigorous audio DBTs have been performed, has the sample population been audiophiles or all comers?
 
Apr 19, 2009 at 5:39 PM Post #34 of 45
Quote:

Originally Posted by royalcrown
Right, but say you have a FLAC and an mp3 in a sighted test, and you imagine an increase in, say, bass due to placebo effect or expectation bias. My question is: in what way does this imagined increase in bass qualitatively differ from a real increase in bass? I don't accept the postulation that the two are, indeed, equivalent, but it does pose a philosophical problem that I find pretty challenging.


What if you believe you are performing a sighted test, but are actually being deceived? Is the imagined quality increase in the MP3 (which you believe to be a FLAC) any different from a real increase in quality?

Quote:

An object doesn't have color based on reflection and absorption of different wavelengths of light - my dog will never know what color shirt I'm wearing because dogs can't see in color. The frequency absorption is still there, but the color isn't, because the color is created in our brains as part of our perceptual faculties.

The same concept applies to color blind people - to someone with protanopia, there's no such thing as a red object. The concept simply does not apply to them. You could argue that the object is, in "reality," red, but that doesn't get you very far, because I could just as easily argue that the "real" color of the object isn't in fact red but some other color that only bees (being able to see ultraviolet wavelengths) can see, and we're "UV colorblind."



Color - Wikipedia, the free encyclopedia

"Because perception of color stems from the varying sensitivity of different types of cone cells in the retina to different parts of the spectrum, colors may be defined and quantified by the degree to which they stimulate these cells. These physical or physiological quantifications of color, however, do not fully explain the psychophysical perception of color appearance."


Your color argument is about how different individuals can differ in their perception of the same object. That doesn't address audio DBT, though, which is about how each individual perceives the differences between two objects.
 
Apr 19, 2009 at 5:57 PM Post #35 of 45
Quote:

Originally Posted by deaconblues
Your color argument is about how different individuals can differ in their perception of the same object. That doesn't address audio DBT, though, which is about how each individual perceives the differences between two objects.


Isn't DBT testing in audio about whether an individual perceives a difference at all, not how differences are perceived?
 
Apr 19, 2009 at 6:09 PM Post #36 of 45
Quote:

Originally Posted by deaconblues
What if you believe you are performing a sighted test, but are actually being deceived? Is the imagined quality increase in the MP3 (which you believe to be a FLAC) any different from a real increase in quality?


Is it? I always thought the answer was obvious, but the more I consider it as a philosophical issue, the more difficult the question becomes. Putting aside the intuition that the self-evident answer is "yes," is there a reason why the two are any different? It would seem to me, at least from armchair philosophizing, that the answer is "no" - and if it is indeed "no," what does that say about the DBT methodology? I don't know the answer to that question, but it's an interesting topic for discussion.


Quote:

Originally Posted by deaconblues
Your color argument is about how different individuals can differ in their perception of the same object. That doesn't address audio DBT, though, which is about how each individual perceives the differences between two objects.


That was a reply to scompton's post - I was simply establishing there that color can't be objectively measured, which is what scompton was pushing for.
 
Apr 20, 2009 at 10:20 PM Post #37 of 45
Quote:

Originally Posted by Hirsch
Rather than communication, a still more accurate word would be interpretation... By running the test using DBT, you eliminate that alternate hypothesis, and your interpretation of your results as something that you actually heard is likely to be more accurate. This is DBT working the way that it should.


True, but I don't quite understand applying the appellation "interpretation" to a protocol.

Quote:

Again, why guess? If you’ve got known differences built into your test (positive controls), you can look at your data and tell just how many people are “golden ears” that can really distinguish known auditory differences, with real data to back it up. If those people who hear the known differences reliably can’t hear a difference between the test stimuli, it’s much more meaningful than testing a group of people who claim to hear differences, but whose actual auditory discriminatory capability is an unknown.


Is it? Or, at least, is it meaningful to everybody? If we had a ton of golden ears fail a DBT of something like interconnects or high-res, even in the presence of some quite insane positive control (like 1-degree changes in speaker position, or 0.1 dB Q=1 frequency response deviations, or what have you) - what exactly does it mean if we only establish inferiority of the effect relative to that positive control?

I think such a result would be of great use to skeptical designers/manufacturers/listeners who wish to prioritize their investments - from a statistical point of view, it's clearly very plausible (and very useful) to establish a ranking of level of audibility for these sorts of things. But aren't the sorts of debates that bring up these DBT questions in the first place already trading at the limits of audibility to begin with? People can yammer all they want about how highres/interconnect changes/etc are such drastic night and day differences, but when you actually get down to testing them, well, the effect is so subtle that measurements may not pick it up, you need a sufficiently high quality system to tell the difference, etc.

Any positive control applied against such an effect in the DBT will do nothing more than place constraints on the scale of the effect - constraints that can (and will) be freely dismissed by those advocating the existence of the effect, on the grounds that the positive control was far too audible to adequately constrain it. And because people tend to overestimate how well they can hear even simple-sounding stuff, like +1 dB Q=1 EQ changes or 1% THD, I think that's going to be a really hard charge to dismiss.

I get the extremely strong sense that any serious attempt at a positive control is just going to provide more fodder for people whose minds are already made up, in the positive, about these sorts of audibility questions anyway. We already have Robert Harley of TAS lambasting Meyer/Moran for the fundamental reason that the results run utterly contrary to (his) experience. I'm not disputing that positive controls are fundamentally a good idea - they are - just not for the problem we are talking about.

I guess I'm looking at this debate too single-mindedly from the eyes of a non-statistician. I'm not really interested in making persuasive (and accurate!) statements about audio quality from a position like this, because I just don't think that's possible. The questions about relating results to positive controls need to be answered, but I don't think it will do a thing to truly resolve these issues.

Quote:

Incidentally, what I am saying is that a threshold shift in blind testing is a real possibility. Yet another reason to ensure that people in a blind testing situation can make normal sensory discriminations (back to those positive control groups I keep harping on about).


Well, you assert it is a real possibility, using analogous proven results in other sensory perceptions, but the magnitude and detail of this threshold shift is entirely up in the air. How well quantified are sighted forms of bias, in comparison to this shift? Depending on the nature of the shift, it might not be a significant bias in a DBT. Again, I'm not disputing that these things can exist - but until compelling evidence is provided, I don't think it's logical to consider this bias to be universally important in all audio DBTs. Just like it's not logical to argue that price-induced observer bias is alone a compelling reason to discount sighted tests - of course it's there, but sometimes it's not significant or runs contrary to expectations, and the larger issues are more important.

Quote:

Of course your hypothesis is a valid experimental hypothesis. However, you’re moving far from the original question. That is, the hypothesis you’re testing now is “Can people hear what they think they can”? So, the experiment is moving away from perceived differences in gear, encoding, or any other feature of the auditory stimulus, and moving into the realm of psychoacoustics. It’s an interesting question, but now you don’t even need unknown stimuli (the gear), because unknown stimuli are simply going to introduce random variance that can mask effects of interest. You’re going to want to use stimuli with known and measured characteristics, because you’re testing the relationship between what is expected and what is actually heard. To do that, you’re going to manipulate the differences between the stimuli, and see if the subject can track the known differences.


The hypothesis is more like "Group G of audiophiles literally hears distortion effect X." I think the hypothesis is intrinsically tied to that effect X, and changing it - especially by coupling the hypothesis to threshold tests that may involve a substantially higher magnitude of distortion - compromises the meaning of the results with respect to the questions surrounding effect X. That said, I do think your method would be much superior to ABX in finding exact threshold levels.

I agree that I'm changing the hypothesis a bit - at least, from how it is commonly stated. But I don't see my hypothesis ("this group of audiophiles claims to hear the effect but doesn't") as being substantially less meaningful than the "usual" hypothesis ("nobody can hear the effect"). If one can prove that no more than a certain percentage of a test population has a certain property, while the entire test population believes they have that property - assuming the test population is evenly distributed - doesn't that put a damper on anybody else deciding they have that property (without additional, and rightful, justification)?
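For what it's worth, the "no more than a certain percentage" statement can be made quantitative with an exact binomial confidence bound. Here's a minimal sketch in Python (standard library only); the function names and the 0-of-30 scenario are purely illustrative, not drawn from any actual test:

```python
from math import comb

def binom_cdf(m: int, p: float, x: int) -> float:
    """P(X <= x) for X ~ Binomial(m, p)."""
    return sum(comb(m, i) * p**i * (1 - p)**(m - i) for i in range(x + 1))

def upper_bound(x: int, m: int, alpha: float = 0.05) -> float:
    """Exact (Clopper-Pearson) upper confidence bound on the population
    proportion, given x 'successes' out of m subjects tested."""
    lo, hi = 0.0, 1.0
    for _ in range(60):                # bisect on P(X <= x | p) = alpha
        mid = (lo + hi) / 2
        if binom_cdf(m, mid, x) > alpha:
            lo = mid                   # p still too small
        else:
            hi = mid
    return (lo + hi) / 2

# Illustrative: if 0 of 30 listeners pass, the true fraction of
# genuine detectors is below ~9.5% at 95% confidence.
print(f"{upper_bound(0, 30):.4f}")
```

So a clean null result on a modest, representative sample really does constrain how many "detectors" can plausibly exist in the population it was drawn from.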

I think this is the crux of the type II discussion here, and of my complaints about testing of the individual vs. testing of the group. I agree that calculating pd is more or less a shot in a dark room, but said room is certainly not pitch black. To argue that pd should be low, for an effect that is asserted to be obvious and/or significant, presumes either some test sensitivity issue or some sort of exceptionalism on the part of the test population. While you do bring up the possibility of the former, I simply cannot believe the latter to be true for most audiophile blind tests unless specific, plausible evidence is cited.

I guess what I'm saying is that I don't see how this sort of beta analysis - even post-hoc pd-based beta analysis - can be called "guesswork." In fact, given the problems I think will always exist in the interpretation of positive controls with this sort of testing, I daresay that post-hoc beta analysis is equally valid, if not superior. ABX testing is pretty straightforward to evaluate on a trial-by-trial basis: either the listener heard a difference and got the trial right, or the listener guessed and was right 50% of the time - and the literal interpretation of these trial states matches the meaning of pd exactly. In that context, what is so wrong about running the results through different values of pd and interpreting the meaning of the results? The value of pd - for an ABX test - has a clear, well-defined, and comprehensive meaning that encapsulates all the potential negative biases of the test.

Let's say that beta < 0.05 on some results can be achieved with pd = 0.001 (and will therefore yield even lower values of beta for larger pd). This requires a success fraction very close to 0.5 (over many, many trials), and I think it would represent a substantial statistical result. But what if one objects and claims that pd was in fact below 0.001? It's certainly possible, and if so it would increase beta beyond the 0.05 level. But is it plausible? For many issues - though of course not all - I think one can confidently state, based on knowledge of the test population, that such an assertion of a low pd simply does not reflect reality. I'd call this line of reasoning post-hoc, but I don't really see the issue.
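To make that concrete, here's a rough sketch of a post-hoc beta calculation for an ABX test, assuming the simple trial model above: on each trial the listener genuinely detects the difference with probability pd, and otherwise guesses with a 50% chance of being right. The trial count and pd values are illustrative, not from any real data:

```python
from math import comb

def binom_tail(n: int, p: float, k: int) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def abx_beta(trials: int, pd: float, alpha: float = 0.05) -> float:
    """Type II error: chance a listener with detection probability pd
    fails to reach the alpha-level success criterion."""
    # Smallest number of correct answers that is significant at alpha.
    k_star = next(k for k in range(trials + 1)
                  if binom_tail(trials, 0.5, k) <= alpha)
    q = pd + (1 - pd) * 0.5            # per-trial success probability
    return 1.0 - binom_tail(trials, q, k_star)

for pd in (0.05, 0.2, 0.5):
    print(f"pd={pd}: beta = {abx_beta(16, pd):.3f}")
```

Running this for a 16-trial session shows the familiar problem: beta stays large unless pd (or the trial count) is substantial, which is exactly why sweeping over candidate pd values and reading off beta seems like a legitimate exercise rather than guesswork.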

Quote:

This was real data, and you’re absolutely correct that issues with type II error are going to be horrific. However, in a scientific setting, the data is the data. You can’t ignore it because it poses methodological issues that are horrific. You’ve got to deal with the issues instead. This is not just true in audio, but all areas of science where blind testing is used. I am aware of at least one proposed clinical trial that the FDA killed because the expected incidence of a possible side-effect was so low that even a large-scale study was not likely to have sufficient power for the FDA to make a decision based on the results.


A very interesting example, but wouldn't such a situation invalidate all testing, then, not just blind testing? I mean, you'd be down to case studies...

I guess that's your point: that some stuff is just unknowable?
 
Apr 21, 2009 at 8:42 PM Post #38 of 45
Quote:

Originally Posted by Publius
True, but I don't quite understand applying the appellation "interpretation" to a protocol.


A brief overview of the scientific process might be useful. While there are many variations, this is probably a decent summary of the key steps.

First, you create a hypothesis about the world. From that hypothesis, which we can call the scientific hypothesis, you design an experiment to test it. You pick your experimental design, dependent and independent variables, control groups and experimental conditions, and all the rest. From this, you create a protocol, which is a document that defines how the experiment is run.

Running a protocol generates data. The data may be qualitative or quantitative; either way, it must be summarized in some form, normally using descriptive statistics. You then need to determine the probability that the data generated by the sample you have run can be generalized to the population from which the sample was drawn. You do this by attempting to disprove an experimental hypothesis, which is a mathematical construct based on your experimental design. This is inferential statistics, and it is where alpha and beta apply. If you're only interested in testing for yourself (population = 1), these statistics have little meaning, as the descriptive statistics apply to the entire population.

However, the final step is to determine exactly what you have proven in regard to the original scientific hypothesis. That is, do the data you have generated mean anything? This is where you have to show that you have eliminated alternate hypotheses that might explain your data, through careful consideration in the experimental design. A good design might provide strong support for your original hypothesis, while a weak design might well be dismissed as having little explanatory power. So, the role of DBT/ABX is to eliminate expectancy as a possible alternative explanation to auditory processing in interpreting the meaning of your data.
However, it's also important to realize that DBT/ABX generate a different set of alternate hypotheses that can explain obtained results. Tight experimental design can eliminate some of the alternate hypotheses, which makes interpretation of the data more meaningful.
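As a concrete illustration of the inferential-statistics step for ABX data: the usual analysis is an exact one-sided binomial test against the null hypothesis that the listener is guessing. A minimal sketch (the 12-of-16 session is hypothetical):

```python
# Exact binomial test for an ABX session: probability of scoring
# at least `correct` out of `trials` by pure guessing (p = 0.5).
from math import comb

def abx_p_value(correct: int, trials: int) -> float:
    """One-sided p-value under the null hypothesis 'listener is guessing'."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

# e.g. 12 correct out of 16 trials:
p = abx_p_value(12, 16)
print(f"p = {p:.4f}")  # ~0.038, below the conventional alpha of 0.05
```

The resulting p-value is what gets compared against alpha; everything after that - what the rejection (or non-rejection) actually means - is the interpretation step described above.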


Quote:

A very interesting example, but wouldn't such a situation invalidate all testing, then, not just blind testing? I mean, you'd be down to case studies....

I guess that's your point: that some stuff is just unknowable?


Actually, my point is that you have to get creative in experimental design when conventional designs don't address the problem in question. A low incidence of an effect is certainly addressable scientifically, but this is where a good experimenter earns his keep. An experimental design is simply a tool for testing a hypothesis. And, as Mr. Natural said to Flakey Foont (extra credit for anyone who knows where that came from):

"Use the right tool for the job!"
 
May 2, 2009 at 1:41 AM Post #39 of 45
I haven't crafted a response to your last post yet, Hirsch, but after talking the threshold question over on HydrogenAudio in search of more substantial references on the phenomenon, jj provided a pretty interesting counterargument.

Quote:

You are, I think, aware that the noise due to the discrete nature of the atmosphere is about 6dB SPL, give or take, white noise, at the eardrum. Given the primary integration time of the ear, which is at best 200 milliseconds, and the bandwidth of each critical band, we can very quickly discover the detection limits (mathematically). Guess what? Both Fletcher and Stevens come within ->||<- of it for signals around ear canal resonance.

We're talking about a place where we show that DBTs do not elevate thresholds. With that, we can show that at least one kind of listening does not suffer threshold impairment in a blind test.


If blind testing causes any kind of threshold shift in audibility limits, it is physically bounded by the difference between the audibility limits of human hearing and the noise floor of the atmosphere - which, as jj describes, is very small. Otherwise the shift must apply to some thresholds but not others, which I would call ad-hockery unless there were more substantial evidence to back it up.

I've tried to find some references for this threshold shift claim and came up empty. Your turn, Hirsch?
 
May 2, 2009 at 5:56 AM Post #43 of 45
Quote:

Originally Posted by The Monkey
Perhaps you both should take it somewhere else then.


How could I not, with the super nice way you requested we save it for the peanut gallery?
 
May 2, 2009 at 8:02 AM Post #44 of 45
Ha ha - oh man, what a laugh - I got a headache from it! OK, I'm going.

In light of your very reasonable opinion - I concur with your request and heed your advice to take my posterior elsewhere. **** it I say - regulators! mount up.
 
