The unequal variance (Welch) t test
Two unpaired t tests
When you choose to compare the means of two nonpaired groups with a t test, you have two choices:
- Use the standard unpaired t test. It assumes that both groups of data are sampled from Gaussian populations with the same standard deviation.
- Use the unequal variance t test, also called the Welch t test. It assumes that both groups of data are sampled from Gaussian populations, but does not assume those two populations have the same standard deviation.
These choices are offered by GraphPad InStat, GraphPad Prism, the GraphPad free web t test QuickCalc, as well as many other programs.
The usefulness of the unequal variance t test
To interpret any P value, it is essential that the null hypothesis be carefully defined. For the unequal variance t test, the null hypothesis is that the two population means are the same but the two population variances may differ. If the P value is large, you don't reject that null hypothesis, and so conclude that the evidence does not persuade you that the two population means are different, even though you assume the two populations have (or may have) different standard deviations. What a strange set of assumptions. What would it mean for two populations to have the same mean but different standard deviations? Why would you want to test for that? Sawilowsky points out that this situation simply doesn't often come up in science (1).
I think the unequal variance t test is more useful when you think about it as a way to create a confidence interval. Your prime goal is not to ask whether two populations differ, but to quantify how far apart the two means are. The unequal variance t test reports a confidence interval for the difference between two means that is usable even if the standard deviations differ.
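The confidence interval described above can be sketched in a few lines of Python. This is a minimal illustration, not the code any GraphPad program uses. It assumes the usual Welch-Satterthwaite approximation for the df and keeps the fractional df. The helper names (t_cdf, t_critical, welch_ci) and the sample values are made up for the example, and the t distribution is integrated numerically so that only the standard library is needed.

```python
import math
from statistics import mean, variance


def t_cdf(t, df, steps=2000):
    """P(T <= t) for Student's t with (possibly fractional) df, for t >= 0.

    Integrates the density from 0 to t with Simpson's rule.
    """
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

    h = t / steps
    area = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + area * h / 3


def t_critical(confidence, df):
    """Two-sided critical value t* such that P(|T| <= t*) = confidence."""
    p = 0.5 + confidence / 2
    hi = 1.0
    while t_cdf(hi, df) < p:      # widen the bracket until it contains t*
        hi *= 2
    lo = 0.0
    for _ in range(60):           # bisection
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def welch_ci(a, b, confidence=0.95):
    """Confidence interval for mean(a) - mean(b) without assuming equal SDs."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    se = math.sqrt(va + vb)       # standard error of the difference
    # Welch-Satterthwaite df (assumed formula; usually not an integer)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    diff = mean(a) - mean(b)
    margin = t_critical(confidence, df) * se
    return diff - margin, diff + margin


# Made-up data: the second group is clearly more variable than the first.
control = [20.1, 22.3, 19.8, 21.5, 20.9]
treated = [25.2, 30.1, 18.7, 27.9, 33.4, 22.0, 29.5]
low, high = welch_ci(control, treated)
print(f"95% CI for the difference between means: {low:.2f} to {high:.2f}")
```

The interval is centered on the difference between the two sample means. Its width depends on both standard deviations and both sample sizes, so it remains usable when one group is much more variable than the other.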
How the unequal variance t test is computed
Both t tests report a P value and a confidence interval. The calculations differ in two ways:
- Calculation of the standard error of the difference between means. The t ratio is computed by dividing the difference between the two sample means by the standard error of the difference between the two means. This standard error is computed from the two standard deviations and sample sizes. When the two groups have the same sample size, the standard error (and so the t ratio) is identical for the two t tests. But when the two groups have different sample sizes, the standard error, and therefore the t ratio, for the Welch t test differs from that of the ordinary t test. This standard error of the difference is also used to compute the confidence interval for the difference between the two means.
- Calculation of the df. For the ordinary unpaired t test, df is computed as the total sample size (both groups) minus two. The df for the unequal variance t test is computed by a complicated formula that takes into account the discrepancy between the two standard deviations. If the two samples have identical standard deviations and identical sample sizes, the df for the Welch t test will be identical to the df for the standard t test. In most cases, however, the two standard deviations are not identical and the df for the Welch t test is smaller than it would be for the unpaired t test. The calculation usually leads to a df value that is not an integer. InStat, Prism, and our QuickCalc all round the df down to the next lower integer, as is common. Future versions will use the fractional df, which is more accurate.
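The two differences listed above can be checked numerically. The following is a rough sketch using only the Python standard library. The function names and the sample values are invented for illustration, and the Welch df uses the Welch-Satterthwaite approximation, which is an assumption about what the "complicated formula" refers to.

```python
import math
from statistics import mean, variance


def pooled_t(a, b):
    """Ordinary unpaired t test: returns (t ratio, standard error, df)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    # Pool the two sample variances, weighting each by its own df
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / se, se, df


def welch_t(a, b):
    """Unequal variance (Welch) t test: returns (t ratio, standard error, df)."""
    na, nb = len(a), len(b)
    va, vb = variance(a) / na, variance(b) / nb
    se = math.sqrt(va + vb)
    # Welch-Satterthwaite df (assumed formula; usually not an integer)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return (mean(a) - mean(b)) / se, se, df


# Made-up data: same sample size, different standard deviations.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]
t_pooled, se_pooled, df_pooled = pooled_t(x, y)
t_welch, se_welch, df_welch = welch_t(x, y)
print(se_pooled, se_welch)      # equal sample sizes: the two SEs agree
print(df_pooled, df_welch)      # Welch df is smaller and fractional
print(math.floor(df_welch))     # df rounded down to the next lower integer

# Made-up data: different sample sizes, so the SE and t ratio now differ.
z = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0]
print(pooled_t(x, z)[1], welch_t(x, z)[1])
```

With equal sample sizes the two tests share the same standard error and t ratio and differ only in df. With unequal sample sizes all three quantities differ.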
When to choose the unequal variance (Welch) t test
Deciding when to use the unequal variance t test is not straightforward.
It seems sensible to first test whether the variances are different, and then choose the ordinary or Welch t test accordingly. In fact, this is not a good plan. Decide which test to use as part of the experimental planning, before you collect the data.
References
1. S.S. Sawilowsky. Fermat, Schubert, Einstein, and Behrens-Fisher: The Probable Difference Between Two Means With Different Variances. J. Modern Applied Statistical Methods (2002) vol. 1 pp. 461-472