Updated 18 Nov 2021

# Comparing More Than Two Means: One-Way ANOVA

Summary:

When you have several means to compare, it’s not valid just to compare all possible pairs with t tests. Instead, you follow a two-stage process:

1. Are all the means equal? A computation called ANOVA (analysis of variance) answers this question.
2. If ANOVA shows that the means aren’t all equal, then which means are unequal, and by how much? There are many ways to answer this question (and they give different answers), but we’ll use a process called Tukey’s HSD (Honestly Significant Difference).

## Terminology

The factor that varies between samples is called the factor. (Every once in a while things are easy.) The r different values or levels of the factor are called the treatments. Here the factor is the choice of fat and the treatments are the four fats, so r = 4.

The computations to test the means for equality are called a 1-way ANOVA or 1-factor ANOVA.

## Example 1: Fat for Frying Donuts

| | g Fat Absorbed in Batch | x̅ | s |
|---|---|---|---|
| Fat 1 | 64 72 68 77 56 95 | 72 | 13.34 |
| Fat 2 | 78 91 97 82 85 77 | 85 | 7.77 |
| Fat 3 | 75 93 78 71 63 76 | 76 | 9.88 |
| Fat 4 | 55 66 49 64 70 68 | 62 | 8.22 |

Source: Snedecor 1989 [full citation in “References”, below] pp 217–218

Hoping to produce a donut that could be marketed to health-conscious consumers, a company tried four different fats to see which one was least absorbed by the donuts during the deep frying process. Each fat was used for six batches of two dozen donuts each, and the table shows the grams of fat absorbed by each batch of donuts.

It looks like donuts absorb the most of Fat 2 and the least of Fat 4, with intermediate amounts of Fat 1 and Fat 3. But there’s a lot of overlap, too: for instance, even though the mean for Fat 2 is much higher than for Fat 1, one sample of Fat 1, 95 g, is higher than five of the six samples of Fat 2.

Nevertheless, the sample means do look different. But what about the population means? In other words, would the four fats be absorbed in different amounts if you made a whole lot of batches of donuts? Do statistics justify choosing one fat over another? This is the basic question of a hypothesis test or significance test: is the difference great enough that you can rule out chance?

If Fats 2 and 4 were the only ones you had data for, you’d do a good old 2-sample t test. So why can’t you do that anyway? Because that would greatly increase your chances of a Type I error. The reasons are given in the Appendix.

By the way, though usually you are interested in the differences between population means with various treatments, you can also estimate the individual means. If you’re interested, see Estimating Individual Treatment Means in the Appendix.

## Step 1: ANOVA Test for Equality of All Means

The ANOVA procedure tests these hypotheses:

H0: μ1 = μ2 = ... = μr, all the means are the same

H1: the means are not all equal; at least one is different from the others

Let’s test these hypotheses at the α = 0.05 significance level.

You might wonder why you do analysis of variance to test means, but this actually makes sense. The question, remember, is whether the observed difference in means is too large to be the result of random selection. How do you decide whether the difference is too large? You look at the absolute difference of means between treatments (samples), but you also consider the variability within each treatment. Intuitively, if the difference between treatments is a lot bigger than the difference within treatments, you conclude that it’s not due to random chance and there is a real effect.

And this is just how ANOVA works: comparing the variation between groups to the variation within groups. Hence, analysis of variance.

### Requirements for ANOVA

1. You need r simple random samples for the r treatments, and they need to be independent samples. The sample sizes need not be the same, though it’s best if they’re not very different.
2. The underlying populations should be normally distributed. However, the ANOVA test is robust and moderate departures from normality aren’t a problem, especially if sample sizes are large and equal or nearly equal (Kuzma & Bohnenblust 2005 [full citation at https://BrownMath.com/swt/sources.htm#so_Kuzma2005] page 180).
3. The samples should all have the same standard deviation, theoretically. Because the ANOVA test is robust, Sullivan 2011 [full citation at https://BrownMath.com/swt/sources.htm#so_Sullivan2011] page C–21 (on CD) says it’s good enough if the largest standard deviation is less than double the smallest standard deviation.

Miller 1986 [full citation in “References”, below] (pages 90–91) is more cautious. When sample sizes are equal but standard deviations are not, the actual p-value will be slightly larger than what you find in the tables. But when sample sizes are unequal, and the smaller samples have the larger standard deviations, the actual p-value “can increase dramatically above” what the tables say, even “without too much disparity” in the standard deviations. “Falsely reporting significant results when the small samples have the larger variances is a serious worry. The lesson to be learned is to balance the experiment [equal sample sizes] if at all possible.”

### Perform a 1-Way ANOVA Test

A 1-way ANOVA tests whether the means of all groups are equal for different levels of one factor, using some fairly lengthy calculations. You could do all the computations by hand as shown in the Appendix, but no one ever does. Here are some alternatives:

• Excel’s ANOVA command is in the Data Analysis menu that the Analysis Toolpak provides; that menu sits in a different place in Excel 2003 and below than in Excel 2007. If you don’t see it there, follow instructions in Excel help to load the Analysis Toolpak.
• On a TI-83 or TI-84, enter each sample in a statistics list, then press [`STAT`] [`◄`] [`▲`] to select `ANOVA`, and enter the list names separated by commas.
• There are even Web-based ANOVA calculators, such as Lowry 2001b [full citation in “References”, below].
• There are many software packages for mathematics and statistics that include ANOVA calculations. One of them, R, is highly regarded and is open source.

When you use a calculator or computer program to do ANOVA, you get an ANOVA table that looks something like this:

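| | SS | df | MS | F | p |
|---|---|---|---|---|---|
| Between treatments | 1636.5 | 3 | 545.5 | 5.41 | 0.0069 |
| Within treatments | 2018.0 | 20 | 100.9 | | |
| Total | 3654.5 | 23 | | | |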

Note that the mean square between treatments, 545.5, is much larger than the mean square within treatments, 100.9. That ratio, between-groups mean square over within-groups mean square, is called an F statistic (F = MSB/MSW = 5.41 in this example). It tells you how much more variability there is between treatment groups than within treatment groups. The larger that ratio, the more confident you feel in rejecting the null hypothesis, which was that all means are equal and there is no treatment effect.

But what you care about is the p-value of 0.0069, obtained from the F distribution. The p-value has the usual interpretation: the probability of the between-treatments MS being ≥5.41 times the within-treatments MS, if the null hypothesis is true, is p = 0.0069.

The p-value is below your significance level of 0.05: it would be quite unlikely to have MSB/MSW this large if there were no real difference among the means. Therefore you reject H0 and accept H1, concluding that the mean absorption of all the fats is not the same.
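If you'd like to check the table yourself, the sums of squares, mean squares, and F can be recomputed from the raw data in a few lines. This is only a minimal sketch in Python, using nothing beyond the standard library; the variable names (`fats`, `ss_between`, and so on) are made up for illustration. It stops at F, because the p-value requires the F distribution.

```python
from statistics import mean

# Grams of fat absorbed per batch, from the donut table (Snedecor 1989).
fats = {
    "Fat 1": [64, 72, 68, 77, 56, 95],
    "Fat 2": [78, 91, 97, 82, 85, 77],
    "Fat 3": [75, 93, 78, 71, 63, 76],
    "Fat 4": [55, 66, 49, 64, 70, 68],
}

all_values = [x for batch in fats.values() for x in batch]
grand_mean = mean(all_values)      # overall mean of all 24 batches
r = len(fats)                      # number of treatments
n_total = len(all_values)          # total number of data points

# Between-treatments SS: squared distance of each treatment mean from the
# grand mean, weighted by that treatment's sample size.
ss_between = sum(len(b) * (mean(b) - grand_mean) ** 2 for b in fats.values())

# Within-treatments SS: squared distance of each value from its own treatment mean.
ss_within = sum((x - mean(b)) ** 2 for b in fats.values() for x in b)

df_between, df_within = r - 1, n_total - r
ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_stat = ms_between / ms_within

print(f"Between: SS={ss_between:.1f} df={df_between} MS={ms_between:.1f} F={f_stat:.2f}")
print(f"Within:  SS={ss_within:.1f} df={df_within} MS={ms_within:.1f}")
print(f"Total:   SS={ss_between + ss_within:.1f} df={n_total - 1}")
```

The same script works for unequal sample sizes, since each treatment mean is weighted by its own sample size.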

An interesting extra parameter can be derived from the ANOVA table; see η²: Strength of Association in the Appendix below.

Now that you know that it does make a difference which fat is used, you naturally want to know which fats are significantly different. This is post-hoc analysis. There are several different post-hoc analyses, and no one is superior on all points, but the most common choice is the Tukey HSD.

## Step 2: Tukey HSD for Post-Hoc Analysis

If your ANOVA test shows that the means aren’t all equal, your next step is to determine which means are different, to your level of significance. You can’t just perform a series of t tests, because that would greatly increase your likelihood of a Type I error. So what do you do?

John Tukey gave one answer to this question, the HSD (Honestly Significant Difference) test. You compute something analogous to a t score for each pair of means, but you don’t compare it to the Student’s t distribution. Instead, you use a new distribution called the studentized range or q distribution.

Caution: Perform post-hoc analysis only if the ANOVA test shows a p-value less than your α. If p>α, you don’t know whether the means are all the same or not, and you can’t go fishing for unequal means.

You generally want to know not just which means differ, but by how much they differ (the effect size). The easiest thing is to compute the confidence interval first, and then interpret it for a significant difference in means (or no significant difference). You’ve already seen this relationship between a test of significance at the α level and a 1−α confidence interval:

• If the endpoints of the CI have the same sign (both positive or both negative), then 0 is not in the interval and you can conclude that the means are different.
• If the endpoints of the CI have opposite signs, then 0 is in the interval and you can’t determine whether the means are equal or different.

You compute that confidence interval similarly to the confidence interval for the difference of two means, but using the q distribution, which avoids the problem of inflating α:

$$(\bar{x}_i - \bar{x}_j) \pm q(\alpha, r, df_W)\,\sqrt{\frac{MSW}{2}\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}$$

where x̅i and x̅j are the two sample means, ni and nj are the two sample sizes, MSW is the within-groups mean square from the ANOVA table, and q is the critical value of the studentized range for α, the number of treatments or samples r, and the within-groups degrees of freedom dfW. The square-root term is called the standardized error (as opposed to standard error).

Using the studentized range, developed by Tukey, overcomes the problem of inflated significance level that I talked about earlier. If sample sizes are equal, the risk of a Type I error is exactly α, and if sample sizes are unequal it’s less than α: the procedure is conservative. In terms of confidence intervals, if the sample sizes are equal then the confidence level is the stated 1−α, but if the sample sizes are unequal then the actual confidence level is greater than 1−α (NIST 2012 [full citation in “References”, below] section 7.4.7.1).

### Estimating Differences of Means

Usually the comparisons are presented in a table, like this one for the example with frying donuts:

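| | x̅i−x̅j | Critical q: q(α,r,dfW) | Standardized error | 95% Conf Interval for μi−μj | Signif at 0.05? |
|---|---|---|---|---|---|
| Fat 1 − Fat 2 | −13 | 3.9597 | 4.1008 | −29.2 to 3.2 | |
| Fat 1 − Fat 3 | −4 | 3.9597 | 4.1008 | −20.2 to 12.2 | |
| Fat 1 − Fat 4 | 10 | 3.9597 | 4.1008 | −6.2 to 26.2 | |
| Fat 2 − Fat 3 | 9 | 3.9597 | 4.1008 | −7.2 to 25.2 | |
| Fat 2 − Fat 4 | 23 | 3.9597 | 4.1008 | 6.8 to 39.2 | YES |
| Fat 3 − Fat 4 | 14 | 3.9597 | 4.1008 | −2.2 to 30.2 | |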

How do you read the table, and how was it constructed? Look first at the rows. Each row compares one pair of treatments.

If you have r treatments, there will be r(r−1)/2 pairs of means. The “/2” part comes because there’s no need to compare Fat 1 to Fat 2 and then Fat 2 to Fat 1. If Fat 1 is absorbed less than Fat 2, then Fat 2 is absorbed more than Fat 1 and by the same amount.

Now look at the columns. I’ll work through all the columns of the first row with you, and you can interpret the others in the same way.

1. The row heading tells you which treatments are being compared in this row, and the direction of comparison.
2. The next column gives the point estimate of difference, which is nothing more than the difference of the two sample means. The sample means of Fat 1 and Fat 2 were 72 and 85, so the difference is −13: the sample average of Fat 1 was 13 g less fat absorbed than the sample average of Fat 2.
3. Next is critical q, from the confidence interval formula. q(α,r,dfW) depends on the number of treatments and total number of data points, not on the individual treatments, so it’s the same for all rows in any given experiment.

For this experiment, we had four treatments and dfW from the ANOVA table was 20, so we need q(0.05, 4, 20). Your textbook may have a table of critical values for the studentized range, or you can look up q in an online table such as the one at the end of Abdi and Williams 2010 [full citation in “References”, below], or find it with an online calculator like Lowry 2001a [full citation in “References”, below]. (Most textbooks don’t have a table of q, and the TI calculators can’t compute it.)

Different sources give slightly different critical values of q, I suspect because q is extremely difficult to compute. One value I found was q(0.05,4,20) = 3.9597.

4. The standardized error is the square-root term from Tukey’s formula for confidence interval.

In an experiment with unequal sample sizes, the standardized error would vary for comparing different pairs of treatments. But in this experiment, every treatment has six data points, and so the standardized error is the same for every pair of means:

$$\sqrt{\frac{MSW}{2}\left(\frac{1}{6} + \frac{1}{6}\right)} = \sqrt{\frac{100.9}{2}\cdot\frac{2}{6}} = 4.1008$$

5. The endpoints of the confidence interval, as usual, are the point estimate plus or minus the critical q times the standardized error. Critical q times the standardized error is 3.9597×4.1008 = 16.2, and the difference of means in the first row is x̅1−x̅2 = −13, so the endpoints of the confidence interval are −13−16.2 = −29.2 and −13+16.2 = 3.2.

Interpretation: You’re 95% confident that, on average, a batch of 24 donuts absorbs between 29.2 g less and 3.2 g more of Fat 1 than Fat 2.

6. The last column applies the relation between confidence interval and significance test to say whether there’s a significant difference between the two treatments.

The confidence interval for the difference between Fat 1 and Fat 2 goes from a negative to a positive, so it does include zero. That means the two fats might have the same or different absorption, so you can’t say whether there’s a difference.

Caution: It’s generally best not to say that there is no significant difference. Even though that’s literally true, it’s easily misinterpreted to mean that the absorption of the two fats is the same, and you don’t know that. It might be, and it might not be. Stick to neutral language.

On the other hand, when the endpoints of the confidence interval are both positive or both negative, then 0 is not in the interval and we reject the null hypothesis of equality. In this table, only Fats 2 and 4 have a significant difference.

Interpretation: Fats 2 and 4 are not equally absorbed in frying donuts, and we’re 95% confident that a batch of 24 donuts absorbs 6.8 g to 39.2 g more of Fat 2 than Fat 4.
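The whole comparison table can be rebuilt from four sample means, the sample sizes, MSW, and the critical q. Here is a rough Python sketch of that arithmetic. The function name `tukey_interval` and the dictionary layout are illustrative choices, and the critical q is simply typed in from the value quoted above rather than computed.

```python
from itertools import combinations
from math import sqrt

# Values taken from the text: treatment means, common sample size,
# MSW from the ANOVA table, and the critical q(0.05, 4, 20) quoted above.
means = {"Fat 1": 72, "Fat 2": 85, "Fat 3": 76, "Fat 4": 62}
sizes = {"Fat 1": 6, "Fat 2": 6, "Fat 3": 6, "Fat 4": 6}
msw = 100.9
q_crit = 3.9597

def tukey_interval(a, b):
    """Return (difference, lower, upper) for mean(a) - mean(b)."""
    diff = means[a] - means[b]
    std_error = sqrt((msw / 2) * (1 / sizes[a] + 1 / sizes[b]))
    margin = q_crit * std_error
    return diff, diff - margin, diff + margin

results = {}
for a, b in combinations(means, 2):        # r(r-1)/2 = 6 pairs
    diff, lo, hi = tukey_interval(a, b)
    significant = lo > 0 or hi < 0         # 0 lies outside the interval
    results[(a, b)] = (diff, lo, hi, significant)
    flag = "YES" if significant else ""
    print(f"{a} - {b}: {diff:4d}  [{lo:6.1f}, {hi:6.1f}]  {flag}")
```

Only the Fat 2 − Fat 4 row gets flagged, which agrees with the table.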

### Other Comparisons


It’s possible to make more complicated comparisons. For instance, with a control group and two treatments you might compare the mean of the control group to the average of the means of the two treatments. Any kind of linear comparison can be done using a procedure developed by Henry Scheffé. A good brief explanation of Scheffé’s method is at NIST 2012 [full citation in “References”, below] section 7.4.7.2.

Tukey’s method is best when you are simultaneously comparing all pairs of means. If you have pre-selected a subset of means to compare, the Bonferroni method (NIST 2012 [full citation in “References”, below] section 7.4.7.3) may be better.

## Example 2: Stock Market

5-year Rates of Return

| | Financial | Energy | Utilities |
|---|---|---|---|
| | 10.76 | 12.72 | 11.88 |
| | 15.05 | 13.91 | 5.86 |
| | 17.01 | 6.43 | 13.46 |
| | 5.07 | 11.19 | 9.90 |
| | 19.50 | 18.79 | 3.95 |
| | 8.16 | 20.73 | 3.44 |
| | 10.38 | 9.60 | 7.11 |
| | 6.75 | 17.40 | 15.70 |
| x̅ | 11.585 | 13.846 | 8.913 |
| s | 5.124 | 4.867 | 4.530 |

Source: morningstar.com via Sullivan 2011 [full citation at https://BrownMath.com/swt/sources.htm#so_Sullivan2011] page C–30 (on CD)

A stock analyst randomly selected eight stocks in each of three industries and compiled the five-year rate of return for each stock. The analyst would like to know whether any of the industries have a different rate of return from the others, at the 0.05 significance level.

Solution: The hypotheses are

H0: μF = μE = μU, all three industries have the same average rate of return

H1: the industries don’t all have the same average rate of return

You can use a normal probability plot to assess normality for each sample; see MATH200A Program part 4. The standard deviations of the three samples are fairly close together, so the requirements are met.
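“Fairly close together” can be checked against the rule of thumb quoted under Requirements for ANOVA: the largest standard deviation should be less than double the smallest. A small Python sketch of that check follows; the dictionary name `returns` is just an illustrative choice.

```python
from statistics import stdev

# Five-year rates of return from the table above (Sullivan 2011).
returns = {
    "Financial": [10.76, 15.05, 17.01, 5.07, 19.50, 8.16, 10.38, 6.75],
    "Energy":    [12.72, 13.91, 6.43, 11.19, 18.79, 20.73, 9.60, 17.40],
    "Utilities": [11.88, 5.86, 13.46, 9.90, 3.95, 3.44, 7.11, 15.70],
}

# Sample standard deviation (n-1 in the denominator) for each industry.
sds = {name: stdev(values) for name, values in returns.items()}
ratio = max(sds.values()) / min(sds.values())

for name, s in sds.items():
    print(f"{name}: s = {s:.3f}")
# Rule of thumb: largest s should be less than twice the smallest s.
print(f"largest/smallest = {ratio:.2f}, rule satisfied: {ratio < 2}")
```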

Here is the ANOVA table:

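| | SS | df | MS | F | p |
|---|---|---|---|---|---|
| Between groups | 97.5931 | 2 | 48.7965 | 2.08 | 0.1502 |
| Within groups | 493.2577 | 21 | 23.4885 | | |
| Total | 590.8508 | 23 | | | |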

The F statistic is only 2.08, so the variation between groups is only about double the variation within groups. The high p-value makes you fail to reject H0 and you cannot reach a conclusion about differences between average rates of return for the three industries.

Since you failed to reject H0 in the initial ANOVA test, you can’t do any sort of post-hoc analysis and look for differences between any particular pairs of means. (Well, you can, but you know in advance that all of the intervals will include zero, meaning that you don’t know whether any particular sector has a different return from any other sector or not.)

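## Example 3: CRT Lifetimes

| | Lifetime, hr | x̅ | s |
|---|---|---|---|
| Type A | 407 411 409 | 409 | 2.0 |
| Type B | 404 406 408 405 402 | 405 | 2.2 |
| Type C | 410 408 406 408 | 408 | 1.6 |

Source: Spiegel and Stephens 1999 [full citation in “References”, below], pp 378–379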

A company makes three types of high-performance CRTs. A random sample of each type finds the lifetimes shown in the table above. At the 0.05 level, is there a difference in the average lifetimes of the three types?

H0: μA = μB = μC, the three types have equal mean lifetime

H1: the three types don’t all have the same mean lifetime

Excel or the TI-83/84 gives you this ANOVA table:

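| | SS | df | MS | F | p |
|---|---|---|---|---|---|
| Between groups | 36 | 2 | 18 | 4.50 | 0.0442 |
| Within groups | 36 | 9 | 4 | | |
| Total | 72 | 11 | | | |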

p<α, so you reject H0 and accept H1, concluding that the three types don’t all have the same mean lifetime.

Since you were able to reject the null hypothesis, you can proceed with post-hoc analysis to determine which means are different and the size of the difference. Here is the table:

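| | x̅i−x̅j | Critical q: q(α,r,dfW) | Standardized error | 95% Conf Interval for μi−μj | Signif at 0.05? |
|---|---|---|---|---|---|
| A − B | 4 | 3.9508 | 1.0328 | −0.1 to 8.1 | |
| A − C | 1 | 3.9508 | 1.0801 | −3.3 to 5.3 | |
| B − C | −3 | 3.9508 | 0.9487 | −6.7 to 0.7 | |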

This result might surprise you: although the three means aren’t all equal, you can’t say that any two of the means are unequal. But when you look more closely at the numbers, this doesn’t seem quite so unreasonable.

First, look at the p-value in the ANOVA table: 0.0442 is below 0.05, yes, but it’s not very far below. If H0 were true, there would still be almost a 4½% chance of getting an F statistic this large, so the rejection is not an emphatic one. Next, look at the confidence interval for μA−μB. While the interval does include 0, it’s extremely lopsided and almost doesn’t include 0.

Though we’re used to thinking of significance as “either it is or it isn’t”, there are cases where the decision is a close one, and this is one of those cases. And the confidence intervals are computed by a different method than the significance test, using a different distribution. Here again, the decision is a close one. So what we have is two close decisions, based on different computations, one falling slightly on one side of the line and the other falling slightly on the other side of the line. It’s a good reminder that in statistics we’re dealing with probabilities, not certainties.
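Because the three CRT samples have different sizes, the standardized error changes from pair to pair. The following Python sketch goes from the raw lifetimes to the three intervals, to show where each number in the post-hoc table comes from. The critical q is typed in from the table above, not computed, and the variable names are only illustrative.

```python
from itertools import combinations
from math import sqrt
from statistics import mean

# CRT lifetimes in hours (Spiegel and Stephens 1999).
lifetimes = {
    "A": [407, 411, 409],
    "B": [404, 406, 408, 405, 402],
    "C": [410, 408, 406, 408],
}
q_crit = 3.9508   # critical q(0.05, 3, 9) as quoted in the table above

# Within-groups mean square, pooled over all three types.
ss_within = sum((x - mean(v)) ** 2 for v in lifetimes.values() for x in v)
df_within = sum(len(v) for v in lifetimes.values()) - len(lifetimes)
msw = ss_within / df_within

intervals = {}
for a, b in combinations(lifetimes, 2):
    diff = mean(lifetimes[a]) - mean(lifetimes[b])
    # Unequal sample sizes make the standardized error differ between pairs.
    std_error = sqrt((msw / 2) * (1 / len(lifetimes[a]) + 1 / len(lifetimes[b])))
    margin = q_crit * std_error
    intervals[(a, b)] = (diff - margin, diff + margin)
    print(f"{a} - {b}: diff={diff:+.0f}  SE={std_error:.4f}  "
          f"[{diff - margin:.1f}, {diff + margin:.1f}]")
```

Every interval straddles zero, though the A − B interval only barely does.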

## Appendix (The Hard Stuff)

The following sections are for students who want to know more than just the bare bones of how to do a 1-way ANOVA test.

### Why Not Just Pick Two Means and Do a t Test?

Remember that you have to set up hypotheses before you know the data. Before you’ve actually fried the donuts, you have no reason to expect any particular outcome. Specifically, until you have the data you have no reason to think Fats 2 and 4 are any more different than Fats 1 and 4, or any other pair.

Why can’t you collect the data and then select your hypotheses? Because that can put significance on a chance event. For example, a golfer hits a ball and it lands on a particular tuft of grass. The probability of landing on that particular tuft is extremely small, so there’s something different about that particular tuft, right? Obviously not! It’s a logical fallacy to decide what to test after you already have the data.

So if you want to do 2-sample t tests for differences among four fats, you would have to test every pair of fats: 1 and 2, 1 and 3, 1 and 4, 2 and 3, 2 and 4, 3 and 4. That’s six hypotheses in all.

Well, why not do a 0.05 significance test on each pair of means? Remember what a 0.05 significance level means: you’re willing to accept a 5% chance of a Type I error, rejecting H0 when it’s actually true. But if you test six 0.05 hypotheses on the same set of data, you’re much more likely to commit a Type I error. How much more likely? Well, for each hypothesis there’s a 95% chance of escaping a Type I error, but the probability of escaping a Type I error six times in a row is 0.95⁶ = 0.7351. 1−0.7351 = 0.2649, so if you test all six pairs at the 0.05 level, you’re more likely than one chance in four to get a false positive, finding a difference between two fats when there’s actually no difference.

Prob. of Type I Error

| groups | pairs | α = 0.05 | α = 0.01 |
|---|---|---|---|
| 3 | 3 | 0.1426 | 0.0297 |
| 4 | 6 | 0.2649 | 0.0585 |
| 5 | 10 | 0.4013 | 0.0956 |
| 6 | 15 | 0.5367 | 0.1399 |

In general, if you have r treatments, there are r(r−1)/2 pairs of means to compare. If you test each pair at significance level α, the overall probability of a Type I error is 1 − (1−α)^[r(r−1)/2]. The table above shows the effective α for various numbers of treatments when the nominal α is 0.05 or 0.01. You can see that testing multiple hypotheses increases your α dramatically. Even with just three treatments, the effective α is almost three times the nominal α. This is clearly unacceptable.
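The effective α values are easy to reproduce. Here is a short Python sketch of the formula; `familywise_alpha` is just an illustrative name, and like the formula it treats the pairwise tests as independent.

```python
def familywise_alpha(r, alpha):
    """Chance of at least one Type I error when every pair among r
    treatments is tested separately at level alpha (tests treated
    as independent, as in the formula in the text)."""
    pairs = r * (r - 1) // 2
    return 1 - (1 - alpha) ** pairs

# Rows of the table: groups -> (pairs, effective alpha at 0.05, at 0.01)
table = {
    r: (r * (r - 1) // 2, familywise_alpha(r, 0.05), familywise_alpha(r, 0.01))
    for r in range(3, 7)
}

print("groups pairs  a=0.05  a=0.01")
for r, (pairs, a05, a01) in table.items():
    print(f"{r:6d} {pairs:5d}  {a05:.4f}  {a01:.4f}")
```

Changing `range(3, 7)` extends the table to more treatments; with ten treatments there are 45 pairs to test.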

Why not just lower your alpha? Because as you lower your α you increase your β, the chance of a Type II error. β represents the probability of a false negative, failing to find a difference in fats when there actually is a difference. This, too, is unacceptable.

So you have to find a way to test all the pairs of means at the same time, in one test. The solution is an extension of the t test to multiple samples, and it’s called ANOVA. (If you have only two treatments, ANOVA computes the same p-value as a two-sample t test, but at the cost of extra effort.)

### How ANOVA Works

How does the ANOVA procedure compute a p-value? This section shows you the formulas and carries through the computations for the example with fat for frying donuts.

Remember, long ago in a galaxy called Descriptive Statistics, how the variance was defined: find the mean, then for each data point take the square of its difference from the mean. Add up all those squares, and you have SS(x), the sum of squared deviations in x. The variance was SS(x) divided by the degrees of freedom n−1, so it was a kind of average or mean squared deviation. You probably learned the shortcut computational formulas:

SS(x) = ∑x² − (∑x)²/n or SS(x) = ∑x² − nx̅²

and then

s² = MS(x) = SS(x)/df where df = n−1

In 1-way ANOVA, we extend those concepts a bit. First you partition SS(x) into between-treatments and within-treatments parts, SSB and SSW. Then you compute the mean square deviations:

• MSB is called the between-treatments mean square, between-groups variance, or factor MS. It measures the variability associated with the different treatment levels or different values of the factor.
• MSW is called the within-treatments mean square, within-group variance, pooled variance, or error MS. It measures the variability that is not associated with the different treatments.

Finally you divide the two to obtain your test statistic, F = MSB/MSW, and you look up the p-value in a table of the F distribution.

(The F distribution is named after “the celebrated R.A. Fisher” (Kuzma & Bohnenblust 2005 [full citation at https://BrownMath.com/swt/sources.htm#so_Kuzma2005], 176). You may have already seen the F distribution in computing a different ratio of variances, as part of testing the variances of two populations for equality.)

There are several ways to compute the variability, but they all come up with the same answers and this method in Spiegel and Stephens 1999 [full citation in “References”, below] pages 367–368 is as easy as any:

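| | SS | df | MS | F |
|---|---|---|---|---|
| Between treatments | SSB = ∑njx̅j² − Nx̅² | dfB = r−1 | MSB = SSB/dfB | F = MSB/MSW |
| Within treatments | SSW = SStot − SSB * | dfW = N−r | MSW = SSW/dfW | |
| Total | SStot = ∑x² − Nx̅² * | dftot = N−1 | | |

\* or, if you know the standard deviations of the samples, SSW = ∑(nj−1)sj² and SStot = SSB + SSW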

where

• r is the number of treatments.
• nj, x̅j, sj for each treatment are the sample size, sample mean, and sample standard deviation.
• N is the total sample size and x̅ = ∑x/N is the overall sample mean or “grand mean”. x̅ can also be computed from the sample means by

x̅ = ∑njx̅j/N

You begin with the treatment means x̅j = {72, 85, 76, 62} and the overall mean x̅ = 73.75, then compute

SSB = (6×72²+6×85²+6×76²+6×62²) − 24×73.75² = 1636.5

MSB = 1636.5 / 3 = 545.5

The next step depends on whether you know the standard deviations sj of the samples. If you don’t, then you jump to the third row of the table to compute the overall sum of squares:

∑x² = 64² + 72² + 68² + ... + 70² + 68² = 134192

SStot = ∑x² − Nx̅² = 134192 − 24×73.75² = 3654.5

Then you find SSW by subtracting the “between” sum of squares SSB from the overall sum of squares SStot:

SSW = SStot−SSB = 3654.5−1636.5 = 2018.0

MSW = 2018.0 / 20 = 100.9
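If you do know the sample standard deviations, the footnote formula SSW = ∑(nj−1)sj² gets you to MSW without touching the raw data. The Python sketch below tries that route with the rounded standard deviations from the donut table, so expect the result to differ slightly from 2018.0 because of that rounding. The variable names are illustrative.

```python
# Summary statistics from the donut table: sample sizes and sample
# standard deviations (rounded to two decimals in the table).
sizes = [6, 6, 6, 6]
sds = [13.34, 7.77, 9.88, 8.22]
ss_between = 1636.5                    # SSB as computed in the text

# SSW = sum over treatments of (n_j - 1) * s_j^2
ss_within = sum((n - 1) * s ** 2 for n, s in zip(sizes, sds))
df_within = sum(sizes) - len(sizes)    # N - r
ms_within = ss_within / df_within
ss_total = ss_between + ss_within      # SStot = SSB + SSW

print(f"SSW = {ss_within:.1f}, MSW = {ms_within:.1f}, SStot = {ss_total:.1f}")
```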

Now you’re almost there. You want to know whether the variability between treatments, MSB, is greater than the variability within treatments, MSW. If it’s enough greater, then you conclude that there is a real difference between at least some of the treatment means and therefore that the factor has a real effect. To determine this, divide

F = MSB/MSW = 5.41

This is the F statistic. The F distribution is a one-tailed distribution that depends on both degrees of freedom, dfB and dfW.

At long last, you look up F=5.41 with 3 and 20 degrees of freedom, and you find a p-value of 0.0069. The interpretation is the usual one: there’s only a 0.0069 chance of getting an F statistic greater than 5.41 (or higher variability between treatments relative to the variability within treatments) if there is actually no difference between treatments. Since the p-value is less than α, you conclude that there is a difference.
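If you have no F table handy, the upper-tail probability can be approximated numerically. The Python sketch below is one possible approach, not necessarily what Excel or the TI calculators do internally. It relies on the standard identity P(F > f) = I_x(dfW/2, dfB/2) with x = dfW/(dfW + dfB·f), where I_x is the regularized incomplete beta function, and evaluates the integral with Simpson's rule. The function name `f_upper_tail` and the step count are arbitrary choices, and the sketch assumes both degrees of freedom are at least 2.

```python
from math import exp, lgamma

def f_upper_tail(f, df1, df2, steps=2000):
    """Approximate P(F > f) for an F distribution with df1 (numerator)
    and df2 (denominator) degrees of freedom.

    Assumes df1 >= 2 and df2 >= 2 so the integrand stays bounded,
    and that steps is even (required by Simpson's rule).
    """
    a, b = df2 / 2, df1 / 2
    x = df2 / (df2 + df1 * f)
    h = x / steps

    def integrand(t):
        return t ** (a - 1) * (1 - t) ** (b - 1)

    # Simpson's rule on [0, x]: weights 1, 4, 2, 4, ..., 4, 1
    total = integrand(0.0) + integrand(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    integral = total * h / 3

    log_beta = lgamma(a) + lgamma(b) - lgamma(a + b)   # log of B(a, b)
    return integral / exp(log_beta)

f_stat = (1636.5 / 3) / (2018.0 / 20)    # MSB / MSW for the donut example
p_value = f_upper_tail(f_stat, 3, 20)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```

Calling `f_upper_tail(4.5, 2, 9)` gives the p-value for the CRT example in the same way.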

### Estimating Individual Treatment Means

Usually you’re interested in the contrast between two treatments, but you can also estimate the population mean for an individual treatment. You do use a t interval, as you would when you have only one sample, but the standard error and degrees of freedom are different (NIST 2012 [full citation in “References”, below] section 7.4.3.6).

To compute a confidence interval on an individual mean for the jth treatment, use

df = dfW

standard error = √(MSW/nj)

Therefore the margin of error, which is the half-width of the confidence interval, is

E = t(α/2,dfW) · √(MSW/nj)

Example: Refer back to the fats for frying donuts. Estimate the population mean for Fat 2 with 95% confidence. In other words, if you fried a great many batches of donuts in Fat 2, how much fat per batch would be absorbed, on average?

sample mean for Fat 2: x̅2 = 85

sample size: n2 = 6

degrees of freedom: dfW = 20 (from the ANOVA table)

MSW = 100.9 (also from the table)

1−α = 0.95

TI-83 or TI-84 users, please see an easy procedure below.

#### Computation by Hand

Begin by finding the critical t. Since 1−α = 0.95, α/2 = 0.025. You therefore need t(0.025,20). You can find this from a table:

t(0.025,20) = 2.0860

Next, find the standard error. This is

standard error = √(MSW/nj) = √(100.9/6) = 4.1008

Now you’re ready to finish the confidence interval. The margin of error is

E = t(α/2,dfW) · √(MSW/nj) = 2.0860×4.1008 = 8.5541

Therefore the confidence interval is

μ2 = 85 ± 8.6 g (95% confidence)

or

76.4 g ≤ μ2 ≤ 93.6 g (95% confidence)

Conclusion: You’re 95% confident that the true mean amount of fat absorbed by a batch of donuts fried in Fat 2 is between 76.4 g and 93.6 g.
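The hand computation is short enough to script. The Python sketch below does the same arithmetic; the critical t is typed in from the table value quoted above rather than computed, and the variable names are illustrative.

```python
from math import sqrt

# Inputs from the example: Fat 2 sample mean and size, plus MSW and dfW
# from the ANOVA table. t_crit is t(0.025, 20) as quoted in the text.
sample_mean = 85
n_j = 6
msw = 100.9
t_crit = 2.0860

std_error = sqrt(msw / n_j)     # uses MSW, not the sample's own variance
margin = t_crit * std_error
lower, upper = sample_mean - margin, sample_mean + margin

print(f"SE = {std_error:.4f}, E = {margin:.2f}")
print(f"{lower:.1f} g <= mu_2 <= {upper:.1f} g")
```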

#### TI-83/84 Procedure

Your TI calculator is set up to do the necessary calculations, but there’s one glitch because the degrees of freedom is not based on the size of the individual sample, as it is in a regular t interval. So you have to “spoof” the calculator as follows.

Press [`STAT`] [`◄`] [`8`] to bring up the TInterval screen. First I’ll tell you what to enter; then I’ll explain why.

• x̅: mean of the treatment sample, here 85
• Sx: √(MSW·(dfW+1)/nj), here √(100.9×21/6)
• n: dfW+1, here 21
• C-Level: as specified in the problem, here .95

Now, what’s up with n and Sx? Well, the calculator uses n to compute degrees of freedom for critical t as n−1. You want degrees of freedom to be dfW, so you lie to the calculator and enter the value of n as dfW+1 (20+1 = 21).

But that creates a new problem. The calculator also divides s by √n to come up with the standard error. But you want it to use nj (6) and not your fake n (21). So you have to multiply MSW by dfW+1 and divide by nj to trick the calculator into using the value you actually want.

By the way, why is MSW inside the square root sign? Because the calculator wants a standard deviation, but MSW is a variance. As you know, standard deviation is the square root of variance.

All this fakery achieves the desired result: the confidence interval matches the one that you would have if you computed it by hand.

Conclusion: You’re 95% confident that the true mean amount of fat absorbed by a batch of donuts fried in Fat 2 is between 76.4 g and 93.6 g.
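If the spoofing seems like sleight of hand, you can check the algebra numerically. This Python sketch mimics what the text says the calculator does with the faked entries (degrees of freedom n−1, standard error Sx/√n) and compares the result with the standard error you actually want. The variable names are illustrative.

```python
from math import sqrt

msw, df_w, n_j = 100.9, 20, 6       # values from the donut example

# What you type into the TInterval screen.
fake_n = df_w + 1
fake_sx = sqrt(msw * (df_w + 1) / n_j)

# What the calculator does with those entries.
calc_df = fake_n - 1                # degrees of freedom for critical t
calc_se = fake_sx / sqrt(fake_n)    # standard error = Sx / sqrt(n)

# What you actually want.
wanted_se = sqrt(msw / n_j)

print(f"Sx to enter = {fake_sx:.4f}, calculator df = {calc_df}")
print(f"calculator SE = {calc_se:.4f}, wanted SE = {wanted_se:.4f}")
```

The factor dfW+1 under the square root cancels against the √n the calculator divides by, leaving √(MSW/nj).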

### η²: Strength of Association

Lowry 1988 [full citation in “References”, below] chapter 14 part 2 mentions a measure that is usually neglected in ANOVA: η². (η is the Greek letter eta, which rhymes with beta.)

η² = SSB/SStot, the ratio of sum of squares between groups to total sum of squares. For the donut-frying example,

η² = SSB/SStot = 1636.5 / 3654.5 = 0.45

What does this tell you? η² measures how much of the total variability in the dependent variable is associated with the variation in treatments. For the donut example, η² = 0.45 tells you that 45% of the variability in fat absorption among the batches is associated with the choice of fat.
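The same ratio can be read off any of the ANOVA tables in this article. As a small illustration, this Python sketch computes η² for all three examples from the SSB and SStot values in their tables; the dictionary layout is just an illustrative choice.

```python
# (SSB, SStot) pairs taken from the three ANOVA tables in this article.
anova_tables = {
    "donuts": (1636.5, 3654.5),
    "stocks": (97.5931, 590.8508),
    "CRTs": (36, 72),
}

eta_squared = {name: ssb / sstot for name, (ssb, sstot) in anova_tables.items()}
for name, value in eta_squared.items():
    print(f"{name}: eta^2 = {value:.2f}")
```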

## References

Abdi, Hervé, and Lynne J. Williams. 2010.
“Newman-Keuls Test and Tukey Test”. In Encyclopedia of Research Design. Sage. Retrieved 18 Nov 2021 from https://personal.utdallas.edu/~herve/abdi-NewmanKeuls2010-pretty.pdf
Lowry, Richard. 1988.
Concepts & Applications of Inferential Statistics. Retrieved 18 Nov 2021 from http://vassarstats.net/textbook/
Lowry, Richard. 2001a.
“Critical Values of Q”, part of Calculators for Statistical Table Entries. Retrieved 15 May 2016 from http://www.vassarstats.net/tabs.html#q
Lowry, Richard. 2001b.
One-Way Analysis of Variance for Independent or Correlated Samples (online calculator). Retrieved 15 May 2016 from http://vassarstats.net/anova1u.html
Miller, Rupert G., Jr. 1986.
Beyond ANOVA: Basics of Applied Statistics. Wiley.
NIST, National Institute of Standards and Technology. 2012.
NIST/SEMATECH e-Handbook of Statistical Methods. Retrieved 15 May 2016 from https://www.itl.nist.gov/div898/handbook/
Snedecor, George W., and William G. Cochran. 1989.
Statistical Methods. 8th ed. Iowa State.
Spiegel, Murray R., and Larry J. Stephens. 1999.
Theory and Problems of Statistics. 3d ed. McGraw-Hill.

## What’s New?

• Updated links to references.
• 20 Oct 2020: Improved rendering of square roots of formulas. Italicized variable names. Converted page from HTML 4.01 to HTML5.
• (intervening changes suppressed)
• 31 Jan 2009: First publication.