ANOVA stands for Analysis of Variance. It is a fundamental diagnostic and inferential tool in regression analysis. The basic idea of ANOVA is to use F tests to assess whether a model or components of a model explains a statistically significant amount of variability in the response variable.
In simple linear regression, the ANOVA table is unique and unambiguous, because there is only one predictor and hence only one way to attribute explained variability. In multiple linear regression, however, predictors may be correlated, and their contributions to the model are no longer uniquely defined. As a result, different conventions have been developed to allocate the explained variability among predictors. These conventions lead to different types of ANOVA tables, commonly referred to as Type I, Type II, and Type III ANOVA. The three types differ in how they account for the presence of other predictors and, when applicable, interaction terms in the model.
More generally, ANOVA is a framework for testing whether a model explains a non-trivial amount of variation in a response variable. Conceptually, it asks whether the reduction in unexplained variability achieved by adding terms to a model is large relative to random noise. Operationally, it answers this by comparing mean squares (variance estimates) using an F-statistic.
ANOVA is built on orthogonal projection in the data space. Fitting a model corresponds to projecting the response vector \(y\) onto the model subspace, which yields the fundamental decomposition of total variability into explained and unexplained components:
\[
\underbrace{\lVert y-\bar y\rVert^2}_{\text{SST}}=\underbrace{\lVert \hat y-\bar y\rVert^2}_{\text{SSR}}+\underbrace{\lVert y-\hat y\rVert^2}_{\text{SSE}}
\] This identity is the mathematical foundation of all ANOVA tables.
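The identity can be checked numerically; a minimal R sketch with simulated data (all values here are illustrative):

```r
set.seed(1)
n <- 50
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)              # simulated response

fit <- lm(y ~ x)
SST <- sum((y - mean(y))^2)            # total variability
SSR <- sum((fitted(fit) - mean(y))^2)  # explained by the model
SSE <- sum(resid(fit)^2)               # left unexplained
all.equal(SST, SSR + SSE)              # TRUE: the decomposition holds
```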
9.1 Nested ANOVA
This is a test for comparing nested models.
Definition 9.1 (Nested models) Two models are nested if one model contains all the terms of the other, plus at least one additional term.
The more complex model is called the complete (or full) model.
The simpler model is called the reduced (or restricted) model.
Here the main question is whether the additional terms are really necessary. We use a hypothesis test to answer this question.
\(H_0\): \(\beta_{g+1}=\cdots=\beta_k=0\) (the reduced model is adequate).
\(H_a\): at least one of \(\beta_{g+1},\ldots,\beta_k\) is nonzero.
\(\displaystyle F=\frac{(SSE_R-SSE_C)/(k-g)}{SSE_C/[n-(k+1)]}\), where \(k\) is the number of full model predictors, and \(g\) is the number of reduced model predictors.
Note that the denominator above, \(SSE_C/[n-(k+1)]\), equals the \(MSE\) of the complete model.
The test is usually done with the nested ANOVA table.
Analysis of Variance Table
Model 1: y ~ x1
Model 2: y ~ x1 + x2
Res.Df RSS Df Sum of Sq F Pr(>F)
1 98 184.01
2 97 90.86 1 93.148 99.443 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
In this table, each row corresponds to a model: the first row is the reduced model and the second row is the full model.
For each model, Res.Df is the residual degrees of freedom, \(n-(p+1)\), where \(p\) is the number of predictors.
In this example, Res.Df is \(100-(1+1)=98\) and \(100-(2+1)=97\).
The difference in residual degrees of freedom is \(98-97=1\).
The difference in \(SSE\) is \(184.0084-90.8599=93.1484\).
The F-statistic is then \((93.1484/1)/(90.8599/97)=99.4431\).
The p-value is obtained from the \(F(1,97)\) distribution.
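The F-statistic and p-value can be reproduced by hand in R from the table entries (rounded values, so the result only approximates the printed output):

```r
SSE_R <- 184.01   # residual SS of the reduced model (from the table)
SSE_C <- 90.86    # residual SS of the complete model
df_num <- 98 - 97 # difference in residual degrees of freedom, k - g
df_den <- 97      # residual df of the complete model, n - (k + 1)

F_stat <- ((SSE_R - SSE_C) / df_num) / (SSE_C / df_den)
p_val  <- pf(F_stat, df_num, df_den, lower.tail = FALSE)
round(F_stat, 2)  # about 99.44
```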
Note
This nested model is the basis of all ANOVA tables listed below. All ANOVA tables in regression are based on comparing nested models. Every row in an ANOVA table corresponds to testing whether a set of parameters can be removed from a larger model.
9.2 ANOVA table for linear regression
This is the ANOVA table introduced in the previous lectures. Its main purpose is to display the decomposition of variability and to compute the F-statistic together with the corresponding p-value.
\[
F=\frac{MSR}{MSE}=\frac{SSR/p}{SSE/(n-p-1)}.
\]
This is the overall F-test: it compares the variance explained per model degree of freedom with the variance left unexplained per residual degree of freedom, and it tests the whole group of predictors at once.
9.3 Type I ANOVA table
Type I ANOVA Table (Sequential Sum of Squares) decomposes the total variation in the response by adding predictors to the model sequentially, one at a time, in the order they appear in the model formula.
Suppose we fit the linear model \[
y=\beta_0+\beta_1x_1+\beta_2x_2+\ldots+\beta_px_p+\varepsilon.
\]
The Type I ANOVA table reports \[
SS(x_j\mid x_1,\ldots, x_{j-1})=SSE(\text{model with }x_1,\ldots,x_{j-1})-SSE(\text{model with }x_1,\ldots,x_j).
\]
That is, each sum of squares measures the reduction in unexplained variability obtained by adding \(x_j\) to a model that contains the preceding predictors.
| Source | Degrees of Freedom | Sum of Squares | Mean Square | F |
|---|---|---|---|---|
| \(x_1\) | 1 | \(SS(x_1)\) | \(MS(x_1)\) | \(F_1\) |
| \(x_2\) | 1 | \(SS(x_2 \mid x_1)\) | \(MS(x_2 \mid x_1)\) | \(F_2\) |
| \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) |
| Residuals | \(n-p-1\) | SSE | MSE | |
Each F-statistic tests
\[
H_0: \beta_j=0\quad \text{ given that $x_1,\ldots,x_{j-1}$ are already in the model.}
\]
The corresponding F-statistic is computed by \[
F_j=\frac{MS(x_j)}{MSE}.
\]
In summary, the Type I ANOVA table answers the question:
How much additional variation does this variable explain when added at this point in the model?
Tip
Type I ANOVA table depends on the order of variables.
Each row corresponds to a nested-model comparison.
Equivalent to anova(model1, model2) for successive models.
Matches classical ANOVA in balanced designs.
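The equivalence to successive nested-model comparisons can be verified directly in base R; a small sketch with simulated data (names are illustrative):

```r
set.seed(3)
n  <- 40
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- 1 + x1 + x2 + rnorm(n)

seq_tab  <- anova(lm(y ~ x1 + x2))              # Type I table
step_cmp <- anova(lm(y ~ x1), lm(y ~ x1 + x2))  # nested comparison for x2

# the x2 row of the Type I table reproduces the nested-model test
all.equal(seq_tab["x2", "Sum Sq"], step_cmp[2, "Sum of Sq"])
all.equal(seq_tab["x2", "F value"], step_cmp[2, "F"])
```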
Example 9.2 We first generate a dataset with three correlated predictors \(x_1\), \(x_2\), and \(x_3\). The correlation structure is intentional, so that the sequential nature of Type I ANOVA is visible.
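The data-generating code is not shown; a sketch of one possible construction (the seed and coefficients below are assumptions, so the ANOVA output that follows comes from the author's data, not from this sketch):

```r
set.seed(2024)                       # assumed seed; the original is not shown
n  <- 80                             # matches the 76 residual df below
x1 <- rnorm(n)
x2 <- 0.6 * x1 + rnorm(n, sd = 0.6)  # x2 correlated with x1
x3 <- 0.5 * x1 + 0.4 * x2 + rnorm(n, sd = 0.6)  # x3 correlated with both
y  <- 1 + 2 * x1 + 1.5 * x2 + x3 + rnorm(n)
```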
Now fit the model and show the Type I ANOVA table. For the Type I ANOVA table, we can use the built-in R function anova().
model123 <- lm(y ~ x1 + x2 + x3)
anova(model123)
Analysis of Variance Table
Response: y
Df Sum Sq Mean Sq F value Pr(>F)
x1 1 1336.61 1336.61 1333.29 < 2.2e-16 ***
x2 1 275.19 275.19 274.51 < 2.2e-16 ***
x3 1 115.49 115.49 115.20 < 2.2e-16 ***
Residuals 76 76.19 1.00
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Because Type I ANOVA is sequential, each row corresponds to a nested-model F test.
\(x_1\): Tests adding \(x_1\) to the intercept-only model. Since the p-value is small, we conclude that \(x_1\) explains a significant amount of variation in \(y\).
\(x_2\): Tests adding \(x_2\) to a model that already contains \(x_1\). Since the p-value is small, we conclude that \(x_2\) contributes information beyond \(x_1\).
\(x_3\): Tests adding \(x_3\) to the model with \(x_1\) and \(x_2\). Since the p-value is small, we conclude that \(x_3\) explains additional variation not already captured by \(x_1\) and \(x_2\).
Example 9.3 We now illustrate an example in which the order of the variables matters. We generate a dataset with two highly correlated variables \(x_1\) and \(x_2\), where the response \(y\) is constructed to depend only on \(x_1\).
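The generating code and the first fit are not shown; a hedged sketch of what they might look like (the name model12, the seed, and the coefficients are assumptions; the output below is the author's):

```r
set.seed(42)                      # assumed seed; the original is not shown
n  <- 100
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.2)     # x2 highly correlated with x1
y  <- 1 + 2.5 * x1 + rnorm(n)     # y depends only on x1

model12 <- lm(y ~ x1 + x2)        # hypothetical model name
anova(model12)
```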
Analysis of Variance Table
Response: y
Df Sum Sq Mean Sq F value Pr(>F)
x1 1 677.37 677.37 748.5025 <2e-16 ***
x2 1 0.05 0.05 0.0579 0.8104
Residuals 97 87.78 0.90
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
When \(x_1\) is added first, it explains most of the variation in \(y\). Because \(x_2\) is largely redundant with \(x_1\), adding \(x_2\) afterward does not produce a significant additional reduction in the residual sum of squares. As a result, \(x_1\) is significant, while \(x_2\) is not.
model21 <- lm(y ~ x2 + x1)
anova(model21)
Analysis of Variance Table
Response: y
Df Sum Sq Mean Sq F value Pr(>F)
x2 1 650.19 650.19 718.462 < 2.2e-16 ***
x1 1 27.24 27.24 30.098 3.272e-07 ***
Residuals 97 87.78 0.90
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
When \(x_2\) is added first, it captures much of the variation in \(y\) due to its strong correlation with \(x_1\), and therefore appears significant. However, because \(x_2\) does not fully explain \(x_1\), there remains variation in \(y\) that is uniquely attributable to \(x_1\). Consequently, \(x_1\) is still significant even after \(x_2\) has been included in the model.
9.4 Type II ANOVA table
A Type II ANOVA table (marginal sum of squares) tests each term after adjusting for all other main effects, but not for interaction terms that contain that effect. In practice, the sums of squares for main effects are computed from the additive model (without interactions), while the interaction terms are tested by comparing the additive model with the model that includes the interaction.
Suppose the full model is
\[
y \sim x_1 + x_2 + x_1:x_2 .
\]
9.4.1 Testing a main effect
To test a main effect \(x_j\), the Type II sum of squares is obtained from the additive model \(y \sim x_1 + x_2\):
\[
SS^{(II)}(x_j)=SSE(\text{additive model without }x_j)-SSE(\text{additive model}).
\]
That is, it measures the unique contribution of \(x_j\) after accounting for all other main effects.
9.4.2 Testing the interaction
The interaction is tested by comparing the additive model with the model that includes the interaction:
\[
SS^{(II)}(x_1{:}x_2)=SSE(y \sim x_1+x_2)-SSE(y \sim x_1+x_2+x_1{:}x_2).
\]
9.4.3 F-test
| Source | Degrees of Freedom | Sum of Squares | Mean Square | F |
|---|---|---|---|---|
| \(x_1\) | \(1\) | \(SS^{(II)}(x_1)\) | \(MS^{(II)}(x_1)\) | \(F_1\) |
| \(x_2\) | \(1\) | \(SS^{(II)}(x_2)\) | \(MS^{(II)}(x_2)\) | \(F_2\) |
| \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) |
| Residuals | \(n - p - 1\) | \(\mathrm{SSE}\) | \(\mathrm{MSE}\) | |
Each F-statistic tests \[
H_0: \beta_j=0\quad\text{ adjusting for all other main effect predictors.}
\]
The corresponding F-statistic is \[
F_j^{(II)}=\frac{MS^{(II)}(x_j)}{MSE}=\frac{SS^{(II)}(x_j)/df_j}{SSE/(n-p-1)}.
\]
Type II ANOVA table asks the following question:
Does this term improve the model after adjusting for the other main effects (when ignoring the interaction)?
Tip
Reordering predictors does not change the table.
Each effect is tested conditional on all other main effects.
If an interaction involving a main effect is present, the Type II test for that main effect (computed from the additive model) can be difficult to interpret; in practice you either (i) interpret via simple effects, or (ii) use Type III with a clear coding convention.
Each row corresponds to a nested-model comparison.
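These nested-model comparisons can be written out explicitly; a minimal base-R sketch for the additive case (simulated data, illustrative names):

```r
set.seed(7)
n  <- 60
x1 <- rnorm(n)
x2 <- 0.5 * x1 + rnorm(n)      # correlated predictors
y  <- 1 + x1 + x2 + rnorm(n)

sse <- function(fit) sum(resid(fit)^2)
# Type II SS for x1: drop x1 from the additive model
SS2_x1 <- sse(lm(y ~ x2)) - sse(lm(y ~ x1 + x2))
# the same quantity appears as the last row of a Type I table with x1 entered last
anova(lm(y ~ x2 + x1))["x1", "Sum Sq"]
```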
Example 9.4
We first generate a dataset with two correlated predictors \(x_1\) and \(x_2\).
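The generating code is omitted; one possible sketch (the seed and coefficients are assumptions, so the Anova output below comes from the author's data, not from this sketch):

```r
set.seed(11)                          # assumed seed; the original is not shown
n  <- 80                              # matches the 77 residual df below
x1 <- rnorm(n)
x2 <- 0.5 * x1 + rnorm(n, sd = 0.8)   # x2 correlated with x1
y  <- 1 + x1 + 0.8 * x2 + rnorm(n)
```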
For the Type II ANOVA table, we can use the function Anova() from the car package.
library(car)
Loading required package: carData
model <- lm(y ~ x1 + x2)
Anova(model, type = 2)
Anova Table (Type II tests)
Response: y
Sum Sq Df F value Pr(>F)
x1 71.255 1 77.566 2.823e-13 ***
x2 37.255 1 40.554 1.271e-08 ***
Residuals 70.735 77
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
\(x_1\): Compares the full model \(y \sim x_1 + x_2\) to the reduced model \(y \sim x_2\) (i.e., the model with \(x_1\) removed). A small p-value implies that \(x_1\) explains a significant amount of variation in \(y\) beyond what is explained by \(x_2\).
\(x_2\): Compares the full model \(y \sim x_1 + x_2\) to the reduced model \(y \sim x_1\) (i.e., the model with \(x_2\) removed). A small p-value implies that \(x_2\) explains a significant amount of variation in \(y\) beyond what is explained by \(x_1\).
Unlike Type I ANOVA, Type II ANOVA is order-invariant—reordering \(x_1\) and \(x_2\) does not change the results.
9.5 Type III ANOVA Table
A Type III ANOVA table (Fully Adjusted / Coefficient-Based Tests) tests each model term in the presence of all other terms. Equivalently, for a given term, it compares the full model to a reduced model obtained by setting the coefficients associated with that term to zero while keeping all other terms in the model. Unlike Type I ANOVA, the allocation of variability does not depend on the order of predictors in the model.
Suppose we fit the linear model \[
y=\beta_0+\beta_1x_1+\beta_2x_2+\ldots+\beta_px_p+\varepsilon.
\]
The Type III ANOVA table reports, for each predictor \(x_j\),
\[
SS^{(III)}(x_j)=SSE(\text{full model with }\beta_j=0) - SSE(\text{full model}).
\] That is, each sum of squares measures the increase in unexplained variability that results from constraining the coefficient \(\beta_j\) to be zero while keeping all other predictors in the model.
| Source | Degrees of Freedom | Sum of Squares | Mean Square | F |
|---|---|---|---|---|
| \(x_1\) | 1 | \(SS^{(III)}(x_1)\) | \(MS^{(III)}(x_1)\) | \(F_1\) |
| \(x_2\) | 1 | \(SS^{(III)}(x_2)\) | \(MS^{(III)}(x_2)\) | \(F_2\) |
| \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) |
| Residuals | \(n-p-1\) | SSE | MSE | |
Each F-statistic tests \[
H_0: \beta_j=0 \quad \text{ given that all other predictors are in the model}.
\]
The corresponding F-statistic is computed by \[
F_j=\frac{MS^{(III)}(x_j)}{MSE}.
\] In the case of a single degree of freedom per predictor, this test is equivalent to the square of the \(t\)-test for the coefficient \(\beta_j\).
In summary, the Type III ANOVA table answers the question:
Is the coefficient of this main effect nonzero in the full model, when all other effects (including interaction terms) are present in the model?
Note
Type III vs t-tests
For a 1-degree-of-freedom term (e.g., a single continuous predictor), the Type III F-test is equivalent to the squared t-test for the corresponding coefficient: \[
F^{(III)}=t^2.
\] For multi-df terms (e.g., a factor with multiple levels), Type III tests the whole set of coefficients for that term simultaneously.
Type II tests main effects after adjusting for the other main effects, and (when interactions exist) it typically tests main effects in the additive model to preserve hierarchy. Interactions are tested by comparing the additive model to the interaction model.
Type III tests each term within the full model, even if that term participates in interactions. This can yield hypotheses that are sensitive to coding/centering and may be harder to interpret scientifically.
When there are no interactions, and predictors are coded in the usual way, Type II and Type III coincide (they reduce to the standard partial F-tests).
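The relation \(F^{(III)}=t^2\) is easy to check in base R: for an additive model with 1-df terms, drop1(fit, test = "F") produces the corresponding partial F-tests (simulated data, illustrative values):

```r
set.seed(5)
n  <- 50
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- 1 + 0.8 * x1 + 0.5 * x2 + rnorm(n)
fit <- lm(y ~ x1 + x2)

t_x1 <- summary(fit)$coefficients["x1", "t value"]  # t-test for beta_1
F_x1 <- drop1(fit, test = "F")["x1", "F value"]     # partial F for x1
all.equal(F_x1, t_x1^2)    # TRUE: the partial F equals the squared t
```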