Statistics for Laboratory Scientists ( 140.615 )

Analysis of Variance - Nested Models

Example 1 from class: mosquitos

Read the data, and verify that the ANOVA factors are stored as factors in the R object.

library(SPH.140.615)
summary(mosq)
##      length      individual cage 
##  Min.   :49.30   1:6        A:8  
##  1st Qu.:58.25   2:6        B:8  
##  Median :67.05   3:6        C:8  
##  Mean   :66.63   4:6             
##  3rd Qu.:72.03                   
##  Max.   :84.00
str(mosq)
## 'data.frame':    24 obs. of  3 variables:
##  $ length    : num  58.5 59.5 77.8 80.9 84 83.6 70.1 68.3 69.8 69.8 ...
##  $ individual: Factor w/ 4 levels "1","2","3","4": 1 1 2 2 3 3 4 4 1 1 ...
##  $ cage      : Factor w/ 3 levels "A","B","C": 1 1 1 1 1 1 1 1 2 2 ...

Plot the data.

par(las=1)
stripchart(split(jitter(mosq$length,factor=2), list(mosq$individual, mosq$cage)), ylab="", pch=1, vertical=TRUE)
abline(v=c(4.5,8.5),lty=2)

Let’s give this a go.

mosq.aov <- aov(length ~ cage / individual, data=mosq)
summary(mosq.aov)
##                 Df Sum Sq Mean Sq F value   Pr(>F)    
## cage             2  665.7   332.8   255.7 1.45e-10 ***
## cage:individual  9 1720.7   191.2   146.9 6.98e-11 ***
## Residuals       12   15.6     1.3                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

This is wrong. By default, R tests both factors by calculating the mean-squares with the error in the denominator, and there is no option to tell it otherwise. Later, we will learn how to do this analysis with maximum likelihood estimation. For now, we use a custom function for the sums of squares approach, which is available in the SPH.140.615 package.

nested.anova(mosq.aov)
## Analysis of Variance Table
## 
## Response: length
##                 Df  Sum Sq Mean Sq  F value    Pr(>F)    
## cage             2  665.68  332.84   1.7409    0.2295    
## cage:individual  9 1720.68  191.19 146.8781 6.981e-11 ***
## Residuals       12   15.62    1.30                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

This function is also written so you can easily access the entries in the ANOVA table.

str(nested.anova(mosq.aov))
## Classes 'anova' and 'data.frame':    3 obs. of  5 variables:
##  $ Df     : int  2 9 12
##  $ Sum Sq : num  665.7 1720.7 15.6
##  $ Mean Sq: num  332.8 191.2 1.3
##  $ F value: num  1.74 146.88 NA
##  $ Pr(>F) : num  2.30e-01 6.98e-11 NA
##  - attr(*, "heading")= chr [1:2] "Analysis of Variance Table\n" "Response: length"

For example, to estimate the variances in a random effects model, we need to access the Mean Squares.

MS <- nested.anova(mosq.aov)$M
MS
## [1] 332.837917 191.186389   1.301667

Using the expected mean squares formulas from class, we can get the within, subgroup and group estimates.

Sw <- MS[3]
Ss <- (MS[2]-MS[3])/2
Sg <- (MS[1]-MS[2])/(2*4)
S <- c(Sw,Ss,Sg)
S  
## [1]  1.301667 94.942361 17.706441

The standard deviations.

sqrt(S)
## [1] 1.140906 9.743837 4.207902

The percentage contributions to the total variability.

S/sum(S)
## [1] 0.01142309 0.83318974 0.15538717
round(100*S/sum(S),1)
## [1]  1.1 83.3 15.5

Example 2 from class: flies

The data are available in the SPH.140.615 package.

str(flies)
## 'data.frame':    192 obs. of  3 variables:
##  $ response: num  19 19.6 22.9 26 28.6 ...
##  $ jar     : Factor w/ 3 levels "A","B","C": 1 1 1 1 1 1 1 1 2 2 ...
##  $ strain  : Factor w/ 8 levels "BS","LC","LDD",..: 3 3 3 3 3 3 3 3 3 3 ...
summary(flies)
##     response     jar        strain  
##  Min.   :17.15   A:64   BS     :24  
##  1st Qu.:25.51   B:64   LC     :24  
##  Median :28.95   C:64   LDD    :24  
##  Mean   :29.52          NH     :24  
##  3rd Qu.:33.23          NKS    :24  
##  Max.   :48.45          OL     :24  
##                         (Other):48

Plot the data.

par(las=2)
stripchart(split(flies$response, list(flies$jar, flies$strain)), ylab="", pch=1, vertical=TRUE, xlim=c(1.2,23.8))
abline(v=seq(3.5,24,by=3),lty=2)

nested.anova(aov(response ~ strain / jar, data=flies))
## Analysis of Variance Table
## 
## Response: response
##             Df Sum Sq Mean Sq F value    Pr(>F)    
## strain       7 1323.4 189.059  8.4673 0.0002196 ***
## strain:jar  16  357.3  22.328  0.8044 0.6793007    
## Residuals  168 4663.2  27.757                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1