Statistics for Laboratory Scientists (140.615)

Permutation and Non-Parametric Tests

Example 1 from class

The paired data.

x <- c(117.3, 100.1,  94.5, 135.5,  92.9, 118.9, 144.8, 103.9, 103.8, 153.6, 163.1)
y <- c(145.9,  94.8, 108.0, 122.6, 130.2, 143.9, 149.9, 138.5,  91.7, 162.6, 202.5)
d <- y-x
d
##  [1]  28.6  -5.3  13.5 -12.9  37.3  25.0   5.1  34.6 -12.1   9.0  39.4

The Wilcoxon signed rank test.

wilcox.test(d)
## 
##  Wilcoxon signed rank exact test
## 
## data:  d
## V = 55, p-value = 0.05371
## alternative hypothesis: true location is not equal to 0
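To see where V comes from, here is a minimal sketch (our own illustration, not code from class): rank the absolute differences, then add up the ranks that belong to the positive differences. The object names r and V are ours. The sketch assumes, as is the case here, that there are no zero differences and no ties.

```r
# Paired differences d = y - x from Example 1
x <- c(117.3, 100.1,  94.5, 135.5,  92.9, 118.9, 144.8, 103.9, 103.8, 153.6, 163.1)
y <- c(145.9,  94.8, 108.0, 122.6, 130.2, 143.9, 149.9, 138.5,  91.7, 162.6, 202.5)
d <- y - x

# Rank the absolute differences from smallest (rank 1) to largest (rank 11)
r <- rank(abs(d))

# V = sum of the ranks attached to the positive differences
V <- sum(r[d > 0])
V
```

The three negative differences (-5.3, -12.1, -12.9) receive ranks 2, 4 and 5. All ranks together sum to 11 x 12 / 2 = 66, so V = 66 - 11 = 55, which matches the output above.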

The t-test.

t.test(y,x,paired=TRUE)
## 
##  Paired t-test
## 
## data:  y and x
## t = 2.5015, df = 10, p-value = 0.03137
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##   1.611238 27.879671
## sample estimates:
## mean difference 
##        14.74545
t.test(d)
## 
##  One Sample t-test
## 
## data:  d
## t = 2.5015, df = 10, p-value = 0.03137
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##   1.611238 27.879671
## sample estimates:
## mean of x 
##  14.74545
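The paired t-test is simply a one-sample t-test on the differences. A minimal sketch of the same computation by hand, using the textbook formula t = mean(d) / (sd(d) / sqrt(n)) with n - 1 degrees of freedom (variable names are ours):

```r
x <- c(117.3, 100.1,  94.5, 135.5,  92.9, 118.9, 144.8, 103.9, 103.8, 153.6, 163.1)
y <- c(145.9,  94.8, 108.0, 122.6, 130.2, 143.9, 149.9, 138.5,  91.7, 162.6, 202.5)
d <- y - x
n <- length(d)

# Standard error of the mean difference
se <- sd(d) / sqrt(n)

# t statistic and two-sided p-value with n - 1 degrees of freedom
t_stat <- mean(d) / se
p_val  <- 2 * pt(-abs(t_stat), df = n - 1)

# 95% confidence interval for the mean difference
ci <- mean(d) + c(-1, 1) * qt(0.975, df = n - 1) * se

c(t = t_stat, p = p_val, lower = ci[1], upper = ci[2])
```

These values should agree with the t.test output above.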

Using the Wilcoxon signed rank test to test whether the median is equal to 0 versus the alternative that it is greater than 0.

wilcox.test(d,alternative="greater")
## 
##  Wilcoxon signed rank exact test
## 
## data:  d
## V = 55, p-value = 0.02686
## alternative hypothesis: true location is greater than 0
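The exact p-value can be reproduced as a permutation (sign-flip) test. Under the null hypothesis each rank is equally likely to carry a + or a - sign, so with n = 11 there are 2^11 = 2048 equally likely sign patterns. Below is a minimal sketch that enumerates all of them. The names are ours, and the sketch assumes no ties and no zero differences, so that the exact null distribution applies.

```r
# Paired differences d = y - x from Example 1
d <- c(28.6, -5.3, 13.5, -12.9, 37.3, 25.0, 5.1, 34.6, -12.1, 9.0, 39.4)
n <- length(d)
r <- rank(abs(d))
V_obs <- sum(r[d > 0])

# All 2^n sign patterns, one per row: 1 = positive, 0 = negative
signs <- as.matrix(expand.grid(rep(list(c(0, 1)), n)))

# Signed rank statistic for every pattern
V_null <- as.vector(signs %*% r)

# One-sided p-value: share of patterns with V at least as large as observed
p_greater <- mean(V_null >= V_obs)
p_greater
```

Exactly 55 of the 2048 patterns give V >= 55, so the p-value is 55/2048, about 0.02686. The null distribution of V is symmetric, so the two-sided p-value shown earlier (0.05371) is twice this value.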

Using the Wilcoxon signed rank test to test whether the median is equal to 10 versus the alternative that it is not equal to 10.

wilcox.test(d,mu=10)
## 
##  Wilcoxon signed rank exact test
## 
## data:  d
## V = 42, p-value = 0.4648
## alternative hypothesis: true location is not equal to 10
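Testing whether the location equals 10 is equivalent to testing whether the shifted differences d - 10 have location 0. A minimal sketch that computes V for the shifted data (names are ours):

```r
# Paired differences d = y - x from Example 1
d <- c(28.6, -5.3, 13.5, -12.9, 37.3, 25.0, 5.1, 34.6, -12.1, 9.0, 39.4)
mu0 <- 10

# Shift the data so that the null value becomes 0
d_shift <- d - mu0

# Signed rank statistic of the shifted data
r <- rank(abs(d_shift))
V <- sum(r[d_shift > 0])
V
```

This gives V = 42, as in the output above. Equivalently, wilcox.test(d - 10) returns the same statistic and p-value as wilcox.test(d, mu = 10).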

Example 2 from class

The two-sample data.

x <- c(43.3, 57.1, 35.0, 50.0, 38.2, 61.2)
y <- c(51.9, 95.1, 90.0, 49.7, 101.5, 74.1, 84.5, 46.8, 75.1)

The Wilcoxon rank-sum test.

wilcox.test(x,y)
## 
##  Wilcoxon rank sum exact test
## 
## data:  x and y
## W = 8, p-value = 0.02557
## alternative hypothesis: true location shift is not equal to 0

The output states that the alternative hypothesis is “true location shift is not equal to 0”. Strictly speaking, we are not testing the null hypothesis that the true location shift is equal to 0. We are testing the null hypothesis that the distributions in the two groups are the same versus the alternative that they are not, i.e. \(\Pr(X > Y) = 0.5\) versus \(\Pr(X > Y) \neq 0.5\). The Wilcoxon rank-sum test does not test equality of medians unless the two distributions have the same shape and spread.
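A minimal sketch (names are ours) of how W and its exact p-value arise. W is the sum of the ranks of x in the pooled sample minus nx(nx + 1)/2. This equals the number of (x, y) pairs with x > y, so dividing by nx * ny gives an empirical estimate of Pr(X > Y). Under the null hypothesis, every way of assigning 6 of the 15 pooled ranks to group x is equally likely. Enumerating all choose(15, 6) = 5005 assignments therefore gives the exact permutation distribution of W. The sketch assumes no ties, which holds here.

```r
x <- c(43.3, 57.1, 35.0, 50.0, 38.2, 61.2)
y <- c(51.9, 95.1, 90.0, 49.7, 101.5, 74.1, 84.5, 46.8, 75.1)
nx <- length(x)
ny <- length(y)

# W from the ranks of x in the pooled sample
r <- rank(c(x, y))
W <- sum(r[1:nx]) - nx * (nx + 1) / 2

# Same quantity as a count of pairs with x > y, and the estimate of Pr(X > Y)
W_pairs <- sum(outer(x, y, ">"))
p_hat <- W_pairs / (nx * ny)

# Exact permutation null: all ways to give nx of the nx + ny ranks to group x
W_null <- colSums(combn(nx + ny, nx)) - nx * (nx + 1) / 2

# Two-sided p-value: double the smaller tail, capped at 1
p_two <- min(1, 2 * min(mean(W_null <= W), mean(W_null >= W)))

c(W = W, p_hat = p_hat, p_value = p_two)
```

Here W = 8, so the estimate of Pr(X > Y) is 8/54, about 0.15, well below 0.5.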

Example 3

A laboratory scientist measures IL-6 concentration (pg/mL) in cell culture supernatant after 24 hours. There is a control group (untreated cells) and a treatment group (cells stimulated with a pro-inflammatory compound).

control <- c(1.8,2.1,2.0,2.4,2.2,1.9,2.3,2.0,2.5,2.7,2.1,1.7,2.4,2.2,3.1)
treatment <- c(2.0,2.5,3.2,4.8,6.5,3.9,5.1,2.8,4.8,7.3,6.5,3.2,8.4,5.1,9.7)
par(las=1)
stripchart(list(control,treatment),method="stack",pch=1,ylim=c(0.5,2.5),
           group.names=c("C","T"),xlab="IL-6 concentration (pg/mL)")

The Wilcoxon rank-sum test.

wilcox.test(control,treatment)
## Warning in wilcox.test.default(control, treatment): cannot compute exact p-value with ties
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  control and treatment
## W = 14.5, p-value = 5.132e-05
## alternative hypothesis: true location shift is not equal to 0

See the help file for wilcox.test (?wilcox.test). By default, an exact p-value (as introduced in class) is computed if the samples contain fewer than 50 finite values and there are no ties; otherwise a normal approximation is used. Since we have ties, the normal approximation (with a continuity correction, as the output indicates) is used, and the function returns a warning about that. To suppress the warning, you can override the default for the “exact” argument.

wilcox.test(control,treatment,exact=FALSE)
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  control and treatment
## W = 14.5, p-value = 5.132e-05
## alternative hypothesis: true location shift is not equal to 0
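Below is a minimal sketch of the large-sample approximation. It uses the standard normal approximation for W: tied values get mid-ranks, the variance is corrected for ties, and a continuity correction of 0.5 is applied. We believe this matches what wilcox.test computes with exact = FALSE and its default correct = TRUE, and the check compares the two. Names are ours.

```r
control   <- c(1.8,2.1,2.0,2.4,2.2,1.9,2.3,2.0,2.5,2.7,2.1,1.7,2.4,2.2,3.1)
treatment <- c(2.0,2.5,3.2,4.8,6.5,3.9,5.1,2.8,4.8,7.3,6.5,3.2,8.4,5.1,9.7)
n1 <- length(control)
n2 <- length(treatment)

# Tied values receive the average of the ranks they span (mid-ranks)
r <- rank(c(control, treatment))
W <- sum(r[1:n1]) - n1 * (n1 + 1) / 2

# Null mean of W, and its null variance reduced to account for ties
tie_sizes <- table(r)
mu_W  <- n1 * n2 / 2
var_W <- n1 * n2 / 12 *
  ((n1 + n2 + 1) - sum(tie_sizes^3 - tie_sizes) / ((n1 + n2) * (n1 + n2 - 1)))

# Continuity correction: move W half a unit toward its null mean
z <- (W - mu_W - sign(W - mu_W) * 0.5) / sqrt(var_W)
p_two <- 2 * pnorm(-abs(z))

c(W = W, z = z, p_value = p_two)
```

Setting correct = FALSE in wilcox.test drops the 0.5 adjustment.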

Example 4

Let’s see how one outlier influences the test statistics.

xA <- c(27.0,54.6,33.5,27.6,46.0,22.2,44.2,17.3,15.9,32.8)
xB <- c(17.4,20.5,13.9,14.8,27.9,10.6,33.7,15.4,25.0,24.1)
xAnew <- xA
xAnew[2] <- 99.9

Plot the two data sets.

par(mfrow=c(1,2),las=1)
stripchart(list(xA,xB),vertical=TRUE,pch=21,xlim=c(0.5,2.5),ylim=c(0,100),main="Data set 1")
segments(0.9,mean(xA),1.1,mean(xA),lwd=2,col="red")
segments(1.9,mean(xB),2.1,mean(xB),lwd=2,col="blue")
abline(h=mean(c(xA,xB)),lty=2)
stripchart(list(xAnew,xB),vertical=TRUE,pch=21,xlim=c(0.5,2.5),ylim=c(0,100),main="Data set 2")
segments(0.9,mean(xAnew),1.1,mean(xAnew),lwd=2,col="red")
segments(1.9,mean(xB),2.1,mean(xB),lwd=2,col="blue")
abline(h=mean(c(xAnew,xB)),lty=2)

The two-sample t-test on both data sets.

Why is the p-value larger for data set 2 even though the difference in sample means is larger?

t.test(xA,xB)$p.value
## [1] 0.02364417
t.test(xAnew,xB)$p.value
## [1] 0.06869718
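One way to explore this question: by default (var.equal = FALSE), t.test uses the Welch t statistic. This is the difference in means divided by sqrt(s_A^2/n_A + s_B^2/n_B). A minimal sketch that prints both pieces for each data set. The helper welch_parts is hypothetical and written here only for illustration.

```r
xA <- c(27.0,54.6,33.5,27.6,46.0,22.2,44.2,17.3,15.9,32.8)
xB <- c(17.4,20.5,13.9,14.8,27.9,10.6,33.7,15.4,25.0,24.1)
xAnew <- xA
xAnew[2] <- 99.9

# Hypothetical helper: the pieces of the Welch two-sample t statistic
welch_parts <- function(a, b) {
  delta <- mean(a) - mean(b)                          # difference in sample means
  se <- sqrt(var(a) / length(a) + var(b) / length(b)) # Welch standard error
  c(delta = delta, se = se, t = delta / se)
}

rbind(set1 = welch_parts(xA, xB), set2 = welch_parts(xAnew, xB))
```

The outlier increases the difference in means. However, it inflates the variance of group A, and hence the standard error, by proportionally more, so the t statistic goes down.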

The Wilcoxon rank-sum test on both data sets.

Why are the p-values the same?

wilcox.test(xA,xB)$p.value
## [1] 0.03546299
wilcox.test(xAnew,xB)$p.value 
## [1] 0.03546299
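Here is a quick check that may help with this question: the rank-sum test uses only the ranks of the pooled data. A minimal sketch that compares the pooled ranks for the two data sets (names are ours):

```r
xA <- c(27.0,54.6,33.5,27.6,46.0,22.2,44.2,17.3,15.9,32.8)
xB <- c(17.4,20.5,13.9,14.8,27.9,10.6,33.7,15.4,25.0,24.1)
xAnew <- xA
xAnew[2] <- 99.9

# Ranks of the pooled observations in each data set
r1 <- rank(c(xA, xB))
r2 <- rank(c(xAnew, xB))

# TRUE if replacing 54.6 by 99.9 leaves every rank unchanged
identical(r1, r2)
```

Because 54.6 was already the largest of the 20 observations, 99.9 keeps rank 20, so W and the p-value do not change.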