Statistics for Laboratory Scientists ( 140.615 )

Confidence Intervals for Proportions

Tick example 1 from class

Suppose X ~ Binomial(n=29,p) and we wish to test H\(_0\): p=1/2.

Our observed data are X = 24.

The Binomial (n=29,p=0.5).

par(las=1)
barplot(dbinom(0:29,29,p=0.5),names=0:29,cex.names=0.5)

The pedestrian way. First, find the lower endpoint of rejection region.

pbinom(8,29,0.5)
## [1] 0.01205977
pbinom(9,29,0.5)
## [1] 0.03071417
qbinom(0.025,29,0.5) 
## [1] 9

So 9 is too big, and 8 is the lower critical value.

The actual significance level is Pr(X \(\leq\) 8 or X \(\geq\) 21 | p = 1/2).

pbinom(8,29,0.5) + pbinom(20,29,0.5,lower=FALSE) 
## [1] 0.02411954

The p-value for the observed data X = 24 is 2*Pr(X \(\geq\) 24 | p = 1/2).

2*pbinom(23,29,0.5,lower=FALSE)
## [1] 0.0005461127

The easy way.

binom.test(24,29)
## 
##  Exact binomial test
## 
## data:  24 and 29
## number of successes = 24, number of trials = 29, p-value = 0.0005461
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
##  0.6422524 0.9415439
## sample estimates:
## probability of success 
##              0.8275862
binom.test(24,29)$p.value
## [1] 0.0005461127

The 95% confidence interval for p.

binom.test(24,29)$conf.int
## [1] 0.6422524 0.9415439
## attr(,"conf.level")
## [1] 0.95

Tick example 2 from class

Suppose X ~ Binomial(n=25,p) and we wish to test H\(_0\): p=1/2.

Our observed data are X = 17.

binom.test(17,25)
## 
##  Exact binomial test
## 
## data:  17 and 25
## number of successes = 17, number of trials = 25, p-value = 0.1078
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
##  0.4649993 0.8505046
## sample estimates:
## probability of success 
##                   0.68
binom.test(17,25)$p.value
## [1] 0.1077521
binom.test(17,25)$conf.int 
## [1] 0.4649993 0.8505046
## attr(,"conf.level")
## [1] 0.95

The case X = 0

X ~ Binomial(n=15,p), observe X = 0.

The 95% confidence interval for p.

The easy way.

binom.test(0,15)$conf.int
## [1] 0.0000000 0.2180194
## attr(,"conf.level")
## [1] 0.95

The direct way. The lower limit is 0, and the upper limit is

1-(0.025)^(1/15)
## [1] 0.2180194

The rule of thumb for the upper limit: 3/n.

3/15
## [1] 0.2

The case X = n

X ~ Binomial(n=15,p), observe X = 15.

The 95% confidence interval for p.

The easy way.

binom.test(15,15)$conf.int
## [1] 0.7819806 1.0000000
## attr(,"conf.level")
## [1] 0.95

The direct way. The upper limit is 1, and the lower limit is

(0.025)^(1/15)
## [1] 0.7819806

The rule of thumb for the lower limit: 1-3/n.

1-3/15
## [1] 0.8

The Normal approximation

Image you observe 22 successes in a Binomial experiment with n=100. Calculate a 95% confidence interval for the success probability p. 

x <- 22
n <- 100
phat <- x/n
phat
## [1] 0.22

The exact Binomial confidence interval.

binom.test(x,n)$conf.int
## [1] 0.1433036 0.3139197
## attr(,"conf.level")
## [1] 0.95

Rule of thumb: if both \(\ n \times \widehat{p}\ \) and \(\ n \times (1-\widehat{p})\ \) are larger than 5, you can also use a Normal approximation for the confidence interval.

n*phat
## [1] 22
n*(1-phat)
## [1] 78

All clear - using the formula from class, we get:

phat + c(-1,1) * 1.96 * sqrt(phat*(1-phat)/n)
## [1] 0.1388077 0.3011923

Note that the two confidence intervals are indeed very close.

round(binom.test(x,n)$conf.int,2)
## [1] 0.14 0.31
## attr(,"conf.level")
## [1] 0.95
round(phat+c(-1,1)*1.96*sqrt(phat*(1-phat)/n),2)
## [1] 0.14 0.30

A rough estimate, using the rule of thumb.

phat + c(-1,1) / sqrt(n)
## [1] 0.12 0.32

A simulation study, using a Binomial experiment with n=100 and p=0.3. Simulate 10,000 outcomes, make a histogram (frequency plot), and add the Normal curve with the corresponding mean and standard error.

n <- 100
p <- 0.3
mu <- p
se <- sqrt(p*(1-p)/n)
set.seed(1)
x <- rbinom(10000,n,p)
phats <- x/n
par(las=1)
hist(phats,breaks=seq(0.1,0.5,0.01),prob=TRUE)
curve(dnorm(x,mean=mu,sd=se),add=TRUE,col="red",lwd=2)