Abstract
Multi-city time series studies of particulate
matter (PM) and mortality and morbidity have provided evidence
that daily variation in air pollution levels is associated with
daily variation in mortality counts. These findings served as
key epidemiological evidence for the recent review of the
United States National Ambient Air Quality Standards (NAAQS)
for PM. As a result, methodological issues concerning time
series analysis of the relation between air pollution and
health have attracted the attention of the scientific community
and critics have raised concerns about the adequacy of current
model formulations. Time series data on pollution and mortality
are generally analyzed using log-linear, Poisson regression
models for overdispersed counts with the daily number of deaths
as outcome, the (possibly lagged) daily level of pollution as a
linear predictor, and smooth functions of weather variables and
calendar time used to adjust for time-varying confounders.
Investigators around the world have used different approaches
to adjust for confounding, making it difficult to compare
results across studies. To date, the statistical properties of
these different approaches have not been comprehensively
compared. To address these issues, we quantify and characterize
model uncertainty and model choice in adjusting for seasonal
and long-term trends in time series models of air pollution and
mortality. First, we conduct a simulation study to compare and
describe the properties of statistical methods commonly used
for confounding adjustment. We generate data under several
confounding scenarios and systematically compare the
performance of the different methods with respect to the mean
squared error of the estimated air pollution coefficient. We
find that the bias in the estimates generally decreases with
more aggressive smoothing and that model selection methods
which optimize prediction may not be suitable for obtaining an
estimate with small bias. Second, we apply and compare the
modelling approaches to the National Morbidity, Mortality, and
Air Pollution Study (NMMAPS) database, which comprises
daily time series of several pollutants, weather variables, and
mortality counts covering the period 1987-2000 for the 100
largest cities in the United States. When we apply these approaches
to adjust for seasonal and long-term trends, we find that the
NMMAPS estimates for the national average effect of PM10 at
lag 1 on mortality vary over approximately a two-fold range,
with 95% posterior intervals always excluding zero risk.
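The comparison criterion used in the simulation study, the mean squared error of the estimated air pollution coefficient, can be illustrated with a short sketch. The helper name 'summarize_estimates' and its arguments are hypothetical and not part of the paper's code; the true coefficient is known only in a simulation setting.

```r
# Hypothetical helper: summarize a vector of coefficient estimates from
# repeated simulations against the known true value used to generate the data.
summarize_estimates <- function(estimates, truth) {
  bias     <- mean(estimates) - truth
  variance <- mean((estimates - mean(estimates))^2)  # divisor n, not n - 1
  mse      <- mean((estimates - truth)^2)            # equals bias^2 + variance
  c(bias = bias, variance = variance, mse = mse)
}
```

With the divisor-n variance used here, the mean squared error splits exactly into squared bias plus variance, which is why a method can have low variance and still perform poorly if its bias is large.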
Simulation datasets for "Section 3: Simulation Study"
The following compressed R workspace files contain the
simulated time series datasets used for the simulation study in
Section 3 of the paper. Each file contains 500 datasets of a
particular scenario. The compressed workspace file can be
loaded into an R session with the 'load()' function.
After loading a particular file, an object named
'simDatasets' will be created in your workspace. This object is
a list of length 500. Each element of the list is a data frame
containing a simulated dataset with 2922 observations (8 years
of daily data) and 4 columns. Each dataset looks something like
the following:
> head(simDatasets[[1]])
  death  pm10tmean tmpd    time
1    25 -15.971546 20.0 -2556.5
2    36 -24.984552 25.0 -2555.5
3    35  -7.642580 25.5 -2554.5
4    27   4.596487 34.0 -2553.5
5    40 -15.327102 33.0 -2552.5
6    41  -9.984145 32.5 -2551.5
The variables are
- death: a mortality count
- pm10tmean: a mean-subtracted PM10 level (in micrograms/m^3)
- tmpd: temperature (average of daily max and daily min, in degrees F)
- time: a time index
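As a rough illustration of how one of these datasets might be analyzed, the sketch below fits a log-linear model for overdispersed counts of the kind described in the abstract, using glm() with the quasipoisson family and natural cubic splines from the splines package. The degrees of freedom (6 for temperature, 7 per year for time) are assumptions chosen for illustration, not values taken from the paper. The stand-in data frame built when 'simDatasets' has not been loaded is purely synthetic and only mimics the column layout.

```r
library(splines)

if (exists("simDatasets")) {
  dat <- simDatasets[[1]]
} else {
  # Synthetic stand-in with the same four columns (illustration only).
  set.seed(1)
  n    <- 2922
  time <- seq_len(n) - (n + 1) / 2
  tmpd <- 55 + 20 * sin(2 * pi * time / 365.25) + rnorm(n, sd = 5)
  pm   <- rnorm(n, sd = 15)
  mu   <- exp(log(30) + 0.1 * cos(2 * pi * time / 365.25) + 0.0005 * pm)
  dat  <- data.frame(death = rpois(n, mu), pm10tmean = pm,
                     tmpd = tmpd, time = time)
}

df_per_year <- 7   # assumed amount of smoothing for the time trend
n_years     <- 8   # each simulated dataset covers 8 years

# Pollution enters linearly; temperature and time enter as smooth functions.
fit <- glm(death ~ pm10tmean + ns(tmpd, df = 6) +
             ns(time, df = df_per_year * n_years),
           family = quasipoisson, data = dat)

# Estimated log relative rate per unit of PM10 and its standard error.
beta <- coef(summary(fit))["pm10tmean", c("Estimate", "Std. Error")]
print(beta)
```

Refitting with different values of 'df_per_year' is one simple way to see how sensitive the pollution coefficient is to the degree of adjustment for seasonal and long-term trends.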
The simulated dataset files are:
- simDatasets-fg.rda [24MB]: g(t) smoother than f(t), moderate concurvity.
- simDatasets-fg10.rda [24MB]: g(t) smoother than f(t), high concurvity.
- simDatasets-gf.rda [24MB]: g(t) rougher than f(t), moderate concurvity.
- simDatasets-gf10.rda [24MB]: g(t) rougher than f(t), high concurvity.
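Because every file creates an object with the same name, 'simDatasets', loading a second scenario into the same session overwrites the first. One way around this is sketched below: a small wrapper that loads a file into a private environment and returns the list, so each scenario can be given its own name. The function 'load_scenario' is a hypothetical helper and not part of the original materials; the usage lines assume the file has been downloaded to the working directory.

```r
# Hypothetical helper: load one scenario file without touching the
# global workspace, and return the 'simDatasets' list it contains.
load_scenario <- function(path) {
  env    <- new.env()
  loaded <- load(path, envir = env)   # load() returns the restored object names
  stopifnot("simDatasets" %in% loaded)
  get("simDatasets", envir = env)
}

# Usage, only if the file is present locally:
if (file.exists("simDatasets-fg.rda")) {
  fg <- load_scenario("simDatasets-fg.rda")
  print(length(fg))     # number of simulated datasets in the file
  print(dim(fg[[1]]))   # rows and columns of the first dataset
}
```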
Mortality, air pollution, and weather data for "Section 4: NMMAPS Data Analysis"
All of the mortality, air pollution, and weather data for
reproducing the results in the paper can be obtained by
downloading the NMMAPSlite package for R from CRAN.
The list of cities (abbreviated names) used in the analysis
in Section 4 is contained in the file cityList.R. This file can
be read into an R session using the 'source()' function (i.e.,
source("cityList.R")).