Abstract

Multi-city time series studies of particulate matter (PM) and mortality and morbidity have provided evidence that daily variation in air pollution levels is associated with daily variation in mortality counts. These findings served as key epidemiological evidence for the recent review of the United States National Ambient Air Quality Standards (NAAQS) for PM. As a result, methodological issues concerning time series analysis of the relation between air pollution and health have attracted the attention of the scientific community and critics have raised concerns about the adequacy of current model formulations. Time series data on pollution and mortality are generally analyzed using log-linear, Poisson regression models for overdispersed counts with the daily number of deaths as outcome, the (possibly lagged) daily level of pollution as a linear predictor, and smooth functions of weather variables and calendar time used to adjust for time-varying confounders. Investigators around the world have used different approaches to adjust for confounding, making it difficult to compare results across studies. To date, the statistical properties of these different approaches have not been comprehensively compared. To address these issues, we quantify and characterize model uncertainty and model choice in adjusting for seasonal and long-term trends in time series models of air pollution and mortality. First, we conduct a simulation study to compare and describe the properties of statistical methods commonly used for confounding adjustment. We generate data under several confounding scenarios and systematically compare the performance of the different methods with respect to the mean squared error of the estimated air pollution coefficient. We find that the bias in the estimates generally decreases with more aggressive smoothing and that model selection methods which optimize prediction may not be suitable for obtaining an estimate with small bias. 
Second, we apply and compare the modelling approaches to the National Morbidity, Mortality, and Air Pollution Study (NMMAPS) database, which comprises daily time series of several pollutants, weather variables, and mortality counts covering the period 1987-2000 for the 100 largest cities in the United States. When applying these approaches to adjusting for seasonal and long-term trends, we find that the NMMAPS estimates for the national average effect of PM10 at lag 1 on mortality vary over approximately a two-fold range, with 95% posterior intervals always excluding zero risk.

Simulation datasets for "Section 3: Simulation Study"

The following compressed R workspace files contain the simulated time series datasets used for the simulation study in Section 3 of the paper. Each file contains 500 datasets for one particular scenario and can be loaded into an R session with the 'load()' function.

After loading a particular file, an object named 'simDatasets' will be created in your workspace. This object is a list of length 500. Each element of the list is a data frame containing a simulated dataset with 2922 observations (8 years of daily data) and 4 columns. Each dataset looks something like the following:

> head(simDatasets[[1]])
  death  pm10tmean tmpd    time
1    25 -15.971546 20.0 -2556.5
2    36 -24.984552 25.0 -2555.5
3    35  -7.642580 25.5 -2554.5
4    27   4.596487 34.0 -2553.5
5    40 -15.327102 33.0 -2552.5
6    41  -9.984145 32.5 -2551.5

The variables are:

  death: a mortality count
  pm10tmean: a mean-subtracted PM10 level (in micrograms/m^3)
  tmpd: temperature (average of daily max and daily min, in degrees F)
  time: a time index
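
The loading step and the structure described above can be checked with a short sketch like the one below. The helper names 'load_sim_datasets' and 'check_sim_datasets' are hypothetical and are not part of the distributed files. The sketch assumes only what is stated here: the workspace contains a list named 'simDatasets' with 500 data frames of 2922 rows and the four columns listed.

```r
# Load one simulation workspace into a private environment so that it
# does not overwrite objects already present in the global workspace.
# (Hypothetical helper, not part of the distributed files.)
load_sim_datasets <- function(path) {
  env <- new.env()
  loaded <- load(path, envir = env)   # load() returns the names it restored
  if (!"simDatasets" %in% loaded) {
    stop("no object named 'simDatasets' in ", path)
  }
  get("simDatasets", envir = env)
}

# Check the list against the layout described on this page:
# 500 data frames, 2922 rows each, columns death/pm10tmean/tmpd/time.
check_sim_datasets <- function(sims, n_datasets = 500, n_obs = 2922) {
  expected <- c("death", "pm10tmean", "tmpd", "time")
  ok_shape <- vapply(sims, function(d) {
    is.data.frame(d) && nrow(d) == n_obs && identical(names(d), expected)
  }, logical(1))
  length(sims) == n_datasets && all(ok_shape)
}

# Usage, assuming the file has been downloaded to the working directory.
if (file.exists("simDatasets-fg.rda")) {
  sims <- load_sim_datasets("simDatasets-fg.rda")
  print(check_sim_datasets(sims))
  print(head(sims[[1]]))
}
```

Loading into a separate environment is a design choice rather than a requirement. Calling load("simDatasets-fg.rda") directly works equally well, but it replaces any existing 'simDatasets' object, which matters when switching between the scenario files.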

The simulated dataset files are:

  1. simDatasets-fg.rda [24MB]: g(t) smoother than f(t), moderate concurvity.
  2. simDatasets-fg10.rda [24MB]: g(t) smoother than f(t), high concurvity.
  3. simDatasets-gf.rda [24MB]: g(t) rougher than f(t), moderate concurvity.
  4. simDatasets-gf10.rda [24MB]: g(t) rougher than f(t), high concurvity.
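
As a rough illustration of how these datasets can be used, the sketch below fits the kind of model described in the abstract (a log-linear Poisson regression for overdispersed counts, with PM10 as a linear term and smooth functions of time and temperature) to each dataset and varies the amount of smoothing for time. It is a minimal sketch and not the code used in the paper. The function name 'fit_pm10', the use of natural cubic splines, and the degrees-of-freedom values are all assumptions chosen for illustration.

```r
library(splines)  # ns(): natural cubic spline basis, shipped with R

# Estimate the PM10 log-relative rate from one simulated data frame.
# df_per_year and df_temp are illustrative assumptions, not the
# settings used in the paper.
fit_pm10 <- function(d, df_per_year = 4, df_temp = 6) {
  n_years <- nrow(d) / 365.25
  df_time <- max(1, round(df_per_year * n_years))
  fit <- glm(death ~ pm10tmean + ns(time, df = df_time) + ns(tmpd, df = df_temp),
             family = quasipoisson,   # Poisson mean with overdispersion
             data = d)
  unname(coef(fit)["pm10tmean"])
}

# Usage, assuming one of the workspace files has already been loaded
# so that 'simDatasets' exists. Only the first 20 datasets are used
# here to keep the run time short.
if (exists("simDatasets")) {
  df_grid <- c(1, 2, 4, 8, 12)
  est <- sapply(df_grid, function(k) {
    sapply(simDatasets[1:20], fit_pm10, df_per_year = k)
  })
  colnames(est) <- paste0("df/yr=", df_grid)
  print(round(apply(est, 2, mean), 5))  # average estimate per setting
  print(round(apply(est, 2, sd), 5))    # spread of estimates per setting
}
```

Comparing the mean and spread of the estimates across the degrees-of-freedom settings gives a first impression of how the estimate responds to the smoothness of the time trend. Computing bias or mean squared error additionally requires the true coefficient used to generate the data, which is given in the paper rather than on this page.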

Mortality, air pollution, and weather data for "Section 4: NMMAPS Data Analysis"

All of the mortality, air pollution, and weather data for reproducing the results in the paper can be obtained by downloading the NMMAPSlite package for R from CRAN.

The list of cities used in the analysis in Section 4, given as abbreviated city names, is contained in the file cityList.R. This file can be read into an R session using the 'source()' function, i.e. source("cityList.R").
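
Because this page does not name the object that cityList.R creates, one cautious way to read the file is to source it into its own environment and then inspect whatever it defines. The helper 'source_city_list' below is hypothetical and only a sketch of that approach.

```r
# Source a script into its own environment and return what it defines.
# (Hypothetical helper; the object name inside cityList.R is not
# documented on this page, so nothing is assumed about it.)
source_city_list <- function(path = "cityList.R") {
  env <- new.env()
  source(path, local = env)    # evaluate the script inside 'env'
  mget(ls(env), envir = env)   # named list of every object it created
}

# Usage, assuming cityList.R is in the working directory.
if (file.exists("cityList.R")) {
  defined <- source_city_list()
  str(defined)
}
```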