
Matching Methods in Causal Inference

Much of my statistical methodology research involves matching methods in causal inference, which concern how best to choose a group of control individuals to compare with a treated group. Applications I have been involved with include the effects of adolescent drug use on later adult outcomes, the effects of school-based interventions, and the effects of publishing quality measures for nursing homes.
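As a rough illustration of what propensity score matching involves, here is a minimal Python sketch. It is not the method used in any of these applications. It assumes a single covariate, a logistic regression for the propensity score (fit by plain gradient ascent so that only the standard library is needed), and greedy 1:1 nearest-neighbor matching without replacement. The function names and toy data are hypothetical.

```python
import math


def propensity(beta, x):
    """Estimated probability of treatment given covariates x."""
    eta = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
    return 1.0 / (1.0 + math.exp(-eta))


def fit_logistic(X, z, lr=0.1, iters=5000):
    """Fit P(z = 1 | x) by batch gradient ascent on the logistic log-likelihood.

    X is a list of covariate vectors, z a list of 0/1 treatment indicators.
    Returns [intercept, slope_1, ..., slope_k].
    """
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(iters):
        grad = [0.0] * (k + 1)
        for xi, zi in zip(X, z):
            resid = zi - propensity(beta, xi)  # observed minus fitted probability
            grad[0] += resid
            for j in range(k):
                grad[j + 1] += resid * xi[j]
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def greedy_match(scores, z):
    """Greedy 1:1 nearest-neighbor matching on the propensity score, without replacement.

    Treated units with the highest scores (usually the hardest to match) go first.
    Returns a list of (treated_index, control_index) pairs.
    """
    controls = [i for i, t in enumerate(z) if t == 0]
    treated = sorted((i for i, t in enumerate(z) if t == 1), key=lambda i: -scores[i])
    pairs = []
    for i in treated:
        if not controls:
            break
        j = min(controls, key=lambda c: abs(scores[i] - scores[c]))
        pairs.append((i, j))
        controls.remove(j)
    return pairs


# Hypothetical toy data: one covariate; treated units tend to have larger values.
X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
z = [0, 0, 1, 0, 1, 1]
beta = fit_logistic(X, z)
scores = [propensity(beta, x) for x in X]
pairs = greedy_match(scores, z)
print(pairs)  # [(5, 3), (4, 1), (2, 0)]
```

Greedy matching is simple but order-dependent; optimal matching, calipers, and matching with replacement are common alternatives.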

Some of my work examines the use of multiple control groups, covering both the theoretical properties of affinely invariant matching methods applied to mixtures of ellipsoidally symmetric distributions and practical guidance for applied researchers. That work is motivated by two specific applications: the evaluation of a school-wide dropout prevention program, and a clinical trial that uses historical patient data to supplement the data from the randomized patients.
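Mahalanobis-metric matching is a standard example of an affinely invariant matching method: if the covariates are transformed by x -> Ax + b with A invertible and the covariance matrix is re-estimated, every pairwise distance, and therefore every match, is unchanged. The hypothetical sketch below checks this numerically for two covariates; it illustrates only the invariance property, not the theoretical results for mixtures of ellipsoidally symmetric distributions.

```python
import math


def covariance(X):
    """Sample covariance matrix (n - 1 denominator) of a list of 2-D points."""
    n = len(X)
    m = [sum(r[j] for r in X) / n for j in range(2)]
    S = [[0.0, 0.0], [0.0, 0.0]]
    for r in X:
        for a in range(2):
            for b in range(2):
                S[a][b] += (r[a] - m[a]) * (r[b] - m[b]) / (n - 1)
    return S


def inverse_2x2(S):
    """Inverse of a nonsingular 2x2 matrix."""
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    return [[S[1][1] / det, -S[0][1] / det], [-S[1][0] / det, S[0][0] / det]]


def mahalanobis(x, y, S_inv):
    """Mahalanobis distance between points x and y given an inverse covariance matrix."""
    d = [x[0] - y[0], x[1] - y[1]]
    return math.sqrt(sum(d[a] * S_inv[a][b] * d[b] for a in range(2) for b in range(2)))


def affine(X, A, b):
    """Apply x -> A x + b to each point (A must be invertible)."""
    return [[A[0][0] * r[0] + A[0][1] * r[1] + b[0],
             A[1][0] * r[0] + A[1][1] * r[1] + b[1]] for r in X]


# Hypothetical covariates (two per individual) and an arbitrary invertible transformation.
X = [[1.0, 2.0], [2.0, 1.5], [3.0, 4.0], [4.0, 3.0], [5.0, 6.5]]
Y = affine(X, A=[[2.0, 1.0], [0.5, 3.0]], b=[10.0, -4.0])

d_before = mahalanobis(X[0], X[2], inverse_2x2(covariance(X)))
d_after = mahalanobis(Y[0], Y[2], inverse_2x2(covariance(Y)))
euclid_before = math.dist(X[0], X[2])  # Euclidean distance is NOT affinely invariant
euclid_after = math.dist(Y[0], Y[2])
print(abs(d_before - d_after) < 1e-9, round(euclid_before, 3), round(euclid_after, 3))
```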

I am also very interested in the practical use of propensity scores and in guidelines for that use. Related to this, I am particularly interested in diagnostics for propensity score matching methods, as well as in sensitivity analyses.
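One widely used balance diagnostic for propensity score matching is the standardized mean difference of each covariate between the treated and control groups, computed before and after matching. A minimal sketch with hypothetical values follows; a commonly cited rule of thumb treats absolute values below about 0.1 as adequate balance, though thresholds vary.

```python
import statistics


def standardized_mean_difference(treated, control):
    """Difference in group means scaled by the treated-group standard deviation.

    Scaling by the treated SD is one convention (suited to effects on the treated);
    a pooled SD is another common choice.
    """
    return (statistics.mean(treated) - statistics.mean(control)) / statistics.stdev(treated)


# Hypothetical covariate (e.g., age) for treated units, all controls, and matched controls.
treated = [30.0, 34.0, 38.0]
all_controls = [20.0, 22.0, 31.0, 33.0, 37.0]
matched_controls = [31.0, 33.0, 37.0]

smd_before = standardized_mean_difference(treated, all_controls)     # about 1.35
smd_after = standardized_mean_difference(treated, matched_controls)  # about 0.083
```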

One way that propensity scores can be used is to select individuals for follow-up, concentrating resources on those who will provide the most relevant information. I will be presenting that work at the Society for Prevention Research meeting in May 2007.

Moving From Efficacy to Effectiveness

Lately I have been examining what we can learn about broader program effectiveness from randomized efficacy trials. This work is in the context of the Positive Behavioral Interventions and Supports program, currently being implemented in a large number of Maryland public schools. I am using data from a trial in which a set of schools was randomly selected to implement the program, together with data on all schools in the state, to examine what we can learn about the program's effectiveness statewide. This work will be presented as a poster at the Society for Prevention Research meeting in May 2007.
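One common way to generalize a trial result to a target population is to weight each trial unit by the inverse of its estimated probability of being in the trial, so that the weighted trial sample resembles the population. The sketch below is an illustration under simplifying assumptions, not necessarily the approach used in this work: a single hypothetical school characteristic (urban or rural), trial-participation probabilities estimated by stratum frequencies, and made-up outcomes.

```python
from collections import Counter


def generalized_effect(trial, state_strata):
    """Estimate a statewide average effect from trial schools by inverse-probability weighting.

    trial: list of (stratum, arm, outcome) tuples for trial schools; arm 1 = program, 0 = control.
    state_strata: stratum label for every school in the state (trial schools included).
    P(in trial | stratum) is estimated by the stratum's trial share, and each trial
    school is weighted by its inverse so that the weighted trial mimics the state.
    """
    state_counts = Counter(state_strata)
    trial_counts = Counter(s for s, _, _ in trial)

    def weight(s):
        return state_counts[s] / trial_counts[s]  # 1 / P(in trial | stratum)

    def weighted_mean(arm):
        num = sum(weight(s) * y for s, a, y in trial if a == arm)
        den = sum(weight(s) for s, a, _ in trial if a == arm)
        return num / den

    return weighted_mean(1) - weighted_mean(0)


# Hypothetical data: the trial over-represents urban schools relative to the state.
trial = [("urban", 1, 10.0), ("urban", 1, 12.0), ("urban", 0, 8.0), ("urban", 0, 8.0),
         ("rural", 1, 5.0), ("rural", 0, 4.0)]
state = ["urban"] * 10 + ["rural"] * 30

naive = (10.0 + 12.0 + 5.0) / 3 - (8.0 + 8.0 + 4.0) / 3
weighted = generalized_effect(trial, state)
print(round(naive, 3), weighted)  # 2.333 1.5
```

Here the weighted estimate equals the state-weighted average of the stratum-specific effects (3 for urban and 1 for rural schools). The approach relies on every type of school in the state being represented in the trial.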

Projects at Mathematica

I worked on a variety of projects at Mathematica, mostly in the education and nutrition areas. These included a follow-up study of the effects of Upward Bound, a college preparation program for disadvantaged students, as well as an evaluation of remedial reading programs for elementary school students, which involved matching methods (see above) and hierarchical linear modeling. In the nutrition area, I obtained small-area estimates of the number of children in poverty and worked on a project examining the variation in food stamp participation rates across states. For that project, we developed a new statistical method that combines data from multiple sources to estimate standardized food stamp participation rates.
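The small-area method used is not described here, but a standard approach to small-area estimation of poverty is a Fay-Herriot-style composite estimator, which shrinks each area's noisy direct survey estimate toward a regression-based synthetic estimate. The sketch below assumes the variances and synthetic estimates are already known; all values are hypothetical.

```python
def composite_estimate(direct, sampling_var, synthetic, model_var):
    """Shrink a noisy direct survey estimate toward a model-based (synthetic) estimate.

    gamma = model_var / (model_var + sampling_var): the noisier the direct
    estimate, the smaller gamma and the more weight goes to the synthetic estimate.
    """
    gamma = model_var / (model_var + sampling_var)
    return gamma * direct + (1 - gamma) * synthetic


# Hypothetical areas with the same direct and synthetic poverty rates but different sample sizes.
small_area = composite_estimate(direct=0.20, sampling_var=0.004, synthetic=0.15, model_var=0.001)
large_area = composite_estimate(direct=0.20, sampling_var=0.0005, synthetic=0.15, model_var=0.001)
print(round(small_area, 4), round(large_area, 4))  # 0.16 0.1833
```

The area with the larger sampling variance is pulled further toward its synthetic estimate.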

Using administrative records to predict census day residency

Administrative records such as tax returns or Medicare files have great potential to supplement the US Census; however, one of their main disadvantages is that the time periods they cover do not necessarily correspond to census day. We have developed a Bayesian hierarchical model of migration and of observation in the record systems to estimate the probability of census day residency for each individual in the population. The model builds on multiple system estimation, but it is unique in that the modeling is done at the individual level and it makes use of all of the information in the records, including dates and covariates. Some of this work was presented at the 2001 (invited session), 2003, and 2005 Joint Statistical Meetings. This is joint work with Alan Zaslavsky of Harvard University and Dean Judson of the US Census Bureau.
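The simplest multiple system estimator, the two-system Lincoln-Petersen estimator, gives a sense of the starting point for this model. It assumes a closed population, independent record systems, and perfect record linkage, which are the kinds of assumptions an individual-level model using dates and covariates can relax. The sketch below is only that textbook estimator (plus Chapman's small-sample correction), applied to hypothetical record sets; it is not the Bayesian hierarchical model described above.

```python
def lincoln_petersen(list_a, list_b):
    """Two-system estimate of population size: N_hat = n_a * n_b / m."""
    m = len(list_a & list_b)  # individuals found in both systems
    return len(list_a) * len(list_b) / m


def chapman(list_a, list_b):
    """Chapman's bias-corrected variant, defined even when the overlap is zero."""
    m = len(list_a & list_b)
    return (len(list_a) + 1) * (len(list_b) + 1) / (m + 1) - 1


# Hypothetical person IDs found in two record systems (e.g., tax and Medicare files).
tax_ids = set(range(0, 500))         # 500 people
medicare_ids = set(range(300, 700))  # 400 people, 200 of them also in tax_ids
print(lincoln_petersen(tax_ids, medicare_ids))   # 1000.0
print(round(chapman(tax_ids, medicare_ids), 1))  # 998.5
```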

The use of historical patient data in clinical trials

In graduate school I was involved in a consulting project (with Donald Rubin and Samantha Cook) for Genzyme concerning a new treatment for a rare disease, Fabry's disease. The new drug, Fabrazyme, became commercially available during the Phase 4 clinical trial, which invalidated the traditional use of the randomized control group. We developed methodology that uses well-matched historical patients to model long-term trends in the outcome of interest among untreated patients. The statistical issues include estimating propensity scores with missing data and defining a "baseline" for each historical patient (the date on which the patient would have entered a similar randomized trial).
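Defining a baseline for historical patients can be thought of as scanning each patient's visit history for the first date at which they would have met the trial's entry criteria. The criteria are not given here, so the sketch below uses a hypothetical rule (`hypothetical_criteria`, with made-up `age` and `lab_value` fields) purely to illustrate the bookkeeping; it is not the project's actual definition.

```python
from datetime import date


def define_baseline(visits, is_eligible):
    """Return the earliest visit at which the trial entry criteria are met, or None.

    visits: list of dicts, each with at least a "date" key.
    is_eligible: function mapping a visit to True/False.
    """
    for visit in sorted(visits, key=lambda v: v["date"]):
        if is_eligible(visit):
            return visit
    return None


def hypothetical_criteria(visit):
    # Hypothetical entry rule: age 16 or older and a lab value of at least 1.5.
    return visit["age"] >= 16 and visit["lab_value"] >= 1.5


# Hypothetical historical patient; visits are deliberately out of order.
visits = [
    {"date": date(2001, 1, 15), "age": 18, "lab_value": 2.0},
    {"date": date(1998, 3, 1), "age": 15, "lab_value": 1.8},
    {"date": date(2000, 6, 9), "age": 17, "lab_value": 1.6},
    {"date": date(1999, 5, 2), "age": 16, "lab_value": 1.2},
]
baseline = define_baseline(visits, hypothetical_criteria)
print(baseline["date"])  # 2000-06-09
```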