140.655 - Third Term 2004
LONGITUDINAL DATA ANALYSIS

Instructor: Francesca Dominici
Teaching Assistants: Sorina Eftim and Yi Huang
LECTURE: W3204 MW 10:30 - 12:00
LAB: W3204 MW 9:15 - 10:15

First lab Wednesday January 28 from 9:15 to 10:15 am in W3204. TA: Sorina Eftim

First TA office hrs, Wed. January 28 from 12:15 to 1:15 pm in W4007. TA: Yi Huang


INDEX
Course info
Announcements
Exams
Lecture Notes
Lab Notes
Data sets
Software (S+, STATA, SAS)
LDA books
Acknowledgments

COURSE INFO
  • COURSE OBJECTIVES [ps] [pdf]
  • Dr. Dominici's Office Hours: Mon. 12:30-1:30 pm, Room E3148
  • Sorina's and Yi's Office Hours: Wed. 12:15-1:15 pm Room W4007
  • ANNOUNCEMENTS AND IMPORTANT DATES

    ►First class Wednesday January 21 from 10:30 am to 12:00 pm W3204

  • First lab Wednesday January 28 from 9:15 to 10:15 am in W3204
  • First office hour Wednesday January 28 from 12:15 to 1:15 pm in W4007

    ► Please send your e-mail address to yhuang@jhsph.edu to be added to the class mailing list

  • Problem Set 1 due Friday, February 6 by 5:00 pm in Yi's mailbox
  • Midterm Assignment due Friday, February 20 by 5:00 pm in Francesca's mailbox
  • Problem Set 2 due Friday, March 5 by 5:00 pm in Sorina's mailbox
  • Final Assignment due Friday, March 19 by 5:00 pm in Francesca's mailbox
    EXAMS

    LECTURE NOTES
  • 1. Examples of Longitudinal Data Sets [ps] [pdf]
  • 2. Exploratory Data Analysis [ps] [pdf]
  • 3. Linear Regression: a review [ps] [pdf]
  • 4. Linear Models for Correlated data: examples [ps] [pdf]
  • 5. Linear Models for Correlated data: inference [ps] [pdf]
  • 6. Parametric Models for Covariance Structure [ps] [pdf]
  • 7. Parametric Models for Covariance Structure: examples [ps] [pdf]
  • 8. Generalized Linear Models for Longitudinal Data [ps] [pdf]
  • READING ASSIGNMENT
  • Longitudinal Data Analysis Using Generalized Linear Models by Liang K.Y. and Zeger S.L. Biometrika 1986 [pdf]
  • 9. Marginal Logistic Regression Model and GEE [ps] [pdf]
  • 10. Marginal Poisson Regression Model and GEE [ps] [pdf]
  • 11. Generalized Linear Models with Random Effects [ps] [pdf]
  • 12. Transition Models [ps] [pdf]

    LAB NOTES
    Download each of the STATA *.ado files and the corresponding *.hlp files. Please use "Save As Source" when you save them to your hard disk from your web browser. To use the *.ado files, put them in your current directory, in your STATA "ado" directory, or in another directory where STATA looks for them. These functions have ***not*** been thoroughly tested. Please let me know of any bugs you find in them.
    INTRODUCTION AND EXPLORATORY DATA ANALYSIS

    LAB 1, Monday 1/26: CANCELLED DUE TO WEATHER CONDITIONS

    LAB 2, Wednesday 1/28: Introduction to Statistical software: STATA, SAS

  • Introduction to Matrix Algebra [matrix_intro.pdf]; extra: trace and determinant of a matrix, and the inverse of a 2 by 2 matrix
  • Introduction to STATA [stata_intro.pdf] [stata_intro2.pdf]
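    As a quick refresher on the "extra" topics in the matrix algebra handout, here is a minimal sketch in plain Python (not STATA) of the trace, determinant, and closed-form inverse of a 2 by 2 matrix. The function names are hypothetical and are not part of the course materials.

```python
from fractions import Fraction


def trace2(m):
    """Trace of a 2 by 2 matrix: the sum of its diagonal elements."""
    return m[0][0] + m[1][1]


def det2(m):
    """Determinant of [[a, b], [c, d]]: ad - bc."""
    (a, b), (c, d) = m
    return a * d - b * c


def inverse2(m):
    """Inverse of [[a, b], [c, d]]: (1 / (ad - bc)) * [[d, -b], [-c, a]].

    For integer entries the result is exact (Fraction objects),
    so it can be checked by hand.
    """
    (a, b), (c, d) = m
    det = det2(m)
    if det == 0:
        raise ValueError("matrix is singular (determinant is 0)")
    k = Fraction(1, 1) / det
    return [[d * k, -b * k], [-c * k, a * k]]


A = [[4, 7], [2, 6]]
print(trace2(A))  # 10
print(det2(A))    # 10
print(inverse2(A))
```

    Here the inverse is [[3/5, -7/10], [-1/5, 2/5]]; a handy check is that multiplying A by its inverse gives the identity matrix.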

    Exploratory Data Analysis [lab1_2.pdf] (Autocorrelation, Scatterplot Matrix, Line Plots, and Lowess Smoother)

  • Examples using the CD4+ cell numbers data set [cd4.example.pdf] [cd4.example2.pdf]
  • STATA analysis for calculating the autocorrelation function of the CD4 data [cd4.do]
  • Variogram Plot [variogram.ado] [variogram.hlp] This function requires [xtdiff.ado] and ksmapprox.ado
  • STATA analysis for calculating the variogram of cows data [cows.do]
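    The sample variogram used in these examples is built from within-subject pairs of residuals: for observations j and k on subject i, compute v_ijk = 0.5 (r_ij - r_ik)^2 and the time lag u_ijk = |t_ij - t_ik|, then look at how v changes with u. The minimal Python sketch below assumes residuals from a fitted mean model are already available and that times lie on a common grid, so averaging v at each exact lag makes sense. For irregular times one would smooth v against u instead (the lab's variogram.ado depends on ksmapprox.ado, a smoothing function). The function name and data layout are hypothetical.

```python
from collections import defaultdict


def sample_variogram(times, resid):
    """Empirical variogram from within-subject residual pairs.

    times, resid: dicts mapping subject id -> list of observation times
    and the matching residuals (same order). For every pair j < k within
    a subject, v = 0.5 * (r_j - r_k)**2 and u = |t_j - t_k|.
    Returns a dict mapping each distinct lag u to the mean of v at that lag.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for subj, t in times.items():
        r = resid[subj]
        for j in range(len(t)):
            for k in range(j + 1, len(t)):
                u = abs(t[j] - t[k])
                sums[u] += 0.5 * (r[j] - r[k]) ** 2
                counts[u] += 1
    return {u: sums[u] / counts[u] for u in sorted(sums)}


# Two hypothetical subjects, each observed at times 0, 1, 2.
times = {1: [0, 1, 2], 2: [0, 1, 2]}
resid = {1: [1.0, 0.0, -1.0], 2: [0.0, 2.0, 2.0]}
print(sample_variogram(times, resid))  # {1: 0.75, 2: 2.0}
```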

    Additional Material

  • Introductions to SAS [sas_intro1.pdf] [sas_intro2.pdf]
  • Glossary of Macros [sascode.pdf]
  • Faster function to generate smooth model fits [ksmapprox.ado][ksmapprox.hlp]
  • Function for making plots of means over time [xtgraph.ado] [xtgraph.hlp] pdf demonstration file
  • Function to compute sample autocorrelation function for fixed time points of equal lag [autocor.ado] pdf help file
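    For balanced data observed at the same equally spaced time points, one simple estimate of the lag-k autocorrelation correlates the residuals at time j with those at time j + k across subjects and averages over j. Averaging in this way assumes stationarity, meaning the correlation depends only on the lag. This is a minimal Python sketch of that idea, not necessarily what autocor.ado computes; the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lag_autocorrelation(resid, lag):
    """Average correlation between residuals `lag` time points apart.

    resid: list of rows, one per subject, each holding residuals at the
    same m equally spaced time points (balanced data, no missing values).
    """
    m = len(resid[0])
    if not 1 <= lag < m:
        raise ValueError("lag must satisfy 1 <= lag < number of time points")
    cols = [[row[j] for row in resid] for j in range(m)]
    rs = [pearson(cols[j], cols[j + lag]) for j in range(m - lag)]
    return sum(rs) / len(rs)


# Toy residuals for three subjects at three time points.
R = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 6.0, 9.0]]
print(lag_autocorrelation(R, 1))  # 1.0
```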


    LINEAR MODELS FOR INDEPENDENT AND CORRELATED DATA

    LAB 3, Monday 2/2: Linear regression using STATA

  • Multiple regression in matrix notation [matrix.pdf]
  • Estimating variance within subjects and between subjects [xtsumcorr.ado] [xtsumcorr.hlp]
  • Ordinary Least Squares in STATA [pdf]
  • STATA data analysis using Ordinary Least Squares [pdf]
  • SAS and STATA analyses of the CD4+ data [output]
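    For reference, the multiple regression model in matrix notation and its ordinary least squares solution are summarized below. These are standard results written in generic notation that may differ from the handout's: Y is the n by 1 response vector, X is the n by p design matrix of full column rank, and I_n is the n by n identity matrix.

```latex
\begin{aligned}
Y &= X\beta + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2 I_n),\\
\hat\beta_{\mathrm{OLS}} &= (X^\top X)^{-1} X^\top Y,\\
\operatorname{Var}(\hat\beta_{\mathrm{OLS}}) &= \sigma^2 (X^\top X)^{-1}.
\end{aligned}
```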


    LINEAR MODELS FOR CORRELATED DATA
  • STATA analysis of the weights of pigs data set: Autocorrelation function, Uniform Correlation Model, OLS, and WLS [pdf]; Exponential Correlation Model [pdf]
  • STATA analysis of the dental data set (solutions to Problem Set 2) [output]
  • STATA analysis of the Nepal data set (solutions to Problem Set 3) [output]
  • Fit regression splines to the Nepal Data set in SAS [output]
    Ordinary Least Squares and Weighted Least Squares
  • STATA analysis of sitka spruce trees (population average model) [output]
  • STATA analysis of CD4+ cell numbers (random effect model) [output]
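    When the repeated measurements are correlated, say Var(Y) = σ²V, the OLS and WLS analyses listed above can be summarized by the standard weighted least squares formulas below, where W is any symmetric weight matrix (W = I gives OLS). Choosing W = V^{-1} gives the generalized least squares estimator, which is the best linear unbiased estimator when V is known. This is a generic summary in our own notation, not a transcript of the lab output.

```latex
\begin{aligned}
\hat\beta_W &= (X^\top W X)^{-1} X^\top W Y,\\
\operatorname{Var}(\hat\beta_W) &= \sigma^2 (X^\top W X)^{-1} X^\top W V W X \,(X^\top W X)^{-1},\\
\hat\beta_{V^{-1}} &= (X^\top V^{-1} X)^{-1} X^\top V^{-1} Y,
\qquad \operatorname{Var}(\hat\beta_{V^{-1}}) = \sigma^2 (X^\top V^{-1} X)^{-1}.
\end{aligned}
```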
    Robust Estimation
  • Robust Estimation of the sitka spruce data set and fitting splines to the CD4 data set: STATA analysis [output]
    Parametric Models for Covariance Matrices and Introduction to Logistic Regression
  • STATA analysis with exponential correlation model and SAS PROC MIXED analysis of the cow's milk data [pdf]
  • Handout with STATA Commands for analysis of continuous longitudinal data [pdf]
  • Introduction to Logistic Regression: STATA Analysis of the Myocardial Infarction Data [pdf]
  • Introduction to Logistic Regression: SAS Analysis of the Myocardial infarction data [program] and output [output]
    Logistic Regression for Longitudinal Data
  • STATA Analysis of the 3x3 Pain Crossover Trial Data [pdf]
  • Problem set 4: Analysis of the Indonesian Children's health study [pdf] STATA do file [ichs.do]
    Poisson Regression and GEE
  • STATA Analyses of the Epileptic seizures data set (Marginal Poisson Regression and GEE) [pdf]
  • Analysis of epileptic seizure data using a population-averaged model and GEE (PROC GENMOD) [program] and output [output]
  • STATA Analyses of the Epileptic seizures data set (Poisson Regression with Random Effects) [pdf]

    DATA SETS
    The data sets are posted in raw format for analysis in SAS, STATA, S-plus, and R. Please see the readme file for column names.
  • Readme file of all the data sets below [readme]
  • Example 1.1. CD4+ cell numbers [cd4.raw]
  • Example 1.3. Growth of Sitka spruce [trees.raw] [sitka.raw]
  • Example 1.4. Protein content of milk [barley.raw] [lupins.raw] [mixed.raw]
  • Example 1.6. Epileptic seizures [seize.raw]
  • Example 3.1. Weights of pigs [pigs.raw] in STATA format [pigs.stata.dat]
  • Nepal Clinical Trial Data set [readme] [nepal.raw] in STATA format [nepal.stata.dat]
  • Dental Data [dental.raw]
  • Weight Loss Data [weightloss.raw]
  • HIV Study Data [hivstudy.raw]
  • Multiple Sclerosis Data [afcr.raw]
  • Back Pain Data [back.raw]
  • Myocardial infarction data [infarc.raw]
  • Indonesian children's health study [ICHS.raw]
  • Wheezing data [wheeze2.raw]
  • 3 by 3 Pain crossover trial [crossover33.raw]
    SOFTWARE
    S-plus functions:
  • Plotting longitudinal data sets [exploratory.s]
  • Data analysis of the sitka spruce trees data set [sitka.s] and handout [ps] [pdf]
  • Link to Oswald: Software for the Analysis of Longitudinal Data in S-plus [website]

    STATA functions:
  • Convert ASCII file into STATA format using Stat/Transfer [plaintext]
  • Inputting Your Data into STATA [website]
  • Reshaping Data from Wide to Long [website]
  • Resources to help you learn and use STATA [website]
  • A wonderful archive of STATA programs (requires Stata version 6.0) [website]
  • STATA Frequently Asked Questions [website]
  • The xtgee command [website]
  • STATA 6.0 allows direct reading of datasets and command updates over the Web. The how-to is here.
  • Stata Analysis with GEE of the Epileptic Seizures data [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16] [17]
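    Longitudinal analyses generally need the data in "long" format, with one row per subject per occasion, whereas data are often stored "wide", with one row per subject. In STATA this is what reshape long does. The minimal Python sketch below shows the same idea; the function name, the "wt" column stub, and the numbers are hypothetical. As a design choice, occasions with a missing value are simply skipped, which gives the usual long layout for unbalanced data.

```python
def wide_to_long(rows, id_var, stub, times):
    """Convert one-row-per-subject records to one-row-per-occasion records.

    rows: list of dicts such as {"id": 1, "wt1": 24.0, "wt2": 32.0};
    stub: common prefix of the repeated-measure columns (e.g. "wt");
    times: suffixes to stack (e.g. [1, 2]).
    Occasions that are absent or None for a subject are skipped.
    """
    long_rows = []
    for row in rows:
        for t in times:
            key = f"{stub}{t}"
            if row.get(key) is not None:
                long_rows.append({id_var: row[id_var], "time": t, stub: row[key]})
    return long_rows


wide = [
    {"id": 1, "wt1": 24.0, "wt2": 32.0},
    {"id": 2, "wt1": 22.5, "wt2": None},
]
for r in wide_to_long(wide, "id", "wt", [1, 2]):
    print(r)
```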

    SAS functions:
  • Macro for calculating autocorrelation function in SAS [pdf] [readme]
  • Macro for fitting splines to Nepal Data [splinfit.sas]
  • Generate correlated normal data [gendat.sas]
  • PROC MIXED for the sitka.data [sitka.sas] and handout [ps] [pdf]
  • Fit OLS and WLS models for gendat.sas data [owlsfit.sas]
  • SAS analysis of the dental data set [program] [output]
  • Analysis of dental data using a random coefficient model (PROC MIXED) [program] and output [output]
  • Analysis of dental data using a linear mixed effects model (PROC MIXED) [program] and output [output]
  • Fit a Logistic Regression Model to the Myocardial infarction data [program] and output [output]
  • Analysis of epileptic seizure data using a population-averaged model and GEE (PROC GENMOD) [program] and output [output]
  • Comparing the SAS GLM and Mixed Procedures for Repeated Measures [pdf]
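    The method used in gendat.sas is not described here. One standard way to generate correlated normal data with a uniform (exchangeable) correlation, as in the Uniform Correlation Model, is to add a shared subject-level random effect to independent errors: y_ij = mean + b_i + e_ij with Var(b_i) = ρσ² and Var(e_ij) = (1 - ρ)σ², so that Var(y_ij) = σ² and Corr(y_ij, y_ik) = ρ for j ≠ k. The minimal Python sketch below assumes 0 ≤ ρ ≤ 1 (a random intercept cannot produce negative correlation); the function name and arguments are hypothetical.

```python
import math
import random


def gen_exchangeable(n_subjects, n_times, mean, sigma2, rho, seed=None):
    """Simulate balanced normal data with a uniform (exchangeable) correlation.

    Each subject gets a random intercept b ~ N(0, rho * sigma2) shared by
    all of its observations, plus independent errors e ~ N(0, (1 - rho) * sigma2).
    Returns a list of rows, one per subject, each with n_times values.
    """
    if not 0 <= rho <= 1:
        raise ValueError("rho must satisfy 0 <= rho <= 1")
    rng = random.Random(seed)
    sd_b = math.sqrt(rho * sigma2)
    sd_e = math.sqrt((1 - rho) * sigma2)
    data = []
    for _ in range(n_subjects):
        b = rng.gauss(0.0, sd_b)
        data.append([mean + b + rng.gauss(0.0, sd_e) for _ in range(n_times)])
    return data


# Hypothetical settings: 5 subjects, 4 occasions, mean 10, variance 4, rho 0.6.
for row in gen_exchangeable(5, 4, 10.0, 4.0, 0.6, seed=1):
    print([round(v, 2) for v in row])
```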
    LDA BOOKS
  • Analysis of Longitudinal Data, Peter J. Diggle, Kung-Yee Liang and Scott L. Zeger, Oxford (1999) (TEXTBOOK) [table of contents] [errata]
  • Nonlinear Models for Repeated Measurement Data, Marie Davidian and David Giltinan, Chapman and Hall (1995) [table of contents]
  • Linear Mixed Models for Longitudinal Data, G. Verbeke (Katholieke Universiteit Leuven, Leuven, Belgium) and G. Molenberghs, Springer Series in Statistics (2000) [table of contents] [book data sets]
  • Linear Mixed Models in Practice: A SAS-Oriented Approach, Geert Verbeke and Geert Molenberghs, Springer-Verlag (2000) [table of contents]
  • A Handbook of Statistical Analyses using Stata, Sophia Rabe-Hesketh and Brian Everitt, Chapman & Hall/CRC (2004) [table of contents]
    ACKNOWLEDGMENTS
    This web page contains lecture notes, examples, data sets, and software developed in part by students and colleagues. In particular, I would like to thank biostatistics students Nikhil Gupte and Hongfei Guo for their help in posting the LDA materials; Dr. Scott L. Zeger for providing his course notes and problem sets; Dr. Marie Davidian for sharing SAS software, course notes, and data sets; Dr. Paul Rathouz for sharing STATA code, course notes, and data sets; Dr. Irizarry for providing S-plus software; Dr. McDermott for assistance with HTML programming; and biostatistics students for providing STATA output from their homework assignments. For comments and suggestions, please e-mail me at fdominic@jhsph.edu.