# delimit ; ******************************************************; ****** Sitka tree data set **********************; ****** Ozone effect on the growth pattern - log(size) *****; ****** Yi Huang, Feb. 16th, 2004 **********************; ****** LDA Lab -7 -- Focus on different estimations ******; ****** Stata version - 7 ************************; *******************************************************; ** Before doing analysis, please specify where you want to save your results ; ** The log file, which is specified below, is used to save all the results; * Turn off -more- pause; clear; set more off; * Save log file on disk; capture log close; log using "d:\teaching\LDA\Lab\lab7\tree.log", replace; * Extend linesize for log; set linesize 100; *********************************************************; ** Step -1, to read in data, use infile command ** ; ****************************************; ** since sitka.data start with a list of variable names; ** please make sure you delete this first row and resave it sitka.data, before using infile to read in; infile logsize days chamber ozone year tree using "d:\teaching\LDA\data\sitka.data", clear; rename tree id; save "d:\teaching\LDA\data\sitka.dta", replace; summ; ** Reshape the data into "long" or "wide" format - depend on which one you want; ** If the data is not in long format, we need to change it into long format before modelling; ** for now, we just work with 89 data, and 632 observations deleted; keep if year==88; ** make in wide format, and save; reshape wide logsize, i(id) j(days); ** make in long format, and save; reshape long logsize, i(id) j(days); ** use spead-sheet to take a look; save "d:\teaching\LDA\data\sitka88-long.dta", replace; ** Step -2, Set up Longitudinal data analysis setting ** ; *********************************************; ** convert an ordinary data into a longitudinal dataset, specifying subject index and time index; tsset id days; ** Step -3, Brief review on the LDA commands ** ; *********************************************; ** reg fit model with ordinary least sqare, ignoring the correlation structure; ** xtgee fit model using GEE, with specifying any type of working correlation; ** xtcorr follow xtgee command, to give the correlation structure output; ** fitting model with WLS; ** xtgls + igls +corr(??) : fit linear model using generalized least square (GLS) with some type of working correlation; ** Step -4, EDA analysis -- distance difference across boys and girls over time ** ; ********************************************************************; ** describe the pattern of data, the distribution of covariates (all categorical); xtdes; xttab chamber; xttab ozone; ** Describe the logsize difference over time, across normal and ozone environment ; ** ozone = 0 (normal), ozone = 1 (ozone),; sort ozone; by ozone: sum logsize; sort chamber; by chamber: sum logsize; ** Plotting logsize over time, across chambers ; ** two scatterplots; graph logsize days if ozone==0, ti("scatterplot of logsize vs. days, in normal environ") saving(g1, replace); graph logsize days if ozone==1, ti("scatterplot of logsize vs. days, in ozone environ") saving(g2, replace); ** two box plots; sort days; graph logsize if ozone==0, box by(days) ti("Boxplot of logsize vs time, in normal environ") saving(g3, replace); graph logsize if ozone==1, box by(days) ti("Boxplot of logsize vs time, in normal environ") saving(g4, replace); graph using g1 g2 g3 g4; *drop aaaaaaaaaa; ** Mean trend plot***; ** using mean to plot; xtgraph logsize, group(ozone) ti("Mean logsize vs days") bar(se) offset(1.5) saving(g1, replace); ** using offset(0.2) to shift the plots, so that they do not lay on top of each other; xtgraph logsize, group(ozone) ti("Median logsize vs days") av(median) bar(iqr) offset(2) saving(g2, replace); ** ** Spaghetti plots -- use "twoway line" command in stata 8, please ; sort ozone id days; graph logsize days, by(ozone) c(l) s(i) saving(g3, replace); graph using g1 g2 g3; *drop aaaaaa; ********************************************************; ** Step -5, Model based data analysis -- OLS and WLS ***** ; ********************************************************; anova logsize days ozone; ** to calculate the autocorelation function, and plots **; autocor logsize days id ; ** generate dummy for days categories; tab days, gen(t); ** in order to reproduce the results in textbook P76, here are some new notations; ** gen interaction between days and ozone; gen eta=1-ozone; gen gamma=days/100*eta; ** fit model using OLS, ignoring the correlation ; **************************************; ** to see the wrong s.e. estimation *******; reg logsize t1 t2 t3 t4 t5 eta gamma, noconstant ; ** fit model using GEE, specifying the correlation, with out robust s.e. ; ** But, the s.e estimate is still not close to textbook output ***; ******************************************************; ** independent correlation (same as OLS); xtgee logsize t1 t2 t3 t4 t5 eta gamma, noconstant i(id) corr(ind); ** check the estimated correlation matrix; xtcorr; ** exchangeable correlation (same as: uniform correlation in textbook); xtgee logsize t1 t2 t3 t4 t5 eta gamma, noconstant i(id) corr(exc); xtcorr; ** unstructured correlation ; xtgee logsize t1 t2 t3 t4 t5 eta gamma, noconstant i(id) corr(uns); xtcorr; ** fit model using GEE, corr=exchangable, with robust s.e. ; ** Then, s.e estimate is close to textbook output ***; ******************************************************; ** exchangeable correlation (same as: uniform correlation in textbook); xtgee logsize t1 t2 t3 t4 t5 eta gamma, noconstant i(id) corr(exc) robust; xtcorr; xtgee logsize t1 t2 t3 t4 t5 eta gamma, noconstant i(id) corr(uns) robust; xtcorr; drop aaaaaaaa; ** fit model using WLS, with exchangeable correlation ; ********************************************; ** xtreg +mle: fit model using WLS with uniform correlation matrix; ** produce MLE estimate of correlation coefficient, and WLS estimate of \beta; xtreg logsize t1 t2 t3 t4 t5 eta gamma, mle noconstant i(id) ; ** xtreg +re: fit model using WLS, with a random intercept ; ** produce MLE estimate of correlation coefficient, and WLS estimate of \beta; ** the following command is not good, since it does not allow noconstant , need SAS code; ** xtreg logsize t1 t2 t3 t4 t5 eta gamma, re noconstant i(id) ; ** finally, to talk about "copy-paste" to construct your report, using the output **; ** graph file, click "copy graph" under menu "edit"; ** txt file, just highlight the texts you want, do the common copy-paste thing; drop aaaaaaaa; ********************; log close;