** HW1 solutions **
** 2-11-09 **

** Problem 1 **

** Prepare for data import
clear
set memory 40m
set matsize 800

** Import the Nepal children's growth data
use "C:\nepal.stata.dat", clear

** Make scatter plot of weight vs. age
scatter wt age, s(Oh)

** Generate a variable indicating the visit number for each id
** (by requires the data to be sorted by id already)
by id: gen time=_n

** Remove observations with weight above 80kg (Q2-4 I used this dataset)
** Note: Stata treats missing values as larger than any number,
** so this also drops observations with missing wt
drop if (wt>80)

** Or, instead of dropping, code observations with weight above 80kg as missing
replace wt=. if wt>80

** Problem 3 **

** Fit linear regression of weight vs. age
reg wt age

** Obtain fitted values
predict fit1

** Find the 33rd and 67th percentiles of age
centile age, centile(33, 67)

** Create linear spline terms (knots at age 27 and age 48)
gen age27=age-27
replace age27 = 0 if age27<0
gen age48=age-48
replace age48 = 0 if age48<0

** Fit a linear spline with knots at the 33rd and 67th percentiles
reg wt age age27 age48

** Obtain fitted values
predict fit2

** Create polynomial terms
gen age_2=age^2
gen age_3=age^3
gen age27_3=age27^3
gen age48_3=age48^3

** Fit a cubic spline with knots at the 33rd and 67th percentiles
reg wt age age_2 age_3 age27_3 age48_3

** Obtain fitted values
predict fit3

** Plot predicted curves
scatter wt age, s(p) || ///
    line fit1 fit2 fit3 age, pstyle(p2 p3 p4) sort ||, ///
    legend(lab(2 "Linear Regression") lab(3 "Linear Spline") lab(4 "Cubic Spline"))

** Problem 4 **

** Create smooth lowess line
lowess wt age, gen(smth) nograph

** Create spaghetti plot
sort id age
twoway line wt age, pstyle(p15) connect(L) || ///
    line smth age, pstyle(p1) clwidth(thick) sort ||, ///
    ytitle("Weight")

** Create a sample that includes all three data points from 20% of the subjects
** First, reshape to wide format so there is one row for each subject.
reshape wide wt age ht arm bf day month year smth, i(id) j(time)

** Set seed to 1234 (optional - but it will guarantee matching results)
set seed 1234

** Generate a random variable from the uniform(0,1) distribution
** (one draw per subject, because the data are in wide format).
gen random=uniform()

** Now reshape data to long format.
reshape long wt age ht arm bf day month year smth, i(id) j(time)

** Find the 20% quantile of the uniform(0,1) variable.
centile random, centile(20)

** Use subjects with uniform(0,1) < 20th percentile
** (the cutoff .1862 is taken from the centile output above).
twoway line wt age if random<.1862, pstyle(p15) connect(L) || ///
    line smth age, pstyle(p1) clwidth(thick) sort ||, ///
    ytitle("Weight")

** We first construct a ZAP plot using the raw data instead of residuals.
** Obtain median weight for each subject.
egen medwt=median(wt), by(id)

** Reshape data to wide format
reshape wide wt age ht arm bf day month year smth, i(id) j(time)

** Summaries of the subject-level medians: max, min, median, quartiles
egen maxmwt=max(medwt)
egen minmwt=min(medwt)
egen medmwt=median(medwt)
egen mwt25=pctile(medwt), p(25)
egen mwt75=pctile(medwt), p(75)

** Reshape data to long format
reshape long wt age ht arm bf day month year smth, i(id) j(time)

** Keep the weight trajectories of the subjects matching each summary
gen maxwt=wt if medwt==maxmwt
gen minwt=wt if medwt==minmwt
gen mwt=wt if medwt==medmwt
gen wt25=wt if medwt==mwt25
gen wt75=wt if medwt==mwt75

** Make a ZAP spaghetti plot using raw data
scatter wt age, s(oh) || ///
    line maxwt minwt mwt wt25 wt75 age, ///
        clwidth(medthick medthick medthick medthick medthick) || ///
    line smth age, pstyle(p1) clwidth(thick) sort ||, ///
    legend(lab(1 "Weight")) ytitle("Weight")

** Problem 5 **

** Fit linear regression
reg wt age

** Obtain residuals
predict res1,resid

** Group time variable based on visit number, create scatterplot matrix
** and an ACF plot of the weight data
** NOTE: autocor.ado must be in the working directory
autocor res1 time id

** Calculate ACF CI
keep id time res1
reshape wide res1, i(id) j(time)
pwcorr res11-res15, obs

** Calculate number of independent pairs for each lag
** (sums of the pairwise counts reported by pwcorr; lag 4 has a single count, 153)
** Lag 1
display 165+167+167+152
** Lag 2
display 172+165+153
** Lag 3
display 166+150

** Read in the ACF values and attach the number of pairs per lag
insheet using acf, names clear
gen lag=_n
gen int num = 651 in 1
replace num = 490 in 2
replace num = 316 in 3
replace num = 153 in 4

** Approximate standard error of each autocorrelation
gen se = 1/sqrt(num)

** Plot ACF with 95% CI
gen lb = acf - 2*se
gen ub = acf + 2*se
replace ub = 1 if ub>1
scatter acf lb ub lag, ylabel(0 (0.5) 1)

** Problem 6 **

** NOTE: variogram.ado, xtdiff.ado and ksmapprox.ado must be in the working directory
** NOTE: insheet, clear in Problem 5 replaced the data in memory, so the
** dataset containing res1 must be in memory again before this step

** Plot variogram of the weight data
variogram res1

** Use the variogram output
insheet using vario, names clear

** Calculate ACF using variogram: rho(u) = 1 - γ(u)/σ^2
** vsmth is the smooth average of the sample variogram γ(u)
** vary is the variance σ^2
** ulag is the lag u
gen varioacf = 1 - vsmth/vary

** Make ACF plot using ungrouped data
twoway line varioacf ulag, ylabel(-1(1)1) yline(-1 1)

** Problem 7 **

** NOTE: the original growth data must be in memory again for Problems 7-9

** Under an independence model, fit OLS
reg wt age sex ht bf mage lit

** Under a uniform (exchangeable) correlation model, fit WLS
xtgee wt age sex ht bf mage lit, i(id) corr(exc)

** Problem 8 **

** Add an age-by-lit interaction term
gen agelit=age*lit
xtgee wt age sex ht bf mage lit agelit, i(id) corr(exc)

** Problem 9 **

** Fit a random-effects model with a random intercept for each id
xtreg wt age sex ht bf mage lit, re i(id)
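
** Optional sketch (not part of the original solution): inspect the variance
** components from the random-effects fit in Problem 9.
** Assumes the xtreg, re command above was the last estimation command run,
** so that its stored results e(sigma_u), e(sigma_e) and e(rho) are available.
** rho = sigma_u^2/(sigma_u^2 + sigma_e^2) is the within-subject correlation
** implied by the random-intercept model, which can be compared informally
** with the exchangeable working correlation used with xtgee in Problems 7 and 8.
display "sigma_u = " e(sigma_u)
display "sigma_e = " e(sigma_e)
display "rho     = " e(rho)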