*clear out any existing data
clear
*increase the memory or there will be problems
set memory 40m
*increase the maximum number of variables allowed in estimation commands
set matsize 100

** change the working directory **
cd "C:\Documents and Settings\Sandrah Eckel\Desktop\LDA lab4"

** read in the data set **
use pigs.stata.dta, clear
** alternatively click on: file>open>pigs.stata.dta

** Note: data is currently in wide format
sum week1-week9

** standardize **
** means and standard deviations are from the "sum" command output **
gen sweek1 = (week1 - 25.02)/2.47
gen sweek2 = (week2 - 31.78)/2.79
gen sweek3 = (week3 - 38.86)/3.54
gen sweek4 = (week4 - 44.39)/3.73
gen sweek5 = (week5 - 50.16)/4.53
gen sweek6 = (week6 - 56.45)/4.45
gen sweek7 = (week7 - 62.46)/4.97
gen sweek8 = (week8 - 69.30)/5.42
gen sweek9 = (week9 - 75.22)/6.34

**-- scatterplot matrix
graph matrix sweek1-sweek9, half iscale(0.7) diagonal("1" "2" "3" "4" "5" "6" "7" "8" "9")

** calculate the correlation between each pair of weeks **
corr sweek1-sweek9

** explore residual correlation structure **
** need long format for regression **
reshape long week sweek, i(Id) j(time)
regress week time
predict weekrs, resid
** NOTE: must read in autocor.ado if this file is not in the working directory **
autocor weekrs time Id

** calculate standard error for acf **
** save current data set before reading in the ACF data from the "autocor" command above **
save "pig_lab4_temp.dta", replace
** alternatively click on: file>save as>pig_lab4_temp.dta
** read in the ACF data generated by "autocor.ado" **
** acf in the following command is the file name in which the ACF values are stored **
** this file is in the working directory of STATA **
insheet using acf, names clear
** calculate number of independent pairs **
gen N=48
gen lag=_n
gen num = N*(9-lag)
** calculate standard error for ACF **
gen se = 1/sqrt(num)
** plot ACF with 95% CI **
gen lb = acf - 2*se
gen ub = acf + 2*se
** make the CI between 0-1 **
replace ub = 1 if ub > 1
replace lb = 1 if lb > 1
replace lb = 0 if lb < 0
replace ub = 0 if ub < 0
twoway scatter acf lb ub lag

** to get the number of independent pairs for unbalanced data **
** need to load the pig data again **
** you can choose to save or not save the acf data set **
use "pig_lab4_temp.dta", clear
reshape wide week sweek weekrs, i(Id) j(time)
pwcorr weekrs1 weekrs2 weekrs3 weekrs4 weekrs5 weekrs6 weekrs7 weekrs8 weekrs9, obs

** plot variogram **
** need longitudinal environment for the variogram **
reshape long week sweek weekrs, i(Id) j(time)
tsset Id time
** need to run variogram.ado to define this command **
** need to run xtdiff.ado, ksmapprox.ado before running variogram.ado **
variogram weekrs, discrete

** Calculate the ACF from variogram **
** save current data set before reading in the variogram data from the "variogram" command above **
save "pig_lab4_temp.dta", replace
** read in the variogram data generated by "variogram.ado" **
** vario in the following command is the file name in which the variogram values are stored **
** this file is in the working directory of STATA **
insheet using vario, names clear
** make the variogram plot by ourselves **
** v is the sample variogram **
** vsmth is the smoothed average of the sample variogram **
** vary is the total variance, the horizontal line **
** ulag is the time difference **
twoway scatter v vsmth vary ulag if v<30, msymbol(p i i) connect(i l l)
** calculate ACF from the variogram **
gen varioacf = 1 - vsmth/vary
** plot ACF **
twoway line varioacf ulag, ylabel(-1(.2)1) yline(-1 0 1)

** examine the stationarity of the residuals **
** you can choose to save or not save the acf data set **
clear
use "pig_lab4_temp.dta", clear
reshape wide week sweek weekrs, i(Id) j(time)
sum weekrs1 weekrs2 weekrs3 weekrs4 weekrs5 weekrs6 weekrs7 weekrs8 weekrs9
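
** optional sketch, not part of the original lab steps **
** the standardization step near the top hard-codes means and SDs copied **
** from the "sum" output; assuming the data are in wide format here with **
** week1-week9 present, the same z-scores can be computed from r(mean) **
** and r(sd) instead **
** the zweek* names are hypothetical, chosen so they do not clash with sweek* **
forvalues k = 1/9 {
    quietly summarize week`k'
    gen zweek`k' = (week`k' - r(mean)) / r(sd)
}
** zweek* should agree with sweek* up to the rounding of the hard-coded values **
sum sweek* zweek*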