** lab3 updated 11:00 am 2-4-09 **

* clear out any existing data
drop _all
* increase the memory or there will be problems
set memory 40m
* set the maximum matrix size (limits the number of variables in a model)
set matsize 100

** read in the data set **
** make sure you specify the path correctly!
cd "/Users/haley/Dropbox/Desktop/LDA09/Lab3-4"
use "pigs.stata.dta", clear
** alternatively click on: File > Open > pigs.stata.dta
summ

** data is currently in wide format, make it into long format
reshape long week, i(Id) j(time)

** exercise: reshape back to wide format **
reshape wide week, i(Id) j(time)
** note that you could have done the same thing by typing only:
** reshape wide

** reshape back to long **
reshape long week, i(Id) j(time)

** convert to longitudinal data environment **
tsset Id time

** data description **
xtdes

** box plot for each time point **
sort time
graph box week, over(time)

** run xtgraph.ado in the do-file editor before using the command, **
** or make sure that you have saved xtgraph.ado somewhere that Stata **
** can find it (e.g., in your "c:\ado\personal\" folder) **
** find out where Stata looks for your ado files **
adopath

** mean trend plot **
xtgraph week, title("Mean trend vs time") bar(ci)

** median trend plot **
xtgraph week, av(median) title("Median trend vs time") bar(iqr)

** lowess smooth **
lowess week time, gen(weeksmth)

** spaghetti plot **
sort Id time
twoway line week time, connect(L)
twoway line week weeksmth time, connect(L)

** plot a random sample of the pigs **
** save full dataset to return to later **
save "pigs2.stata.dta", replace

** sample 5 pigs: reshape to wide so that each row is one pig **
reshape wide week weeksmth, i(Id) j(time)
sample 5, count

** plot the sampled pigs **
reshape long week weeksmth, i(Id) j(time)
twoway line week time, connect(L)

** return to the full dataset **
use "pigs2.stata.dta", clear

** ZAP plot **
regress week time
predict weekrs, resid

** median residual for each pig
egen mweekrs=median(weekrs), by(Id)

** reshape to wide to find "the pig" with the specified median residual **
** in long format, the percentiles would be computed over all **
** 432 observations instead of over the 48 pigs **
reshape wide weekrs weeksmth week, i(Id) j(time)

** median residual of the pig that has the maximum median residual **
egen maxmrs=max(mweekrs)
** median residual of the pig that has the minimum median residual **
egen minmrs=min(mweekrs)
** median residual of the pig at the 20th percentile **
egen mrs20=pctile(mweekrs),p(20)
** median residual of the pigs around the 75th percentile **
egen mrs74 = pctile(mweekrs),p(74)
egen mrs76 = pctile(mweekrs),p(76)
** 48 pigs: 48*0.75 = 36 is a whole number, so Stata's 75th percentile **
** is the mean of the 36th and 37th ordered values **
** and it will not, in general, equal any pig's value exactly **
** therefore, use the 74th and 76th percentiles as surrogates **
** the 20th is fine here (48*0.20 = 9.6 is not a whole number), but with **
** a different number of subjects the problem can move to another **
** percentile (e.g., the 20th with 50 subjects, since 50*0.20 = 10) **
** So BE CAREFUL with percentiles **

** reshape to long for plotting the residuals for each corresponding pig **
reshape long weekrs weeksmth week, i(Id) j(time)
gen maxrspig=weekrs if mweekrs==maxmrs
gen minrspig=weekrs if mweekrs==minmrs
gen rspig20=weekrs if mweekrs==mrs20
** one option to find a surrogate for the 75th: **
** the pig whose median residual lies in (mrs74, mrs76] **
gen rspig75 = weekrs if mweekrs<=mrs76 & mweekrs>mrs74
twoway (scatter weekrs time) (line maxrspig minrspig rspig20 rspig75 time)

** scatter matrix plot **
reshape wide weekrs weeksmth maxrspig minrspig rspig20 rspig75 week, i(Id) j(time)
graph matrix week1 week2 week3 week4 week5 week6 week7 week8 week9,half