**Logistic regression for binary data** **NEPAL DATA** ** updated 02-24-09 ** set more off cd "/Users/haley/Dropbox/Desktop/LDA09/data" use "nepal.dta", clear ** drop extra variables ** drop age2 age3 age4 t2 ** drop other variables we're not using in this analysis drop wt ht arm day month year died alive mage lit ** gen visit number variable to use as our time variable ** sort id age by id: gen visit=_n tab visit xtset id visit xtdes ***Explore our outcome of breastfeeding*** codebook bf drop if bf==. xtdes ***Generate binary breastfeeding variable*** *** information about the 3 categories is contained in the readme file *** *** combine 2 groups to gen 0-1 indicator of breast feeding *** gen bfbin=1 if bf==1|bf==2 replace bfbin=0 if bfbin==. tab bf bfbin ** Explore the binary breastfeeding variable ** xttab bfbin ** the transition matrix ** xttrans bfbin ** Explore our covariates of interest ** codebook age ** create a centered age variable gen agec = age-37.8194 codebook sex * generate a binary indicator for male gender * gen male=(sex==1) tab sex male drop sex ***Explore marginal mean model wrt age ksm bfbin agec, lowess bw(.4) ylab(0(.2)1) lwidth(10) gen(bfbinsm) ** plot smoooth again, this time with jittered observed outcome ** twoway (scatter bfbin agec, jitter(4) msymbol(oh)) (line bfbinsm agec, sort lwidth(1)), ylab(-.1(.2)1.1) ***Explore marginal mean model wrt age and gender ksm bfbin agec, lowess bw(.4) lwidth(10) gen(bfbinsm2) by(male) twoway (scatter bfbin agec, jitter(4) msymbol(oh)) (line bfbinsm2 agec if male==0, sort lwidth(1)) (line bfbinsm2 agec if male==1, sort lwidth(1)), ylab(-.1(.2)1.1) drop bfbinsm2 *** logistic regression in STATA *** gen agemale=agec*male logit bfbin male agec agemale logistic bfbin male agec agemale predict prob *** sensitivity and specificity from 2 by 2 table *** ** choose your cutpoint of "probability of being 1" to make your prediction of either 0 or 1** gen c = .5 gen bfhat = 1 if prob > c replace bfhat = 0 if bfhat == . tab bfbin bfhat *** test for statistical significance *** test agemale test male agemale ** Marginal models accounting for correlation ** ***FIT GEE w/ unstructured*** (will not converge) xtgee bfbin male agec agemale, f(bin) link(logit) corr(unst) xtcorr ***FIT GEE w/ AR(1)*** xtgee bfbin male agec agemale, f(bin) link(logit) corr(ar1) xtcorr ****Fit GEE with uniform corr model **** xtgee bfbin male agec agemale, f(bin) link(logit) corr(exc) xtcorr ****Fit GEE with independence corr model **** xtgee bfbin male agec agemale, f(bin) link(logit) corr(ind) xtcorr ** Compare the Marginal models that use 199 children ** xtgee bfbin male agec agemale, f(bin) link(logit) corr(ind) xtgee bfbin male agec agemale, f(bin) link(logit) corr(exch) ** look at the fit graphically ** * get predicted values from the two models * quietly xtgee bfbin male agec agemale, f(bin) link(logit) corr(exc) nolog robust predict bffitexch label var bffitexch "exchfit" quietly xtgee bfbin male agec agemale, f(bin) link(logit) corr(ind) nolog robust predict bffitind label var bffitind "indfit" sort age * get smoothed curves of each of the sets of predictions * ksm bffitexch agec, lowess bw(.4) ylab(0(.2)1) lwidth(10) gen(exchsm) ksm bffitind agec, lowess bw(.4) ylab(0(.2)1) lwidth(10) gen(indsm) * compare the model fits * twoway (scatter bfbin agec, jitter(4)) (line bfbinsm exchsm indsm agec, sort), ylab(-0.1(.2)1.1) ** decide to use the independent correlation structure with robust SE estimates ** xtgee bfbin male agec agemale, f(bin) link(logit) corr(ind) nolog robust xtgee, eform test agemale test male agemale drop bffitexch bffitind exchsm indsm ** prepare to drop those individuals who were excluded in AR1 fit ** ** save current data ** save "nepal_temp.dta", replace ** reshape to ID the dropped individuals reshape wide bf age agec bfbin bfbinsm agemale prob bfhat, i(id) j(visit) ** ad hoc identification of individuals dropped in AR1 model drop if bfbin2==. drop if bfbin4==. & bfbin5!=. reshape long ** Compare the three Marginal models that use 186 children ** xtgee bfbin male agec agemale, f(bin) link(logit) corr(ind) xtgee bfbin male agec agemale, f(bin) link(logit) corr(exch) xtgee bfbin male agec agemale, f(bin) link(logit) corr(ar1) ** look at the fit graphically ** ** get predicted values from the three models using the robust option ** quietly xtgee bfbin male agec agemale, f(bin) link(logit) corr(ar1) nolog robust predict bffitar1 label var bffitar1 "ar1fit" quietly xtgee bfbin male agec agemale, f(bin) link(logit) corr(exc) nolog robust predict bffitexch label var bffitexch "exchfit" quietly xtgee bfbin male agec agemale, f(bin) link(logit) corr(ind) nolog robust predict bffitind label var bffitind "indfit" * need to generate a new 'data-based' smooth for the 186 children data drop bfbinsm ksm bfbin agec, lowess bw(.4) ylab(0(.2)1) lwidth(10) gen(bfbinsm) sort age * get smoothed curves of each of the sets of predictions * ksm bffitar1 agec, lowess bw(.4) ylab(0(.2)1) lwidth(10) gen(ar1sm) ksm bffitexch agec, lowess bw(.4) ylab(0(.2)1) lwidth(10) gen(exchsm) ksm bffitind agec, lowess bw(.4) ylab(0(.2)1) lwidth(10) gen(indsm) twoway (scatter bfbin agec, jitter(4)) (line bfbinsm ar1sm exchsm indsm agec, sort), ylab(0(.2)1) * exchangeable is the worst fit - use AR1 or independence with robust * ** decide to use the AR1 correlation structure with robust SE estimates ** xtgee bfbin male agec agemale, f(bin) link(logit) corr(ar1) nolog robust xtgee, eform test agemale test male agemale