**Logistic regression for binary data, part 2** **NEPAL DATA** set more off cd "C:\Documents and Settings\Sandrah Eckel\Desktop\LDA lab11" use "nepal.dta", clear ************************************** ** Dataset prep work from lab 10 ***** ************************************** ** drop extra variables ** drop age2 age3 age4 t2 ** drop other variables we're not using in this analysis drop wt ht arm day month year died alive mage lit ** gen visit number variable to use as our time variable ** sort id age by id: gen visit=_n tab visit xtset id visit xtdes ***Explore our outcome of breastfeeding*** codebook bf drop if bf==. xtdes ***Generate binary breastfeeding variable*** *** information about the 3 categories is contained in the readme file *** *** combine 2 groups to gen 0-1 indicator of breast feeding *** gen bfbin=1 if bf==1|bf==2 replace bfbin=0 if bfbin==. tab bf bfbin ** Explore the binary breastfeeding variable ** xttab bfbin xttrans bfbin ** Explore our covariates of interest ** codebook age ** create a centered age variable gen agec = age-37.8194 codebook sex * generate a binary indicator for male gender * gen male=(sex==1) tab sex male drop sex * define centered age and male gender interaction term * gen agemale=agec*male ************************************** ** End of Dataset prep from lab 10 *** ************************************** ***************************************************** ** Random intercept model with binary outcome ******* ************* ** xtlogit ** ************* * report log odds xtlogit bfbin agec, i(id) nolog * report OR xtlogit bfbin agec, i(id) nolog or * check for adequate fitting of the model quadchk, nooutput * relative difference > 10^-2 (1%) (arbitrary STATA rule of thumb) * so refit using > # of intpoints (default was 12) xtlogit bfbin agec, i(id) or intpoints(18) nolog quadchk, nooutput * maybe increase intpoints again or try different quadrature * method * use quadrature method ghermite (default is intmethod(mvaghermite)) xtlogit bfbin agec, i(id) or intpoints(18) intmethod(aghermite) nolog quadchk, nooutput * even worse! use default intmethod and increase intpoints (a lot!) xtlogit bfbin agec, i(id) or intpoints(100) nolog quadchk, nooutput ** NOTE: to the best of my knowledge xtlogit does not give ** estimates of the random intercepts - so we use gllamm ** to fit the same random intercept model and then ** we'll be able to get the estimated random intercepts ************* ** gllamm ** ************* ** install gllamm for the first time ** *ssc describe gllamm ** update gllamm (replace previous version) *ssc install gllamm, replace gllamm bfbin agec, i(id) l(logit) f(binom) ** get OR instead of log(OR) ** gllamm, eform * get the fitted log odds predict fittedLO, xb * get the Empirical Bayes predictions of the random intercepts * using gllapred (EB estimates stored as ebRIm1)** gllapred ebRI, u gen fitted_avgp = exp(fittedLO)/( 1 + exp(fittedLO) ) gen fitted_indp = exp(fittedLO + ebRIm1)/( 1 + exp(fittedLO + ebRIm1) ) * fit our 3 GEE models again ***FIT GEE w/ AR(1)*** quietly xtgee bfbin agec, f(bin) link(logit) corr(ar1) robust predict fitted_geear1, mu label var fitted_geear1 "gee_ar1" ****Fit GEE with uniform corr model **** quietly xtgee bfbin agec, f(bin) link(logit) corr(exc) robust predict fitted_geeunif, mu label var fitted_geeunif "gee_unif" ****Fit GEE with independence corr model **** quietly xtgee bfbin agec, f(bin) link(logit) corr(ind) robust predict fitted_geeind, mu label var fitted_geeind "gee_ind" * smooth the observed data * ksm bfbin agec, lowess bw(.4) ylab(0(.2)1) lwidth(10) gen(bfbinsm) * plot the gee and RI model fits * twoway (scatter bfbin agec, jitter(4)) (line bfbinsm fitted_geear1 fitted_geeunif fitted_geeind fitted_avgp agec, sort pstyle(p1)), ylab(0(.2)1) * plot the RI 'average' prob and the individual probs twoway (line fitted_indp agec, c(L) pstyle(p15)) (line fitted_avgp agec , sort clwidth(thick)) * plot the RI 'average' prob and the individual probs and overlay a population average (marginal model fit) twoway (line fitted_indp agec, c(L) pstyle(p15)) (line fitted_avgp agec , sort clwidth(thick)) (line fitted_geear1 agec , sort clwidth(thick)) ** transition model ** ** include a previous outcome as a covariate ** ** generate one year lag of y ** sort id visit by id: gen bfbin_lag1 = bfbin[_n-1] ** assume conditional independence conditioned on the previous outcome (first order Markov chain) ** logit bfbin agec bfbin_lag1 logistic bfbin agec bfbin_lag1 ** allow for some correlation structure even when conditioning on previous outcome ** xtgee bfbin agec bfbin_lag1, nolog f(bin) l(logit) corr(unst) robust xtcorr ** compare to assuming independence when conditioning on previous outcome (equiv to logit command) ** xtgee bfbin agec bfbin_lag1, nolog f(bin) l(logit) corr(ind) xtcorr ** is a simpler model appropriate (without using lagy1 as covariate)? ** note that the coefficient on lagy1 is quite statistically significantly ** different from zero in all the above models qic bfbin agec male agemale bfbin_lag1, nolog f(bin) l(logit) corr(unst) robust qic bfbin agec male agemale bfbin_lag1, nolog f(bin) l(logit) corr(ar1) robust qic bfbin agec male agemale bfbin_lag1, nolog f(bin) l(logit) corr(exch) robust qic bfbin agec male agemale bfbin_lag1, nolog f(bin) l(logit) corr(ind) robust xtgee bfbin agec male agemale bfbin_lag1, nolog f(bin) l(logit) corr(exch) robust xtcorr * FINALLY...is there a gender effect? xtgee bfbin agec male agemale bfbin_lag1, nolog f(bin) l(logit) corr(unst) robust eform xtcorr test male agemale ** Check out the huge OR on bfbin_lag1 !!! **