Software for fitting some models of principal stratification
Frangakis, CE, and Rubin,  DB (2002).  Principal Stratification in Causal Inference.  (Biometrics[58,  21-29, Full text - PDF]). The authors have no responsibility for any  consequences of use of this software.
 
SOFTWARE  FILES:   preffects.tar.gz

1. WHAT THIS SOFTWARE DOES, and WHY THE NEED FOR THE NEW METHODS.
The software is written for the above paper. The paper studies situations where we want causal effects adjusted for post-treatment variables, which includes trials with noncompliance,  trials with surrogate endpoints, censoring by death, and many others. The paper shows that usual methods of surrogacy and of adjustment do not have causal interpretation as surrogates because they condition on the post-treatment variable which is affected by treatment. The paper proposes new treatment comparions that adjust for post-treatmentvariables and are always causal effects (``principal stratification'').

 This version of the software is for a class of estimation methods
of such comparisons and with:

  1. a binary treatment *assignment* z, ignorable given covariates x;
  2. a binary post-treatment variable d(z);
  3. a continuous outcome y(z);
  4. a goal being  to estimate treatment effects conditionally on the principal stratification {d(z), z=0,1}.
In particular, the data required and the model fitted model are described below in an example of noncompliance, where d(z) is the actual treatment received. Alternatively, d(z) can be a biomarker (surrogate endpoint) measured after treatment, y(z) can be the main endpoint (mortality), and interest may lie of the degree to which  ``causal effects of treatment z on outcome y occur together with effects of treatment z on the biomarker d''. To address these questions, the software can estimate the ``Associative'' and ``Dissociative'' Effects (Section 5.2 of Frangakis and Rubin, 2002, Biometrics).
 

2. TO INSTALL THE SOFTWARE. The current version is for unix with S-plus5. Save the Code file (preffects.tar.gz)
in a directory. From that directory, type:

gunzip preffects.tar.gz
tar xf preffects.tar

This creates a subdirectory called preffects, which contains the files: batchin.s    prel.s     routines1.s  routines2.s Go to the subdirectory ``preffects''. Open Splus5 from there. To proceed, we must have the data described below.

3. DATA NEEDED AND QUANTITIES ESTIMATED.
From here on, we assume the following data exist in Splus5: (note that unless otherwise indicated, storage mode must be double precision).

  1.   nobs: number of subjects in data set: careful,this must be of storage mode ``integer''; to make sure, use the command:

    ``nobs <- as.integer(nobs)''

  2.   y : continuous outcome vector of length nobs.
  3. z : randomized assignment vector, 0 for standard, 1 for new  treatment arm (length of z= nobs).
  4. d : actual treatment received: 0 for standard, 1 for new (length of d= nobs).
  5. nxc1: integer, see xc1
  6. xc1: matrix of dimension (nobs by nxc1). This is the design matrix for first part of the S (principal strata) model:

    logit(pr(Si=never-taker | xc1[i,] ))=xc1[i,] *betac1 (first column of xc1 is typically 1)

  7. nxc2: integer, see xc2
  8. xc2: matrix of dimension (nobs by nxc2). This is the design matrix for the remaining part of the S model:

    logit(pr(Si=complier | Si is not never taker, xc2[i,] ))=xc2[i,] *betac2; (first column of xc2 is typically 1)

  9. nxy: integer, see xy
  10. xy: matrix of dimension (nobs by nxy). This should be chosen based on its use in the y-model as described now. The model for the potential outcome Yi(z) is:

    pr( Yi(z) | Si=s, xy[i,]=xyf)= normal with mean

    linky.v1(xyf,s,z) * betay  and variance exp(betassqy)

    currently, the definition of linky.v1 (in routines2.s) is

    linky.v1(xyf,s,z)=[xyf,c2,c3,zt*c2] ;

    so that the last parameter is the treatment effect  for the principal stratum of compliers= {i: Di(0)=Di(1)=1}.


 
The user can change the link function to allowinteractions. For example, the link function linky.v2 in routines.s is [xyf,xyf*c2,xyf*c3,zt*c2]. Also, if we use a link function of the form [xyf,c2,c3, zt*c2,zt*c3], the coefficient of zt*c3 is, for always-takers, the effect of assignment on outcome that is not an effect of actual treatment on outcome. In that case,
the user should make sure that the parameters are estimable, by checking if the observed strata provide enough degrees of freedom.

(note, with large sample sizes, the user can put xc1=xc2=xy; the user may still try that with smaller samples,  but if some parameters are poorly estimated, the user may reduce the dimension of some of the design matriced; it is however, recommended that xc1,xc2, and xy have one more column in addition to the ``1'' column. If they only have the ``1'' column, the user should make sure these objects' mode is kept ``as.matrix'' (not just as.vector) throughout the operations.)

EXAMPLE.
Suppose we want to fit two covariates, age and gender, in an additive way in the two model for principal strata, and the model for outcomes  given principal strata. Then, we should use: xc1_cbind(1,age,gender); xc2_xc1; xy_xc1; and the dimensions nxc1, nxc2, nxy, should each equal to as.integer(3).

4. MORE ON THE QUANTITIES THAT ARE ESTIMATED.
The parameter of the model is beta=[betac1, betac2, betay, betassqy] (estimated with maximum likelihood). Generally: (i) we should model covariates, to increase precision, but we should note that (ii) then, the original scale of the parameters becomes less interpretable, e.g. when we include interactions. Because of both (i) and (ii),  some averaged estimands that **result** from the model, are more of interest. The function estimands.exercise calculates theta=an 8-dim
 vector of estimands, where theta=theta(beta,Data) and which comprises the following: (see also batchin.s).

 theta[1] = E(Y(z=0)| never-takers)
 theta[2] = E(Y(z=0)| always-takers)
 theta[3] = E(Y(z=0)| compliers)
 theta[4] = E(Y(z=1)| compliers)
 theta[5] = theta[4]-theta[3], the average effect on compliers
 theta[6] = pr(Si=never-taker)
 theta[7] = pr(Si=compler)
 theta[8] = pr(Si=always-taker)

 Note, the function estimands.exercise automatically
 adapts to whatever xc1,xc2,xy, and linky.v1 (see prel.s) are.
 so the user may feel free to change the design matrices in prel.s
 without changing the function estimands.exercise.

5. HOW TO RUN THE SOFTWARE.
Read the comments in batchin.s.
Then,
 

  1. from the unix prompt of the directory `preffects'', type

  2. Splus5 BATCH prel.s batchout

    and, when the job is done, check the file batchout (it should contain no error messages).

  3. from the unix prompt, type


Splus5 BATCH batchin.s batchout

and when the job is done, go in Splus5 and look at the object

citheta

This contains point estimates, and lower and upper 95% CI limits for the 8 estimands of Section 4 above.

Alternatively to (2), that the user can enter Splus5 and type manually each command of the file batchin.s, for best control of the sequence of procedures.

We have not encountered any bugs, but, if any such problem arises, we would appreciate if the user lets us know.


Return to Dr. Frangakis's  Home Page