SOFTWARE FOR ESTIMATING A SURVIVAL CURVE WITH DOUBLE-SAMPLING DATA.
Frangakis and Rubin, 2001 Biometrics (with discussion)

1. PURPOSE.
   The software is for a study with the following features:

(i)   GOAL: to estimate a cohort's survival (time to event) distribution.
(ii)  study has longitudinal entry.
(iii) subjects can (a) drop out selectively, or (b) be administratively
      censored.
(iv)  a representative subset of the dropouts is pursued and their
      data (X,Delta) (see below) are recorded.
(v)   it can be assumed that subjects with different entry times
      are not very different in inherent characteristics such as survival.
      (for details, see the article).
Note: the double sampling described above means that no assumptions are
needed about how dropouts and nondropouts differ in their survival.


2. VARIABLES NEEDED TO ESTIMATE THE SURVIVAL DISTRIBUTION Pr(T > t).
   (See also the definitions in Sections 2.1-2.3 of the article.)
   The following variables, each of length equal to the size of the
   total cohort, are needed.

(i)   Robs : indicator = 1 if subject is not a study dropout
      (needed for all subjects).
(ii)  S : indicator = 1 if subject is a study dropout and is
      double sampled (needed for all subjects).
(iii) x.ct : min of (survival time T, administrative censoring time C)
      (needed for subjects who did not drop out, i.e. with Robs=1,
      and for subjects who dropped out but were double sampled later,
      i.e. with S=1).
(iv)  Delta : indicator = 1 if T<C (needed for the same group of subjects
      as in (iii)).

For subjects with Robs not 1 and S not 1, there should still be
entries in x.ct and Delta (for format reasons), although
those entries can have any value, even NA.
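
As an illustration of the layout described above, the following is a minimal
sketch in R. The six subjects and all their values are invented for this
example, and the consistency checks are a suggestion only; they are not part
of routds.r.

```r
# Hypothetical cohort of 6 subjects (all values invented for illustration).
Robs  <- c(1, 1, 0, 0, 1, 0)              # 1 = subject is not a study dropout
S     <- c(0, 0, 1, 0, 0, 1)              # 1 = dropout who was double sampled
x.ct  <- c(2.1, 5.3, 3.7, NA, 4.4, 6.0)   # min(T, C); NA allowed if Robs != 1 and S != 1
Delta <- c(1, 0, 1, NA, 1, 0)             # 1 if T < C

# All four vectors must have one entry per cohort member.
n <- length(Robs)
stopifnot(length(S) == n, length(x.ct) == n, length(Delta) == n)

# Robs and S are 0/1 indicators; a double-sampled subject is a dropout,
# so Robs and S cannot both be 1 for the same subject.
stopifnot(all(Robs %in% c(0, 1)), all(S %in% c(0, 1)))
stopifnot(!any(Robs == 1 & S == 1))

# x.ct and Delta must be observed wherever Robs = 1 or S = 1.
needed <- Robs == 1 | S == 1
stopifnot(!anyNA(x.ct[needed]), !anyNA(Delta[needed]))
stopifnot(all(Delta[needed] %in% c(0, 1)))
```

Subject 4 here is a dropout who was not double sampled, so its x.ct and
Delta entries are placeholders.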

NOTE: Although the method in the article works more generally,
this version of the software assumes that all values in x.ct
are distinct. Generally speaking, one could still use this
software with ties if, at a time x.ct=t* that is shared by n{t*} people,
the user keeps one person with x.ct=t* and carefully redefines
x.ct for the remaining n{t*}-1 subjects to be distinct values
close enough to t*.
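
One possible way to carry out the redefinition above is sketched below in R.
The helper break.ties and the offset eps are hypothetical (they are not part
of routds.r), and the sketch assumes that eps is much smaller than the gap
between any two distinct observed times. It does not decide which tied
subject keeps the original time; that choice still needs the user's care.

```r
# Hypothetical helper: separate tied values of x by adding small increasing
# offsets (eps, 2*eps, ...) to the 2nd, 3rd, ... member of each tied group.
break.ties <- function(x, eps = 1e-6) {
  ok <- !is.na(x)
  # Position within each group of equal values: 0 for the first member,
  # 1 for the second, and so on.
  k <- ave(x[ok], x[ok], FUN = function(v) seq_along(v) - 1)
  x[ok] <- x[ok] + k * eps
  x
}

# Invented example with ties at 2.0 and 3.5 and one placeholder NA.
x.ct  <- c(2.0, 3.5, 2.0, NA, 3.5, 2.0)
x.new <- break.ties(x.ct)

# Verify that the non-missing values are now distinct.
stopifnot(!any(duplicated(x.new[!is.na(x.new)])))
```

If the final check fails, a perturbed value has collided with another
observed time and a smaller eps should be tried.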


3. HOW TO RUN THE SOFTWARE [R in Windows]

(i) Download the R file routds.r and the .dll file routds.dll and save them
in the same directory.
(ii) Open R from that directory and type: source("routds.r")
(iii) From R, type:
     newestimate=dblsample.surv(x.ct,Delta,Robs,S)


4. OUTPUT.

newestimate will have the following components:

newestimate$lfustatus : returns Robs, sorted by x.ct.
newestimate$size : the total number of failures.
newestimate$failtimes : the times t (failures in the data set)
      at which the estimator of Pr(T>t) takes distinct values.

newestimate$probs : the estimator of Pr(T>t) in Frangakis and Rubin (2001),
      computed at t in failtimes.
newestimate$se : the pointwise standard error of the estimator,
      computed at t in failtimes, using the delta method on the standard
      error of the corresponding cumulative hazard.
newestimate$lower : the lower limit of the approximate 95% confidence
      interval (CI) for Pr(T>t), evaluated at t in failtimes.
      It is computed as the reciprocal of the exponentiated
      upper limit of the CI for the cumulative hazard.
newestimate$upper : the upper limit of the same CI.
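
To make the relation between se, lower and upper concrete, here is a small
sketch in R. It assumes that Pr(T>t) = exp(-H(t)), where H is the cumulative
hazard, and that the CI for H is the usual normal-based interval; the values
of H and se.H are invented, and the internals of routds.r may differ in
detail.

```r
# Hypothetical cumulative hazard estimates and standard errors at three
# failure times (values invented for illustration).
H    <- c(0.10, 0.25, 0.45)
se.H <- c(0.04, 0.07, 0.11)

z <- qnorm(0.975)          # normal quantile for an approximate 95% CI

# Survival estimate, assuming Pr(T>t) = exp(-H(t)).
probs <- exp(-H)

# Delta method: se of exp(-H) is |d/dH exp(-H)| * se(H) = exp(-H) * se(H).
se <- probs * se.H

# Lower survival limit = reciprocal of the exponentiated upper limit for H;
# upper survival limit = reciprocal of the exponentiated lower limit for H.
lower <- 1 / exp(H + z * se.H)
upper <- 1 / exp(H - z * se.H)
```

Because the interval is built on the cumulative hazard scale and then
transformed, it is not symmetric around probs.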


NOTE: This algorithm, a slight modification of the original
to increase speed and user-friendliness, has been tested for
the cases described in the article. The authors of the algorithm
are not responsible for any consequences that may result from use
of this file or the algorithm by any person.