Estimate the survival curves in two treatment groups

Doubly robust estimation of the survival curves in two exposure groups and their difference using the method of Sjölander and Vansteelandt (2017).

Usage

sjovan_survdiff(
  oformula = NULL,
  ofunc = "survreg",
  oarg = list(),
  cformula = NULL,
  cfunc = "survreg",
  carg = list(),
  eformula = NULL,
  earg = list(),
  method = "DR",
  times = NULL,
  rel.tol = .Machine$double.eps^0.1,
  jacobian.method = "simple",
  se.type = "none",
  data = NULL,
  weights = NULL,
  subset = NULL,
  R = 50,
  parallel = "no",
  cl = NULL,
  ncpus = NULL
)

Arguments

oformula: The outcome formula, the left side should be a call to Surv
ofunc: The model type for the outcome, one of "coxph" or "survreg"
oarg: Arguments passed to ofunc
cformula: The censoring model formula, the left side should be a call to Surv
cfunc: The model type for the censoring model, either "coxph" or "survreg"
carg: Arguments passed to cfunc
eformula: The exposure model formula, which will be fit by logistic regression in a call to glm
earg: Arguments passed to the glm for the exposure model
method: Estimation method, either "IPW" or "DR"
times: Vector of times at which to estimate the survival curves
rel.tol: Convergence tolerance
jacobian.method: Method of computing the jacobian, passed to jacobian
se.type: Method of calculating standard errors, either "none" for no standard errors, "sandwich", or "boot"
data: Data frame in which to find the variables in the model formulas
weights: Vector of case weights
subset: Logical vector indicating which rows of data to include in the analysis
R: Number of bootstrap replicates, used if se.type = "boot"
parallel: Parallel processing for bootstrap, see boot
cl: Optional cluster if parallel processing, see boot
ncpus: Number of cpus to use for parallel processing, see boot

Value

A list with the estimated survival probabilities in each group, their difference, and, if requested, their standard errors

Details

Following Bai et al. (2013) and Sjölander and Vansteelandt (2017), consider the following estimating equation for $S_x(t) = p\{T(x) > t\}$: $$\sum_{i = 1}^n \left[S_x(t) - \frac{I_{X_i = x} I_{\tilde{T}_i > t}}{\bar{g}(\boldsymbol{Z}_i, \boldsymbol{\alpha})G_c(t, X_i, \boldsymbol{Z}_i)} - \frac{I_{X_i = x} - \bar{g}(\boldsymbol{Z}_i, \boldsymbol{\alpha})}{\bar{g}(\boldsymbol{Z}_i, \boldsymbol{\alpha})} H(t, \boldsymbol{Z}_i, X = x) - \right. \\ \left. \frac{I_{X_i = x} - \bar{g}(\boldsymbol{Z}_i, \boldsymbol{\alpha})}{\bar{g}(\boldsymbol{Z}_i, \boldsymbol{\alpha})} H(t, \boldsymbol{Z}_i, X = x) \int_0^t\frac{d\, M_c(u, \boldsymbol{Z}_i, X_i, \tilde{T}_i, \Delta_i)}{G_c(u, X_i, \boldsymbol{Z}_i)H(u, \boldsymbol{Z}_i, X_i)}\right] = 0, $$ where $\bar{g}(\boldsymbol{Z}_i, \boldsymbol{\alpha}) = g(\boldsymbol{Z}_i, \boldsymbol{\alpha})^x (1 - g(\boldsymbol{Z}_i, \boldsymbol{\alpha}))^{(1-x)}$, $H(t, \boldsymbol{Z}, X)$ is a model for $p\{T > t | \boldsymbol{Z}, X\}$, and $M_c(t, \boldsymbol{Z}, X, \tilde{T}, \Delta)$ is the martingale increment for the censoring distribution.

The above is an unbiased estimating equation for $S_x(t)$ if either $H(t, \boldsymbol{Z}, X)$ is correctly specified for both censoring and confounding, i.e., is a correctly specified model for $p\{T(x) > t | \boldsymbol{Z}, X\}$, or both $G_c(u, X, \boldsymbol{Z})$ and $g(\boldsymbol{Z}, \boldsymbol{\alpha})$ are correctly specified for censoring and confounding, respectively.

To obtain estimates of the difference in survival probabilities, one must specify models for the unknown functions $g$, $G_c$, and $H$, estimate them, plug the estimates into the estimating equations, and solve for $S_x(t)$ under $x \in \{0, 1\}$. In this package, one can use semiparametric Cox models or parametric survival models for the outcome and the censoring distributions, and logistic regression for the propensity score model. Bai et al. (2013) provide an expression for a variance estimator that accounts for the uncertainty due to the estimation of the propensity score $g$ and the censoring distribution $G_c$.
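To make the plug-in step concrete, the following is a minimal sketch of the inverse probability weighted part of the estimating equation only (the first two terms, which correspond to method = "IPW" in spirit). It is not the package's implementation: the simulated data, the variable names, the time point t0, and the choice of an exponential censoring model are all assumptions made for illustration.

```r
library(survival)

# Simulated data (hypothetical): one confounder Z, binary exposure X
set.seed(1)
n <- 2000
Z <- rnorm(n)
X <- rbinom(n, 1, plogis(0.5 * Z))
event_time <- rexp(n, rate = exp(-1 + 0.5 * Z - 0.5 * X))
cens_time <- rexp(n, rate = exp(-2 + 0.3 * X))
dat <- data.frame(
  Z = Z, X = X,
  time = pmin(event_time, cens_time),
  status = as.numeric(event_time <= cens_time)
)
dat$censor <- 1 - dat$status

t0 <- 1  # time point at which to estimate S_x(t)

# Propensity score model g(Z, alpha): logistic regression
efit <- glm(X ~ Z, family = binomial, data = dat)
g <- predict(efit, type = "response")

# Censoring model G_c(t, X, Z): exponential survreg, for which
# S(t) = exp(-t * exp(-linear predictor))
cfit <- survreg(Surv(time, censor) ~ X + Z, data = dat, dist = "exponential")
Gc <- exp(-t0 * exp(-predict(cfit, type = "lp")))

# Solve sum_i [S_x(t) - I(X_i = x) I(T_i > t) / (gbar_i * Gc_i)] = 0 for S_x(t)
ipw_surv <- function(x) {
  gbar <- g^x * (1 - g)^(1 - x)
  mean((dat$X == x) * (dat$time > t0) / (gbar * Gc))
}

S1 <- ipw_surv(1)
S0 <- ipw_surv(0)
c(S1 = S1, S0 = S0, diff = S1 - S0)
```

The doubly robust estimator additionally uses the outcome model $H$ and the censoring martingale term, which this sketch omits.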

References

Xiaofei Bai, Anastasios A Tsiatis, and Sean M O'Brien. "Doubly-robust estimators of treatment-specific survival distributions in observational studies with stratified sampling." Biometrics, 69(4):830-839, 2013.

Sjölander, Arvid, and Stijn Vansteelandt. "Doubly robust estimation of attributable fractions in survival analysis." Statistical Methods in Medical Research, 26(2):948-969, 2017.

Examples

library(survival) # provides Surv() and the rotterdam data
df <- rotterdam
df$time <- pmin(df$rtime, df$dtime) / 365.25
df$status <- ifelse(df$recur == 1 | df$death == 1, 1, 0)
df$censor <- 1 - df$status
drFit <-
  sjovan_survdiff(
    oformula = Surv(time, status) ~ chemo + year + age + meno +
      size + factor(grade) + nodes + pgr + er + hormon,
    ofunc = "survreg",
    cformula = Surv(time, censor) ~ chemo + year + age,
    cfunc = "survreg",
    eformula = chemo ~ year + age + meno + size +
      factor(grade) + nodes + pgr + er + hormon,
    method = "DR",
    times = c(2.5, 5, 7.5),
    se.type = "sandwich",
    data = df
  )
drFit
#> $est.S1
#> [1] 0.7661741 0.6032550 0.5224925
#> 
#> $se.S1
#> [1] 0.03440273 0.02906199 0.14520126
#> 
#> $est.S0
#> [1] 0.7190130 0.5515392 0.4670844
#> 
#> $se.S0
#> [1] 0.07008758 0.07427899 0.10500988
#> 
#> $est.diff
#> [1] 0.04716109 0.05171581 0.05540801
#> 
#> $se.diff
#> [1] 0.06863352 0.07611256 0.14153032
#>
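
The returned standard errors can be turned into pointwise confidence intervals. The sketch below copies the est.diff and se.diff values printed above into a plain list and builds 95% Wald intervals; the normal approximation is an assumption made here for illustration, not something the function itself reports.

```r
# Values copied from the printed output above
res <- list(
  est.diff = c(0.04716109, 0.05171581, 0.05540801),
  se.diff = c(0.06863352, 0.07611256, 0.14153032)
)
times <- c(2.5, 5, 7.5)

# 95% Wald interval: estimate +/- z * standard error (normal approximation)
z <- qnorm(0.975)
ci <- cbind(
  time = times,
  estimate = res$est.diff,
  lower = res$est.diff - z * res$se.diff,
  upper = res$est.diff + z * res$se.diff
)
round(ci, 3)
```

Under this approximation, the intervals at all three time points include zero.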