Estimate the survival probabilities in two treatment groups at several times

Doubly robust estimation of the survival probabilities in two exposure groups and their difference using the method of Wang (2018). By default it is assumed that censoring is completely independent, but covariate-dependence can be modeled by either specifying the right side of cformula and cens.method which will be passed to cumincglm.

Usage

pseudo_survdiff(
  oformula,
  cformula = ~1,
  cens.method = NULL,
  eformula,
  times,
  data,
  weights = NULL,
  subset = NULL,
  R = 50,
  parallel = "no",
  cl = NULL,
  ncpus = NULL
)

Arguments

oformula: The outcome formula, the left side should be a call to Surv
cformula: The censoring model formula, a one sided formula with no outcome
cens.method: Method to use for the censoring model, passed to cumincglm
eformula: The exposure model formula, which will be fit by logistic regression in a call to glm
times: Vector of times at which to estimate the survival probabilities
data: Data frame in which to find the variables in the model formulas
weights: Vector of case weights
subset: Logical vector
R: Number of bootstrap replicates for standard error
parallel: Parallel processing for bootstrap, see boot
cl: Optional cluster if parallel processing, see boot
ncpus: Number of cpus to use for parallel processing, see boot

Value

A list with estimated survival probabilities in each group, their difference, and estimated standard errors.

Details

As presented in Wang 2018, let the jackknife pseudo observation for the $i$th subject be $$\hat{S}^i(t) = n \hat{S}(t) - (n - 1)\hat{S}^{-i}(t),$$ where $\hat{S}(t)$ is an estimator of the survival probability at time $t$ using all $n$ subjects, and $\hat{S}^{-i}(t)$ is an estimator of the survival probability at time $t$ leaving the $i$th subject out of the estimation procedure. The assumption about censoring is implied by the way these pseudo observations are computed, i.e., these could be inverse probability of censoring weighted estimators where the censoring distribution may depend on covariates, nonparametric estimators stratified on covariates, or simply the Kaplan-Meier estimators if censoring is assumed to be independent of covariates. We then use a working outcome model for the conditional survival probability: $$\log[-\log\{v_i(X_i)\}]=\log[-\log\{p(T > t | X_i, \boldsymbol{Z}_i)\}] = \beta^\star X_i+h(X_i,\boldsymbol{Z}_i;\boldsymbol{\gamma}^\star),$$ where $\log(-\log(x))$ is the link function that coincides with how a proportional hazards model fits survival at this time point. It is of note that, unlike the Cox model, this estimator does not assume anything about the survival function prior to time point $t$. The parameters in the above model are estimated by solving the generalized estimating equations with $\hat{S}^i(t)$ used as the outcome variable. Let $\hat{v}_i(x) = \exp[-\exp\{\hat{\beta}^\star x+h(x,\boldsymbol{Z}_i;\hat{\boldsymbol{\gamma}}^\star)\}]$. A logistic regression model for the propensity of the exposure is assumed and estimated. The doubly robust estimator of the difference in survival probability at time $t$ is $$\frac{1}{n}\sum_{i = 1}^n\left[ \frac{X_i \hat{S}^i(t) - (X_i - g(\boldsymbol{Z}_i, \hat{\boldsymbol{\alpha}}))\hat{v}_i(1)}{g(\boldsymbol{Z}_i, \hat{\boldsymbol{\alpha}})} - \frac{(1 - X_i) \hat{S}^i(t) + (X_i - g(\boldsymbol{Z}_i, \hat{\boldsymbol{\alpha}}))\hat{v}_i(0)}{1 - g(\boldsymbol{Z}_i, \hat{\boldsymbol{\alpha}})}\right].$$ Although Wang proposed a variance estimator, it was found to be biased upward, and they suggest instead using bootstrap variance estimation. Thus, we implement the bootstrap variance estimator.

References

Wang, J. A simple, doubly robust, efficient estimator for survival functions using pseudo observations. Pharmaceutical Statistics. 2018; 17: 38–48. https://doi.org/10.1002/pst.1834

Examples

df <- rotterdam
df$time <- pmin(df$rtime, df$dtime) / 365.25
df$status <- ifelse(df$recur == 1 | df$death == 1, 1, 0)
df$censor <- 1 - df$status
drFit <-
  pseudo_survdiff(
    oformula = Surv(time, status) ~ chemo + year + age + meno +
      size + factor(grade) + nodes + pgr + er + hormon,
    cformula = ~ chemo + meno + size,
    eformula = chemo ~ year + age + meno + size +
      factor(grade) + nodes + pgr + er + hormon,
    times = c(2.5, 5, 7.5),
    data = df
  )
drFit
#> $est.S1
#> [1] 0.7814539 0.6209512 0.5389184
#> 
#> $se.S1
#> [1] 0.02069071 0.02328108 0.02573691
#> 
#> $est.S0
#> [1] 0.7264196 0.5509058 0.4616483
#> 
#> $se.S0
#> [1] 0.01198071 0.01080776 0.01184234
#> 
#> $est.diff
#> [1] 0.05503432 0.07004544 0.07727005
#> 
#> $se.diff
#> [1] 0.02624473 0.02512435 0.02745386
#>