Estimate the survival probabilities in two treatment groups at several times
Source:R/pseudoobs.R
pseudo_survdiff.Rd
Doubly robust estimation of the survival probabilities in two exposure groups
and their difference using the method of Wang (2018). By default it is
assumed that censoring is completely independent, but covariate-dependence
can be modeled by either specifying the right side of cformula
and cens.method
which
will be passed to cumincglm.
Usage
pseudo_survdiff(
oformula,
cformula = ~1,
cens.method = NULL,
eformula,
times,
data,
weights = NULL,
subset = NULL,
R = 50,
parallel = "no",
cl = NULL,
ncpus = NULL
)
Arguments
- oformula
The outcome formula, the left side should be a call to Surv
- cformula
The censoring model formula, a one sided formula with no outcome
- cens.method
Method to use for the censoring model, passed to cumincglm
- eformula
The exposure model formula, which will be fit by logistic regression in a call to glm
- times
Vector of times at which to estimate the survival probabilities
- data
Data frame in which to find the variables in the model formulas
- weights
Vector of case weights
- subset
Logical vector
- R
Number of bootstrap replicates for standard error
- parallel
Parallel processing for bootstrap, see boot
- cl
Optional cluster if parallel processing, see boot
- ncpus
Number of cpus to use for parallel processing, see boot
Value
A list with estimated survival probabilities in each group, their difference, and estimated standard errors.
Details
As presented in Wang 2018, let the jackknife pseudo observation for the $i$th subject be $$\hat{S}^i(t) = n \hat{S}(t) - (n - 1)\hat{S}^{-i}(t),$$ where \(\hat{S}(t)\) is an estimator of the survival probability at time \(t\) using all \(n\) subjects, and \(\hat{S}^{-i}(t)\) is an estimator of the survival probability at time \(t\) leaving the \(i\)th subject out of the estimation procedure. The assumption about censoring is implied by the way these pseudo observations are computed, i.e., these could be inverse probability of censoring weighted estimators where the censoring distribution may depend on covariates, nonparametric estimators stratified on covariates, or simply the Kaplan-Meier estimators if censoring is assumed to be independent of covariates. We then use a working outcome model for the conditional survival probability: $$\log[-\log\{v_i(X_i)\}]=\log[-\log\{p(T > t | X_i, \boldsymbol{Z}_i)\}] = \beta^\star X_i+h(X_i,\boldsymbol{Z}_i;\boldsymbol{\gamma}^\star),$$ where \(\log(-\log(x))\) is the link function that coincides with how a proportional hazards model fits survival at this time point. It is of note that, unlike the Cox model, this estimator does not assume anything about the survival function prior to time point \(t\). The parameters in the above model are estimated by solving the generalized estimating equations with \(\hat{S}^i(t)\) used as the outcome variable. Let \(\hat{v}_i(x) = \exp[-\exp\{\hat{\beta}^\star x+h(x,\boldsymbol{Z}_i;\hat{\boldsymbol{\gamma}}^\star)\}]\). A logistic regression model for the propensity of the exposure is assumed and estimated. The doubly robust estimator of the difference in survival probability at time \(t\) is $$\frac{1}{n}\sum_{i = 1}^n\left[ \frac{X_i \hat{S}^i(t) - (X_i - g(\boldsymbol{Z}_i, \hat{\boldsymbol{\alpha}}))\hat{v}_i(1)}{g(\boldsymbol{Z}_i, \hat{\boldsymbol{\alpha}})} - \frac{(1 - X_i) \hat{S}^i(t) + (X_i - g(\boldsymbol{Z}_i, \hat{\boldsymbol{\alpha}}))\hat{v}_i(0)}{1 - g(\boldsymbol{Z}_i, \hat{\boldsymbol{\alpha}})}\right].$$ Although Wang proposed a variance estimator, it was found to be biased upward, and they suggest instead using bootstrap variance estimation. Thus, we implement the bootstrap variance estimator.
References
Wang, J. A simple, doubly robust, efficient estimator for survival functions using pseudo observations. Pharmaceutical Statistics. 2018; 17: 38–48. https://doi.org/10.1002/pst.1834
Examples
df <- rotterdam
df$time <- pmin(df$rtime, df$dtime) / 365.25
df$status <- ifelse(df$recur == 1 | df$death == 1, 1, 0)
df$censor <- 1 - df$status
drFit <-
pseudo_survdiff(
oformula = Surv(time, status) ~ chemo + year + age + meno +
size + factor(grade) + nodes + pgr + er + hormon,
cformula = ~ chemo + meno + size,
eformula = chemo ~ year + age + meno + size +
factor(grade) + nodes + pgr + er + hormon,
times = c(2.5, 5, 7.5),
data = df
)
drFit
#> $est.S1
#> [1] 0.7814539 0.6209512 0.5389184
#>
#> $se.S1
#> [1] 0.02069071 0.02328108 0.02573691
#>
#> $est.S0
#> [1] 0.7264196 0.5509058 0.4616483
#>
#> $se.S0
#> [1] 0.01198071 0.01080776 0.01184234
#>
#> $est.diff
#> [1] 0.05503432 0.07004544 0.07727005
#>
#> $se.diff
#> [1] 0.02624473 0.02512435 0.02745386
#>