Estimate the survival probabilities in two treatment groups at several times

Doubly robust estimation of the survival probabilities in two exposure groups and their difference using the method of Blanche et al 2023. By default it is assumed that censoring is completely independent, but covariate-dependence can be modeled by either specifying the right side of cformula which will be passed to a Cox model for censoring, or estimating censoring weights externally and passing them to the cens.weights argument.

Usage

blanche_survdiff(
  oformula,
  cformula = ~1,
  cens.weights = NULL,
  eformula,
  times,
  data,
  weights = NULL,
  subset = NULL
)

Arguments

oformula: The outcome formula, the left side should be a call to Event
cformula: The censoring model formula, a one sided formula with no outcome
cens.weights: A vector of inverse probability of censoring weights. If not NULL, then cformula will be ignored.
eformula: The exposure model formula, which will be fit by logistic regression in a call to glm
times: Vector of times at which to estimate the survival probabilities
data: Data frame in which to find the variables in the model formulas
weights: Vector of case weights
subset: Logical vector

Value

A list with estimated survival probabilities in each group, their difference, and estimated standard errors.

Details

As presented in Blanche et al. 2023, we model the causal survival probabilities $p\{T(x)>t\}$. We assume there is a way to correctly specify for confounding a model for $p\{T(x)\leq t|Z\}$ as in a logistic regression model $$p\{T \leq t | X, \boldsymbol{Z}\} = Q_t(X, \boldsymbol{Z}; \beta^\dagger, \boldsymbol{\gamma}^\dagger) = \frac{\exp(\beta^\dagger X+h(X,\boldsymbol{Z};\boldsymbol{\gamma}^\dagger))}{1 + \exp(\beta^\dagger X+h(X,\boldsymbol{Z};\boldsymbol{\gamma}^\dagger))}.$$ The working propensity model for exposure is a logistic regression model. In addition to these models, we model the censoring distribution as $p\{C > u | X, \boldsymbol{Z}\} = G_c(u, X, \boldsymbol{Z}).$ If cens.weights is not NULL, then we estimate $\hat{G}_c(u, X, \boldsymbol{Z})$ by a Cox model. This can be done externally by Aalen's additive hazard model, nonparametric estimators stratified on covariates, or if censoring is assumed independent of covariates, $G_c$ may be estimated with the Kaplan-Meier estimator or any other nonparametric estimator and passed in the cens.weights argument. Let the estimated inverse probability of censoring weights be $$\hat{U}_i(t) = \frac{I_{\tilde{T_i} \leq t} \Delta_i}{\hat{G}_c(\tilde{T}_i, X_i, \boldsymbol{Z}_i)} + \frac{I_{\tilde{T_i}> t}}{\hat{G}_c(t, X_i, \boldsymbol{Z}_i)}.$$ Then let $\hat{\beta}^\dagger, \hat{\boldsymbol{\gamma}}^\dagger$ denote the parameter estimates resulting from the inverse probability of censoring weighted, but propensity score unweighted, score equations for the logistic model estimated via maximum likelihood among the subset of individuals who either had the event before time t or remained under observation until at least time t. Then the doubly robust estimator of the causal event probability (one minus survival) under exposure level $x = 1$ is $$\frac{1}{n}\sum_{i = 1}^n W(1, \boldsymbol{Z}_i; \hat{\boldsymbol{\alpha}})\hat{U}_i(t)\left(I_{T_i \leq t} - Q_k(1, \boldsymbol{Z}_i, \hat{\beta}^\dagger, \hat{\boldsymbol{\gamma}}^\dagger)\right) + \frac{1}{n}\sum_{i = 1}^n Q_k(1, \boldsymbol{Z}_i, \hat{\beta}^\dagger, \hat{\boldsymbol{\gamma}}^\dagger), $$ where the term $I_{T_i \leq t}$ is defined to be 0 for individuals censored before time $t$, but it is otherwise fully observed. Similarly, we can substitute in $x = 0$ and take the difference between them to obtain estimates of differences in survival probabilities. A standard error estimator is suggested in Blanche et al. 2023 and implemented in the R package logitIPCWATE. This function is basically a wrapper to that function to be consistent with the other estimators.

References

Blanche, Paul Frédéric, Anders Holt, and Thomas Scheike. "On logistic regression with right censored data, with or without competing risks, and its use for estimating treatment effects." Lifetime data analysis 29(2):441-482, 2023.

Examples

df <- rotterdam
df$time <- pmin(df$rtime, df$dtime) / 365.25
df$status <- ifelse(df$recur == 1 | df$death == 1, 1, 0)
df$censor <- 1 - df$status
df$chemo <- factor(df$chemo)
drFit <-
  blanche_survdiff(
    oformula = Event(time, status) ~ chemo + year + age + meno +
      size + factor(grade) + nodes + pgr + er + hormon,
    cformula = ~ chemo + meno + size,
    eformula = chemo ~ year + age + meno + size +
      factor(grade) + nodes + pgr + er + hormon,
    times = c(2.5, 5, 7.5),
    data = df
  )
drFit
#> $est.S1
#> [1] 0.7812813 0.6184004 0.5328929
#> 
#> $se.S1
#> [1] 0.02098236 0.02641496 0.02896027
#> 
#> $est.S0
#> [1] 0.7225998 0.5496811 0.4618080
#> 
#> $se.S0
#> [1] 0.01190859 0.01045986 0.01074570
#> 
#> $est.diff
#> [1] 0.05868142 0.06871926 0.07108498
#> 
#> $se.diff
#> [1] 0.02405841 0.02818684 0.03073858
#>