Michael Sachs
Once you have an estimate of \(S(t) = P(T > t) = 1 - F(t)\), there are several other interesting summary measures of the distribution of the time to event.
Let \(\xi_p\) denote the value that satisfies \(F(\xi_p) = p\) for \(p \in [0, 1]\). We can define \(\hat{\xi}_p\) to be the smallest value \(x\) for which \(\hat{S}(x) \leq 1 - p\).
The asymptotic distribution of \(\hat{\xi}_p\) has a variance that depends on the density of \(T\), which can be shown using the functional delta method for \(\phi(F) = F^{-1}\).
Instead, confidence limits for \(\xi_p\) can be determined by applying the same functional to the confidence limits of the survival curve:
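The examples below use the lung data from the survival package; the printout that follows is consistent with head(lung):

library(survival)
head(lung)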
## inst time status age sex ph.ecog ph.karno pat.karno meal.cal wt.loss
## 1 3 306 2 74 1 1 90 100 1175 NA
## 2 3 455 2 68 1 0 90 90 1225 15
## 3 3 1010 1 56 1 0 90 90 NA 15
## 4 5 210 2 57 1 1 90 60 1150 11
## 5 1 883 2 60 1 0 100 90 NA 0
## 6 12 1022 1 74 1 1 50 80 513 0
sfit <- survfit(Surv(time, status == 2) ~ 1,
data = lung)
plot(sfit)
# estimated median: first time the KM estimate drops below 0.5
xi.5 <- sfit$time[min(which(sfit$surv < .5))]
# the confidence bands cross 0.5 at the confidence limits for the median
xi.5.low <- sfit$time[min(which(sfit$lower < .5))]
xi.5.hi <- sfit$time[min(which(sfit$upper < .5))]
# annotate the KM plot with the median and its confidence limits
segments(c(0, xi.5.low, xi.5, xi.5.hi),
         c(.5, .5, .5, .5),
         c(xi.5.hi, xi.5.low, xi.5, xi.5.hi),
         c(0.5, 0, 0, 0), lty = 3)
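The quantile method for survfit objects returns the same median and confidence limits directly. The exact call is not shown here, but the output below is consistent with something like:

med <- quantile(sfit, probs = 0.5)
unlist(med)
with(med, sprintf("%.0f (%.0f to %.0f)", quantile, lower, upper))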
## quantile.50 lower.50 upper.50
## 310 285 363
## [1] "310 (285 to 363)"
The mean survival time is \[ \mu = \int_0^\infty S(t) \, dt, \] but its value depends on the right-hand tail of the distribution. With censoring, we do not have a good estimate of that tail beyond the last observed event time.
Instead, the restricted mean is used: for a fixed \(t\), \[ \mu_t = \int_0^t S(u) \, du, \] and \(\hat{\mu}_t\) is estimated by plugging in \(\hat{S}\), i.e., it is the area under the estimated survival curve up to \(t\).
Asymptotically, \(\hat{\mu}_t\) is approximately normal, with variance that may be estimated by \[ \sum_{T_j \leq t} \frac{(\hat{\mu}_t - \hat{\mu}_{T_j})^2}{Y(T_j)^2}. \]
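One way to compute \(\hat{\mu}_t\) and its standard error in R is directly from the survfit object; a minimal sketch using the lung fit above, with an arbitrary restriction time of 500 days:

# restricted mean survival up to t = 500, with its se, from the KM fit
print(sfit, rmean = 500)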
Consider the competing risks setting, with \(T_i\) the time to the first event, and \(\delta_i\) the event indicator.
If we denote \(F_j(t) = P(T_i < t, \delta_i = j)\), and \(S(t)\) the overall survival function, then due to the relationship \[ S(t) + \sum_j F_j(t) = 1, \] we have \[ t^* - \int_0^{t^*} S(t) \, dt = \sum_j \int_0^{t^*} F_j(t) \, dt. \] Hence for a fixed time \(t^*\) we have \[ \int_0^{t^*} P(T_i < u, \delta_i = k) \, du = E[(t^* - \min\{T_i, t^*\}) 1\{\delta_i = k\}]. \] This is interpreted as the expected lifetime lost due to cause \(k\) up to time \(t^*\).
See Andersen PK (2013) Decomposition of number of life years lost according to causes of death. Statistics in Medicine 32:5278–5285.
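A sketch of this decomposition in R: with a multi-state survfit on the mgus2 data (the etime/event construction follows the survival package examples), printing with the rmean option gives the expected time in each state up to \(t^*\), which for the absorbing cause-specific states is the expected lifetime lost due to each cause.

mgus2 <- survival::mgus2
mgus2$etime <- with(mgus2, ifelse(pstat == 0, futime, ptime))
mgus2$event <- with(mgus2, factor(ifelse(pstat == 0, 2 * death, 1),
                                  0:2, labels = c("censor", "pcm", "death")))
crfit <- survfit(Surv(etime, event) ~ 1, data = mgus2)
# rmean column: expected time in each state up to t* = 120 months
print(crfit, rmean = 120)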
Suppose \(N_i(t)\) counts the number of observed events up to time \(t\) for subject/cluster \(i\), where there may be more than one event. This is usually subject to censoring, so we have the at-risk process \(Y_i(t)\) such that \(N_i(t) = \int_0^t Y_i(s) \, d\tilde{N}_i(s)\), where \(\tilde{N}_i\) is the fully observed counting process.
We have an intensity process \[ E(dN_i(t) | \mathcal{F}_{t-}) = \lambda_i(t) \, dt \] such that \(N_i(t) = \int_0^t \lambda_i(s) \, ds + M_i(t)\), with \(M_i\) a martingale.
We can also have a rate model or marginal model \[ r_i(t) dt = E(d\tilde{N}_i(t) | \mathcal{E}_{t-}), \] where \(\mathcal{E}_{t-}\) contains information on external covariates, but not internal covariates nor the history of the counting process (See ABG page 132).
When conditioning on \(\mathcal{E}_{t-}\), we do not get the Doob-Meyer decomposition, and to progress we must assume \[ E(dN_i(t) | Y_i(t), \mathcal{E}_{t-}) = Y_i(t) E(d\tilde{N}_i(t) | \mathcal{E}_{t-}), \] which is stronger than the independent censoring assumption.
The Nelson-Aalen estimator for the cumulative hazard is defined as \[ \hat{A}(t) = \int_0^t \frac{dN(s)}{Y(s)}, \] but with recurrent events there are different possible ways of aggregating the processes. \(N(t)\) will usually be the sum of the individual counting processes (i.e., the total number of events), and if \(Y(t)\) counts the number of processes still under observation at time \(t\), then we obtain an estimate of the cumulative mean function.
See Ghosh D, Lin D (2000) Nonparametric analysis of recurrent events and death. Biometrics 56:554–562 for more details.
In ABG example 8.2, they consider 27 individuals observed sleeping for one night. The outcome is the number of awakenings during the night. They consider three ways of defining the at-risk process (the details are in the book).
They note that \(Y(t) = Y_1(t) + Y_2(t)\), and hence the three different cumulative mean functions can be estimated from the additive model for the rate:
\[ r_i(t) = \beta_1(t)I_i(t) + \beta_2(t)(1 - I_i(t)), \] where \(I_i(t)\) is 1 if the number of awakenings is greater than the mean for all processes, and 0 otherwise.
We can assume a relative risk model \[ \lambda_i(t) = Y_i(t) \alpha_0(t) r(\beta, Z_i(t)), \] where the key distinction is whether the covariate processes \(Z_i(t)\) contain information about the history of the counting processes. If they do, then the model is semi-Markov, and if not, then the model is Markov.
Andersen and Gill (1982) showed that the Cox partial likelihood with recurrent events works, in the sense that maximizing it leads to consistent and asymptotically normal estimates of the coefficients.
In the ABG book, they distinguish between these two cases as marginal versus dynamic.
Additive regression models also work in the same way, but the book discusses the problem of dynamic covariates leading to singularities in the hat matrix.
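The recurrent events examples use the cgd data (a trial in chronic granulomatous disease) from the survival package, in counting process form with one row per interval; the printout below is consistent with head(cgd):

head(cgd)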
## id center random treat sex age height weight inherit
## 1 1 Scripps Institute 1989-06-07 rIFN-g female 12 147 62.0 autosomal
## 2 1 Scripps Institute 1989-06-07 rIFN-g female 12 147 62.0 autosomal
## 3 1 Scripps Institute 1989-06-07 rIFN-g female 12 147 62.0 autosomal
## 4 2 Scripps Institute 1989-06-07 placebo male 15 159 47.5 autosomal
## 5 2 Scripps Institute 1989-06-07 placebo male 15 159 47.5 autosomal
## 6 2 Scripps Institute 1989-06-07 placebo male 15 159 47.5 autosomal
## steroids propylac hos.cat tstart enum tstop status
## 1 0 0 US:other 0 1 219 1
## 2 0 0 US:other 219 2 373 1
## 3 0 0 US:other 373 3 414 0
## 4 0 1 US:other 0 1 8 1
## 5 0 1 US:other 8 2 26 1
## 6 0 1 US:other 26 3 152 1
## nonparametric
cgdsurv <- survfit(Surv(tstart, tstop, status) ~ treat, cgd, id=id)
plot(cgdsurv, cumhaz=TRUE, col=1:2, conf.times=c(100, 200, 300, 400),
xlab="Days since randomization", ylab="Cumulative hazard")
## additive version
cgdsurv2 <- aareg(Surv(tstart, tstop, status) ~ treat, data = cgd)
# cumulative baseline rate (placebo group)
plot(cumsum(cgdsurv2$coefficient[, 1]) ~ cgdsurv2$times, type = "s")
# cumulative rate in the rIFN-g group: baseline plus cumulative treatment effect
lines(cumsum(cgdsurv2$coefficient[, 1] + cgdsurv2$coefficient[, 2]) ~
        cgdsurv2$times, type = "s", col = "red")
## regression
#marginal model
coxph(Surv(tstart, tstop, status) ~ treat + sex + age, data = cgd,
id = id)
## Call:
## coxph(formula = Surv(tstart, tstop, status) ~ treat + sex + age,
## data = cgd, id = id)
##
## coef exp(coef) se(coef) robust se z p
## treatrIFN-g -1.11911 0.32657 0.26130 0.31018 -3.608 0.000309
## sexfemale -0.08275 0.92058 0.33098 0.36614 -0.226 0.821189
## age -0.03002 0.97043 0.01330 0.01415 -2.122 0.033876
##
## Likelihood ratio test=25.96 on 3 df, p=9.71e-06
## n= 203, number of events= 76
# dynamic model
# per subject: indicator that the previous record ended in an event and the
# current interval is shorter than 60 days -- a crude dynamic covariate for
# "an event occurred in the past 60 days"
cgd$event_past60 <- unlist(by(cgd, cgd$id, function(dat) {
  c(0, ((dat$tstop[-1] - dat$tstop[-length(dat$tstop)]) < 60 &
          dat$status[-length(dat$tstop)] == 1))
}))
coxph(Surv(tstart, tstop, status) ~ treat + sex + age + event_past60,
data = cgd,
id = id)
## Call:
## coxph(formula = Surv(tstart, tstop, status) ~ treat + sex + age +
## event_past60, data = cgd, id = id)
##
## coef exp(coef) se(coef) robust se z p
## treatrIFN-g -0.88243 0.41378 0.26760 0.28591 -3.086 0.00203
## sexfemale -0.01571 0.98441 0.33719 0.35396 -0.044 0.96460
## age -0.02252 0.97773 0.01336 0.01154 -1.951 0.05109
## event_past60 1.99321 7.33908 0.31685 0.25537 7.805 5.94e-15
##
## Likelihood ratio test=56.41 on 4 df, p=1.648e-11
## n= 203, number of events= 76
Let \(T_i\) denote the time to event, \(\delta_i \in \{1, \ldots, d\}\) denote the indicator of the cause of the event for \(d\) competing causes, and \(X_i\) a vector of covariates for subject \(i = 1, \ldots, n\).
For a fixed \(t\), and for \(V_i = 1\{T_i < t, \delta_i = k\}\), we would like to fit the model
\[ g\{E(V_i | X_i)\} = g\{P(T_i < t, \delta_i = k | X_i)\} = \beta^\top X_i, \] for some link function \(g\).
We do not observe \(T_i\) and \(\delta_i\) directly, but rather \(Y_i = \min\{C_i, T_i\}\), where \(C_i\) is the censoring time, and \(\Delta_i \in \{0, 1, \ldots, d\}\), where 0 indicates that censoring occurred before any of the events.
We will use \(V_i\) to denote transformations of \((T_i, \delta_i)\) whose expectations represent summary statistics of interest.
Our main interest is in estimating the parameters of a generalized linear regression model for \(V_i\) conditional on covariates \(X_i\): \[ E(V_i | X_i) = g^{-1}\{X_i^{\top} \beta\} \]
for some specified link function \(g\).
For a grid of time points \(t_1, \ldots, t_k\), we may fit \[ g\{P(T_i < t_l| X_i = x_i)\} = (\beta_0 + \beta_l) + \beta_1 x_i, \quad l = 1, \ldots, k, \] or, allowing the covariate effect to vary over time, \[ g\{P(T_i < t_l| X_i = x_i)\} = (\beta_0 + \beta_l) + (\beta_1 + \gamma_l) x_i, \quad l = 1, \ldots, k. \]
If the pseudo-observations denoted by \(P_i\) satisfy
\[E(P_i | X_i) = E(V_i | X_i) + o_p(1)\]
in large samples, then solving the estimating equations
\[\sum_{i = 1}^n \frac{\partial g^{-1}}{\partial \beta} A_i^{-1} \{P_i - g^{-1}(X_i^{\top} \beta)\} = \sum_{i = 1}^n U_i(\beta) = 0,\]
for some specified working variance \(A_i\), yields consistent estimates of \(\beta\) in our model.
Then the variance of \(\hat{\beta}\) can be estimated using the sandwich variance estimator, or a more complicated one due to Overgaard (2017).
In general,
\[P_i = n \hat{\theta} - (n - 1) \hat{\theta}_{-i} = \hat{\theta} + (n-1)(\hat{\theta} - \hat{\theta}_{-i}).\]
where \(\hat{\theta}\) is an estimator of \(E(V_i)\) using all \(n\) observations, and \(\hat{\theta}_{-i}\) is an estimator of the same thing leaving the \(i\)th subject out.
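As a concrete sketch (not the internals of any particular package): recent versions of the survival package provide a pseudo() function computing these jackknife pseudo-observations from a survfit object, after which an identity-link model can be fit by least squares, which solves the estimating equations under working independence. A sandwich variance should be used for inference.

fit <- survfit(Surv(time, status == 2) ~ 1, data = lung)
# jackknife pseudo-observations for survival at t = 500
ps <- as.numeric(pseudo(fit, times = 500))
# identity link: least squares solves the working-independence equations
summary(lm(ps ~ age + sex, data = lung))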
Just as we previously viewed the nonparametric estimators as a functional of the empirical distribution function, Overgaard (2017) views the estimation procedure as a series of mappings, \[ \phi(F_n) \mapsto (P_1, \ldots, P_n) \mapsto \frac{1}{n}\sum U_i(\beta) \mapsto \hat{\beta}. \]
Each of these maps is differentiable (surprisingly, even the jackknife mapping), but it is quite a chore to work out the influence function and hence the variance.
It involves computing complicated derivatives and covariances; indeed, Overgaard (2017) contained an error that was subsequently corrected.
Still, the justification applies to many nonparametric estimands of interest, including the survival curve evaluated at a finite set of points, the restricted mean survival, cumulative incidence in the competing risks model, and the mean of the counting process for recurrent events.
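The examples that follow use the eventglm package, which implements this pseudo-observation approach. Loading it produces the masking message below:

library(eventglm)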
##
## Attaching package: 'eventglm'
## The following objects are masked from 'package:survival':
##
## colon, mgus2
colon <- eventglm::colon
colon.cifit <- cumincglm(Surv(time, status) ~ rx,
time = 2500, data = colon)
# P(T > t) instead of <
cumincglm(Surv(time, status) ~ rx,
time = 2500, survival = TRUE,
data = colon)
##
## Call: cumincglm(formula = Surv(time, status) ~ rx, time = 2500, data = colon,
## survival = TRUE)
##
##
## Model for the identity survival at time 2500
##
## Coefficients:
## (Intercept) rxLev rxLev+5FU
## 0.45655 0.02907 0.13176
##
## Degrees of Freedom: 928 Total (i.e. Null); 926 Residual
# any link supported by quasi
cumincglm(Surv(time, status) ~ rx,
time = 2500, link = "log",
data = colon)
##
## Call: cumincglm(formula = Surv(time, status) ~ rx, time = 2500, link = "log",
## data = colon)
##
##
## Model for the log cumulative incidence at time 2500
##
## Coefficients:
## (Intercept) rxLev rxLev+5FU
## -0.60982 -0.05498 -0.27766
##
## Degrees of Freedom: 928 Total (i.e. Null); 926 Residual
# restricted mean survival
colon.rmfit <- rmeanglm(Surv(time, status) ~ rx,
time = 2500, data = colon)
summary(colon.cifit)
##
## Call:
## cumincglm(formula = Surv(time, status) ~ rx, time = 2500, data = colon)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.5875 -0.4902 -0.3467 0.4863 2.1103
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.54345 0.02946 18.449 < 2e-16 ***
## rxLev -0.02907 0.04173 -0.697 0.48596
## rxLev+5FU -0.13176 0.04186 -3.148 0.00165 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for quasi family taken to be 1)
##
## Null deviance: 253.10 on 928 degrees of freedom
## Residual deviance: 250.15 on 926 degrees of freedom
## AIC: NA
##
## Number of Fisher Scoring iterations: 2
data(mgus2)
# specify the cause of interest
mgfitci <- cumincglm(Surv(etime, event) ~ sex,
                     cause = "pcm", time = 120,
                     data = mgus2)
# restricted mean lifetime lost due to cause
mgfitrm <- rmeanglm(Surv(etime, event) ~ sex,
                    cause = "pcm", time = 120,
                    data = mgus2)
summary(mgfitci)
##
## Call:
## cumincglm(formula = Surv(etime, event) ~ sex, time = 120, cause = "pcm",
## data = mgus2)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.08163 -0.07447 -0.06306 -0.05536 1.23923
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.07383 0.01086 6.800 1.04e-11 ***
## sexM -0.01857 0.01384 -1.342 0.18
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for quasi family taken to be 1)
##
## Null deviance: 88.507 on 1383 degrees of freedom
## Residual deviance: 88.388 on 1382 degrees of freedom
## AIC: NA
##
## Number of Fisher Scoring iterations: 2
Consider a finite set of time points \(t_1, \ldots, t_k\), and the models of the form
\[g\{P(T_i < t_b| X_i = x_i)\} = (\beta_0 + \beta_b) + \beta_1x_i, b = 1, \ldots, k\]
These models are estimated using GEE with working independence.
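The call producing the summary below has this form (the object name mvtfit1 is hypothetical, chosen to parallel mvtfit2 below):

mvtfit1 <- cumincglm(Surv(time, status) ~ rx,
                     time = c(500, 1000, 1500, 2000, 2500),
                     data = colon)
summary(mvtfit1)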
##
## Call:
## cumincglm(formula = Surv(time, status) ~ rx, time = c(500, 1000,
## 1500, 2000, 2500), data = colon)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.5667 -0.3947 -0.1709 0.5190 2.0931
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.170873 0.021793 7.841 4.48e-15 ***
## factor(pseudo.time)1000 0.170283 0.012340 13.800 < 2e-16 ***
## factor(pseudo.time)1500 0.263039 0.014456 18.196 < 2e-16 ***
## factor(pseudo.time)2000 0.314852 0.015290 20.592 < 2e-16 ***
## factor(pseudo.time)2500 0.351750 0.016423 21.419 < 2e-16 ***
## rxLev -0.003942 0.032530 -0.121 0.90355
## rxLev+5FU -0.093737 0.031979 -2.931 0.00338 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for quasi family taken to be 1)
##
## Null deviance: 1091.5 on 4644 degrees of freedom
## Residual deviance: 1009.4 on 4638 degrees of freedom
## AIC: NA
##
## Number of Fisher Scoring iterations: 2
\[g\{P(T_i < t_b| X_i = x_i)\} = (\beta_0 + \beta_b) + \gamma_b x_i, b = 1, \ldots, k\]
# can mix and match time varying and time constant effects
mvtfit2 <- cumincglm(Surv(time, status) ~ tve(rx),
time = c(500, 1000, 1500, 2000, 2500),
data = colon)
##
## Call: cumincglm(formula = Surv(time, status) ~ tve(rx), time = c(500,
## 1000, 1500, 2000, 2500), data = colon)
##
##
## Model for the identity cumulative incidence at time 500 1000 1500 2000 2500
##
## Coefficients:
## (Intercept) factor(pseudo.time)1000
## 0.149290 0.178401
## factor(pseudo.time)1500 factor(pseudo.time)2000
## 0.289898 0.345379
## factor(pseudo.time)2500 factor(pseudo.time)500:rxLev
## 0.394161 -0.004138
## factor(pseudo.time)1000:rxLev factor(pseudo.time)1500:rxLev
## 0.017500 0.002754
## factor(pseudo.time)2000:rxLev factor(pseudo.time)2500:rxLev
## -0.006751 -0.029075
## factor(pseudo.time)500:rxLev+5FU factor(pseudo.time)1000:rxLev+5FU
## -0.027581 -0.074455
## factor(pseudo.time)1500:rxLev+5FU factor(pseudo.time)2000:rxLev+5FU
## -0.116689 -0.118203
## factor(pseudo.time)2500:rxLev+5FU
## -0.131758
##
## Degrees of Freedom: 4644 Total (i.e. Null); 4630 Residual
cumincglm(Surv(time, status) ~ rx + age, time = 2500,
model.censoring = "independent", data = colon)
cumincglm(Surv(time, status) ~ rx + age, time = 2500,
model.censoring = "stratified",
formula.censoring = ~ rx, data = colon)
The options above use the Aalen-Johansen (AJ) estimator; the restricted mean is based on sums of the jackknifed AJ estimates.
cumincglm(Surv(time, status) ~ rx + age, time = 2500,
model.censoring = "coxph",
formula.censoring = ~ rx + age, data = colon)
cumincglm(Surv(time, status) ~ rx + age, time = 2500,
          model.censoring = "aareg",
          formula.censoring = ~ rx + age + I(age^2), data = colon)
With \(I_i = 1\{C_i \geq \min(T_i, t)\}\) and \(\tilde{V}_i = 1\{T_i < t\}\) if \(T_i < C_i\) and 0 otherwise, these are estimated using
\[\hat{\theta}^b = n^{-1}\sum_{i = 1}^n \frac{\tilde{V}_i I_i}{\hat{G}\{\min(Y_i, t); X_i\}} \quad \mbox{or} \quad \hat{\theta}^h = \frac{\sum_{i = 1}^n \tilde{V}_i w_i}{\sum_{i = 1}^n w_i}, \quad \mbox{where } w_i = \frac{I_i}{\hat{G}\{\min(Y_i, t); X_i\}},\]
where \(\hat{G}\) is the estimated censoring distribution (possibly depending on covariates), with ipcw.method = "binder" (the default) or ipcw.method = "hajek", respectively.
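For example, to combine a Cox model for the censoring distribution with the Hajek-type weighted estimator (a sketch, using the arguments described above):

cumincglm(Surv(time, status) ~ rx + age, time = 2500,
          model.censoring = "coxph",
          formula.censoring = ~ rx + age,
          ipcw.method = "hajek", data = colon)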
Let \(T^a\) denote the potential outcome under intervention to set the exposure to \(a \in \{0, 1\}\). Let \(A\) be the random variable indicating the observed exposure. We assume that \(T^a = T\) if \(A = a\) (consistency), and that \(T^a\) is independent of \(A\).
In an ideal randomized experiment where the exposure to \(A\) is determined by a coin flip, \[ P(T > t | A = a) = P(T^a > t | A = a) = P(T^a> t). \] We say there is a population causal effect of \(A\) on \(T\) if the distributions \(P(T^1 > t)\) and \(P(T^0 > t)\) differ.
A comparison of survival curves, cumulative incidence curves, restricted means, or almost any linear transformation and contrast of the quantities \(P(T^1 > t)\) and \(P(T^0 > t)\) therefore has a clear causal interpretation.
Can we measure the causal effect using a hazard ratio?
The hazard ratio we are estimating is \[ \frac{\alpha(t, A = 1)}{\alpha(t, A = 0)} = \lim_{h\rightarrow 0}\frac{P(t \leq T^1 < t+ h | T^1 \geq t)/h}{P(t \leq T^0 < t + h | T^0 \geq t)/h}, \] because for \(u > t\), \(P(T > u | A = a, T \geq t) = P(T^a > u | T^a \geq t)\).
The two groups defined by \(T^1 \geq t\) and \(T^0 \geq t\) are not comparable except in the special case where there is no treatment effect.
If there is a treatment effect, the groups become more incomparable over time due to selection.
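A quick simulation sketch (with made-up parameters) illustrates the selection: with an unmeasured frailty and a genuinely protective treatment, the subjects still at risk at time \(t_0\) in the treated arm are systematically more frail than those in the control arm, so the two at-risk groups are not comparable.

set.seed(1)
n  <- 1e5
Z  <- rgamma(n, shape = 2, rate = 2)    # unmeasured frailty, mean 1
A  <- rbinom(n, 1, 0.5)                 # randomized treatment
tt <- rexp(n, rate = Z * exp(-1 * A))   # conditional hazard ratio exp(-1)
t0 <- 1
# mean frailty among subjects still at risk at t0, by arm
tapply(Z[tt >= t0], A[tt >= t0], mean)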
References:
Martinussen T (2022) Causality and the Cox regression model. Annual Review of Statistics and Its Application.
Aalen OO, Cook RJ, Røysland K (2015) Does Cox analysis of a randomized survival study yield a causal treatment effect? Lifetime Data Analysis 21:579–593.
Hernán MA (2010) The hazards of hazard ratios. Epidemiology 21(1):13–15.
What if we knew and measured all prognostic covariates \(Z\)? Even then, the quantities \[ P(t \leq T^1 < t + h | T^1 \geq t, Z) \mbox{ and } P(t \leq T^0 < t + h | T^0 \geq t, Z) \] are not comparable unless \(T^1\) and \(T^0\) are conditionally independent given \(Z\) (an untestable assumption).
Using hazard differences also does not help, as the same selection problem exists, but on a different scale.
Martinussen and others recommend using contrasts of the survival function, for example the difference \[ P(T^1 > t) - P(T^0 > t) = P(T > t | A = 1) - P(T > t | A = 0). \] This can be estimated, with accompanying inference, in a variety of ways: using the pseudo-observation approach with an identity link, using more efficient semiparametric estimators, or by fitting a Cox or parametric model with covariates and standardizing.
See the stdReg R package for details.
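A by-hand sketch of the standardization idea using the lung data; the stdReg package automates this and provides analytic standard errors.

fit <- coxph(Surv(time, status == 2) ~ sex + age, data = lung)
# g-formula: average model-based survival curves over the observed
# covariate distribution, setting sex to each value in turn
surv_std <- function(sexval, t) {
  nd <- transform(lung, sex = sexval)
  sf <- survfit(fit, newdata = nd)
  mean(summary(sf, times = t)$surv)
}
# standardized one-year survival difference, female (2) vs male (1)
surv_std(2, 365) - surv_std(1, 365)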