Michael Sachs
Once you have an estimate of S(t) = P(T > t) = 1 - F(t), there are some other interesting summary measures of the distribution of the time to event.
Let \xi_p denote the value that satisfies F(\xi_p) = p for p \in [0, 1], i.e., the p-th quantile. We can define \hat{\xi}_p to be the smallest value x for which \hat{S}(x) \leq 1 - p.
The asymptotic distribution of \hat{\xi}_p has a variance that depends on the density of T, which can be shown using the functional delta method for \phi(F) = F^{-1}.
Instead, confidence limits for \xi_p can be determined by applying the same functional to the confidence limits of the survival curve:
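The examples below use the lung data from the survival package; a hedged reconstruction of the code that presumably produced the following listing:

library(survival)
head(lung)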
## inst time status age sex ph.ecog ph.karno pat.karno meal.cal wt.loss
## 1 3 306 2 74 1 1 90 100 1175 NA
## 2 3 455 2 68 1 0 90 90 1225 15
## 3 3 1010 1 56 1 0 90 90 NA 15
## 4 5 210 2 57 1 1 90 60 1150 11
## 5 1 883 2 60 1 0 100 90 NA 0
## 6 12 1022 1 74 1 1 50 80 513 0
sfit <- survfit(Surv(time, status == 2) ~ 1,
                data = lung)
plot(sfit)
# estimated median: first time the survival estimate drops below 0.5
xi.5 <- sfit$time[min(which(sfit$surv < .5))]
# confidence limits: first times the confidence bands drop below 0.5
xi.5.low <- sfit$time[min(which(sfit$lower < .5))]
xi.5.hi <- sfit$time[min(which(sfit$upper < .5))]
segments(c(0, xi.5.low, xi.5, xi.5.hi),
         c(.5, .5, .5, .5),
         c(xi.5.hi, xi.5.low, xi.5, xi.5.hi),
         c(0.5, 0, 0, 0), lty = 3)
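The estimate and confidence limits printed below can also be obtained directly from the quantile method for survfit objects; a hedged reconstruction of the code that likely produced this output:

unlist(quantile(sfit, probs = 0.5))
with(quantile(sfit, probs = 0.5),
     sprintf("%.0f (%.0f to %.0f)", quantile, lower, upper))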
## quantile.50 lower.50 upper.50
## 310 285 363
## [1] "310 (285 to 363)"
The mean survival time is \mu = \int_0^\infty S(t)\,dt, but the value depends on the right-hand tail of the distribution. With censoring, we do not have a good estimate of that tail beyond the last observed event time.
Instead, the restricted mean is used: for a fixed t, \mu_t = \int_0^t S(u)\,du, and \hat{\mu}_t is estimated by plugging in \hat{S}, i.e., the area under the estimated survival curve up to t.
Asymptotically, the restricted mean estimator is approximately normal, with a variance that may be estimated by \sum_{T_j \leq t} (\hat{\mu}_t - \hat{\mu}_{T_j})^2 / Y(T_j)^2.
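As a quick illustration, the restricted mean and its standard error are available from the survival package's print method for survfit objects; a minimal sketch using the fit from above, restricting to 365 days:

print(sfit, rmean = 365)
# equivalently, the area under the estimated step function up to 365:
tt <- c(0, sfit$time[sfit$time <= 365], 365)
ss <- c(1, sfit$surv[sfit$time <= 365])
sum(diff(tt) * ss)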
Consider the competing risks setting, with T_i the time to the first event and \delta_i the indicator of which event occurred.
If we denote F_j(t) = P(T_i < t, \delta_i = j), and S(t) the overall survival function, then due to the relationship S(t) + \sum_j F_j(t) = 1, we have t^* - \int_0^{t^*} S(t)\,dt = \sum_j \int_0^{t^*} F_j(t)\,dt. The left-hand side equals E\{t^* - \min(T_i, t^*)\}, the expected lifetime lost up to the fixed time t^*, so each term \int_0^{t^*} F_k(t)\,dt = \int_0^{t^*} P(T_i < u, \delta_i = k)\,du is interpreted as the expected lifetime lost due to cause k up to time t^*.
See Andersen PK. Decomposition of number of life years lost according to causes of death. Statistics in Medicine. 2013;32(30):5278-85.
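As a sketch of how the decomposition can be computed in R: for a multi-state survfit, the restricted mean time in each absorbing state up to t^* equals \int_0^{t^*} F_k(t)\,dt. This assumes the competing risks setup for the mgus2 data used in the survival package vignettes:

library(survival)
# time to first of progression (pcm) or death, and the cause indicator
etime <- with(mgus2, ifelse(pstat == 0, futime, ptime))
event <- factor(with(mgus2, ifelse(pstat == 0, 2 * death, 1)),
                0:2, labels = c("censor", "pcm", "death"))
cfit <- survfit(Surv(etime, event) ~ 1)
# mean time in each state up to t* = 120 months; the pcm and death
# entries are the expected months of life lost to each cause
print(cfit, rmean = 120)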
Suppose N_i(t) counts the number of observed events up to time t for subject/cluster i, where there may be more than one event. This is usually subject to censoring, so we have the at-risk process Y_i(t) such that N_i(t) = \int_0^t Y_i(s)\,d\tilde{N}_i(s), where \tilde{N}_i is the fully observed counting process.
We have an intensity process E(dN_i(t) \mid \mathcal{F}_{t-}) = \lambda_i(t)\,dt such that N_i(t) = \int_0^t \lambda_i(s)\,ds + M_i(t), with M_i a martingale.
We can also have a rate model or marginal model r_i(t) dt = E(d\tilde{N}_i(t) | \mathcal{E}_{t-}), where \mathcal{E}_{t-} contains information on external covariates, but not internal covariates nor the history of the counting process (See ABG page 132).
When conditioning on \mathcal{E}_{t-}, we do not get the Doob-Meyer decomposition, and to progress we must assume E(dN_i(t) | Y_i(t), \mathcal{E}_{t-}) = Y_i(t) E(d\tilde{N}_i(t) | \mathcal{E}_{t-}), which is stronger than the independent censoring assumption.
The Nelson-Aalen estimator for the cumulative hazard is defined as \hat{A}(t) = \int_0^t \frac{dN(s)}{Y(s)}, but with recurrent events there are different possible ways of aggregating the processes. N(t) will usually be the sum of the individual counting processes (i.e., the total number of events), and if Y(t) counts the number of processes under observation at time t, then we obtain an estimate of the cumulative mean function.
See Ghosh D, Lin D (2000) Nonparametric analysis of recurrent events and death. Biometrics 56:554–562 for more details.
In ABG, example 8.2, they consider 27 individuals observed sleeping for one night. The outcome is the number of awakenings during the night, and they consider three ways of defining the at-risk process.
They note that Y(t) = Y_1(t) + Y_2(t), and hence the three different cumulative mean functions can be estimated from the additive model for the rate:
r_i(t) = \beta_1(t) I_i(t) + \beta_2(t)(1 - I_i(t)), where I_i(t) is 1 if the number of awakenings is greater than the mean over all processes, and 0 otherwise.
We can assume a relative risk model with intensity \lambda_i(t) = Y_i(t)\,\alpha_0(t)\,r(\beta, Z_i(t)), where the key distinction is whether the covariate processes Z_i(t) contain information about the history of the counting processes. If they do, the model is semi-Markov; if not, it is Markov.
Andersen and Gill (1982) showed that the Cox partial likelihood with recurrent events works, in the sense that maximizing it leads to consistent and asymptotically normal estimates of the coefficients.
In the ABG book, they distinguish between these two cases as marginal versus dynamic.
Additive regression models also work in the same way, but in the book they discuss the problem of dynamic covariates leading to singularities in the hat matrix.
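The cgd data from the survival package are in counting process (start, stop) format, with one row per observation interval and possibly several rows per subject; the listing below was presumably produced by something like:

head(cgd)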
## id center random treat sex age height weight inherit
## 1 1 Scripps Institute 1989-06-07 rIFN-g female 12 147 62.0 autosomal
## 2 1 Scripps Institute 1989-06-07 rIFN-g female 12 147 62.0 autosomal
## 3 1 Scripps Institute 1989-06-07 rIFN-g female 12 147 62.0 autosomal
## 4 2 Scripps Institute 1989-06-07 placebo male 15 159 47.5 autosomal
## 5 2 Scripps Institute 1989-06-07 placebo male 15 159 47.5 autosomal
## 6 2 Scripps Institute 1989-06-07 placebo male 15 159 47.5 autosomal
## steroids propylac hos.cat tstart enum tstop status
## 1 0 0 US:other 0 1 219 1
## 2 0 0 US:other 219 2 373 1
## 3 0 0 US:other 373 3 414 0
## 4 0 1 US:other 0 1 8 1
## 5 0 1 US:other 8 2 26 1
## 6 0 1 US:other 26 3 152 1
## nonparametric
cgdsurv <- survfit(Surv(tstart, tstop, status) ~ treat, cgd, id=id)
plot(cgdsurv, cumhaz=TRUE, col=1:2, conf.times=c(100, 200, 300, 400),
xlab="Days since randomization", ylab="Cumulative hazard")
## additive version
cgdsurv2 <- aareg(Surv(tstart, tstop, status) ~ treat, data = cgd)
# cumulative rate for the placebo group (intercept term)
plot(cumsum(cgdsurv2$coefficient[, 1]) ~ cgdsurv2$times, type = "s")
# add the treated group: intercept plus treatment increments
lines(cumsum(cgdsurv2$coefficient[, 1] + cgdsurv2$coefficient[, 2]) ~
        cgdsurv2$times, type = "s", col = "red")
## regression
# marginal model
coxph(Surv(tstart, tstop, status) ~ treat + sex + age, data = cgd,
id = id)
## Call:
## coxph(formula = Surv(tstart, tstop, status) ~ treat + sex + age,
## data = cgd, id = id)
##
## coef exp(coef) se(coef) robust se z p
## treatrIFN-g -1.11911 0.32657 0.26130 0.31018 -3.608 0.000309
## sexfemale -0.08275 0.92058 0.33098 0.36614 -0.226 0.821189
## age -0.03002 0.97043 0.01330 0.01415 -2.122 0.033876
##
## Likelihood ratio test=25.96 on 3 df, p=9.71e-06
## n= 203, number of events= 76
# dynamic model
# per subject: 1 if the previous record ended in an event and the current
# stop time is within 60 days of it, 0 otherwise (0 for the first record)
cgd$event_past60 <- unlist(by(cgd, cgd$id, function(dat) {
  c(0, ((dat$tstop[-1] - dat$tstop[-length(dat$tstop)]) < 60 &
          dat$status[-length(dat$tstop)] == 1))
}))
coxph(Surv(tstart, tstop, status) ~ treat + sex + age + event_past60,
data = cgd,
id = id)
## Call:
## coxph(formula = Surv(tstart, tstop, status) ~ treat + sex + age +
## event_past60, data = cgd, id = id)
##
## coef exp(coef) se(coef) robust se z p
## treatrIFN-g -0.88243 0.41378 0.26760 0.28591 -3.086 0.00203
## sexfemale -0.01571 0.98441 0.33719 0.35396 -0.044 0.96460
## age -0.02252 0.97773 0.01336 0.01154 -1.951 0.05109
## event_past60 1.99321 7.33908 0.31685 0.25537 7.805 5.94e-15
##
## Likelihood ratio test=56.41 on 4 df, p=1.648e-11
## n= 203, number of events= 76
Let T_i denote the time to event, \delta_i \in \{1, \ldots, d\} denote the indicator of the cause of the event for d competing causes, and X_i a vector of covariates for subject i = 1, \ldots, n.
For a fixed t, and for V_i = 1\{T_i < t, \delta_i = k\}, we would like to fit the model
g\{E(V_i \mid X_i)\} = g\{P(T_i < t, \delta_i = k \mid X_i)\} = \beta^\top X_i, for some link function g.
We do not observe T_i and \delta_i directly, but rather Y_i = \min\{C_i, T_i\}, where C_i is the censoring time, and \Delta_i \in \{0, 1, \ldots, d\}, where 0 indicates that censoring occurred before any of the events.
We will use V_i to denote transformations of (T_i, \delta_i) whose expectations represent summary statistics of interest.
Our main interest is in estimating the parameters of a generalized linear regression model for V_i conditional on covariates X_i: E(V_i | X_i) = g^{-1}\{X_i^{\top} \beta\}
for some specified link function g.
Or, with a grid of time points t_1, \ldots, t_k, we could fit g\{P(T_i < t_l \mid X_i = x_i)\} = (\beta_0 + \beta_l) + \beta_1 x_i, l = 1, \ldots, k, or g\{P(T_i < t_l \mid X_i = x_i)\} = (\beta_0 + \beta_l) + (\beta_1 + \gamma_l) x_i, l = 1, \ldots, k.
If the pseudo-observations denoted by P_i satisfy
E(P_i | X_i) = E(V_i | X_i) + o_p(1)
in large samples, then solving the estimating equations
\sum_{i = 1}^n \frac{\partial g^{-1}}{\partial \beta} A_i^{-1} \{P_i - g^{-1}(X_i^{\top} \beta)\} = \sum_{i = 1}^n U_i(\beta) = 0,
for some specified working variance A_i, yields consistent estimates of \beta in our model.
Then the variance of \hat{\beta} can be estimated using the sandwich variance estimator, or a more complicated one due to Overgaard (2017).
In general,
P_i = n \hat{\theta} - (n - 1) \hat{\theta}_{-i} = \hat{\theta} + (n-1)(\hat{\theta} - \hat{\theta}_{-i}).
where \hat{\theta} is an estimator of E(V_i) using all n observations, and \hat{\theta}_{-i} is the same estimator leaving the i-th subject out.
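A minimal sketch of the jackknife computation and the subsequent GEE fit, assuming the lung data, the target \theta = E(V_i) with V_i = 1\{T_i > 365\}, and the geepack package for the estimating equations (the eventglm package used below wraps up all of these steps):

library(survival)
library(geepack)
t0 <- 365
n <- nrow(lung)
theta_hat <- summary(survfit(Surv(time, status == 2) ~ 1, data = lung),
                     times = t0)$surv
# leave-one-out estimates and jackknife pseudo-observations
lung$pseudo <- vapply(seq_len(n), function(i) {
  theta_i <- summary(survfit(Surv(time, status == 2) ~ 1, data = lung[-i, ]),
                     times = t0)$surv
  n * theta_hat - (n - 1) * theta_i
}, numeric(1))
lung$id <- seq_len(n)
# identity link, independence working correlation, sandwich variance
geeglm(pseudo ~ age + sex, id = id, data = lung, family = gaussian,
       corstr = "independence")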
Just as we previously viewed the nonparametric estimators as a functional of the empirical distribution function, Overgaard (2017) views the estimation procedure as a series of mappings, \phi(F_n) \mapsto (P_1, \ldots, P_n) \mapsto \frac{1}{n}\sum U_i(\beta) \mapsto \hat{\beta}.
Each of these is differentiable (surprisingly, even the jackknife mapping), but it is quite a chore to work out the influence function and hence the variance.
It involves computing complicated derivatives and covariances; even Overgaard (2017) contained an error that was subsequently corrected.
Still, the justification applies to many nonparametric estimands of interest, including the survival curve evaluated at a finite set of points, the restricted mean survival, cumulative incidence in the competing risks model, and the mean of the counting process for recurrent events.
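The eventglm package implements this approach; the message below comes from attaching it:

library(eventglm)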
##
## Attaching package: 'eventglm'
## The following objects are masked from 'package:survival':
##
## colon, mgus2
colon <- eventglm::colon
colon.cifit <- cumincglm(Surv(time, status) ~ rx,
time = 2500, data = colon)
# P(T > t) instead of <
cumincglm(Surv(time, status) ~ rx,
time = 2500, survival = TRUE,
data = colon)
##
## Call: cumincglm(formula = Surv(time, status) ~ rx, time = 2500, data = colon,
## survival = TRUE)
##
##
## Model for the identity survival at time 2500
##
## Coefficients:
## (Intercept) rxLev rxLev+5FU
## 0.45655 0.02907 0.13176
##
## Degrees of Freedom: 928 Total (i.e. Null); 926 Residual
# any link supported by the quasi family
cumincglm(Surv(time, status) ~ rx,
time = 2500, link = "log",
data = colon)
##
## Call: cumincglm(formula = Surv(time, status) ~ rx, time = 2500, link = "log",
## data = colon)
##
##
## Model for the log cumulative incidence at time 2500
##
## Coefficients:
## (Intercept) rxLev rxLev+5FU
## -0.60982 -0.05498 -0.27766
##
## Degrees of Freedom: 928 Total (i.e. Null); 926 Residual
# restricted mean survival
colon.rmfit <- rmeanglm(Surv(time, status) ~ rx,
time = 2500, data = colon)
summary(colon.cifit)
##
## Call:
## cumincglm(formula = Surv(time, status) ~ rx, time = 2500, data = colon)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.5875 -0.4902 -0.3467 0.4863 2.1103
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.54345 0.02946 18.449 < 2e-16 ***
## rxLev -0.02907 0.04173 -0.697 0.48596
## rxLev+5FU -0.13176 0.04186 -3.148 0.00165 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for quasi family taken to be 1)
##
## Null deviance: 253.10 on 928 degrees of freedom
## Residual deviance: 250.15 on 926 degrees of freedom
## AIC: NA
##
## Number of Fisher Scoring iterations: 2
data(mgus2)
# specify the cause of interest
mgfitci <- cumincglm(Surv(etime, event) ~ sex,
                     cause = "pcm", time = 120,
                     data = mgus2)
# restricted mean lifetime lost due to cause
mgfitrm <- rmeanglm(Surv(etime, event) ~ sex,
                    cause = "pcm", time = 120,
                    data = mgus2)
summary(mgfitci)
##
## Call:
## cumincglm(formula = Surv(etime, event) ~ sex, time = 120, cause = "pcm",
## data = mgus2)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.08163 -0.07447 -0.06306 -0.05536 1.23923
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.07383 0.01086 6.800 1.04e-11 ***
## sexM -0.01857 0.01384 -1.342 0.18
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for quasi family taken to be 1)
##
## Null deviance: 88.507 on 1383 degrees of freedom
## Residual deviance: 88.388 on 1382 degrees of freedom
## AIC: NA
##
## Number of Fisher Scoring iterations: 2
Consider a finite set of time points t_1, \ldots, t_k, and models of the form
g\{P(T_i < t_b \mid X_i = x_i)\} = (\beta_0 + \beta_b) + \beta_1 x_i, b = 1, \ldots, k.
These models are estimated using GEE with a working independence correlation structure, as shown below.
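The summary output below is consistent with a call like the following (the object name is hypothetical):

mvtfit1 <- cumincglm(Surv(time, status) ~ rx,
                     time = c(500, 1000, 1500, 2000, 2500),
                     data = colon)
summary(mvtfit1)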
##
## Call:
## cumincglm(formula = Surv(time, status) ~ rx, time = c(500, 1000,
## 1500, 2000, 2500), data = colon)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.5667 -0.3947 -0.1709 0.5190 2.0931
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.170873 0.021793 7.841 4.48e-15 ***
## factor(pseudo.time)1000 0.170283 0.012340 13.800 < 2e-16 ***
## factor(pseudo.time)1500 0.263039 0.014456 18.196 < 2e-16 ***
## factor(pseudo.time)2000 0.314852 0.015290 20.592 < 2e-16 ***
## factor(pseudo.time)2500 0.351750 0.016423 21.419 < 2e-16 ***
## rxLev -0.003942 0.032530 -0.121 0.90355
## rxLev+5FU -0.093737 0.031979 -2.931 0.00338 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for quasi family taken to be 1)
##
## Null deviance: 1091.5 on 4644 degrees of freedom
## Residual deviance: 1009.4 on 4638 degrees of freedom
## AIC: NA
##
## Number of Fisher Scoring iterations: 2
Allowing the covariate effect to vary with time: g\{P(T_i < t_b \mid X_i = x_i)\} = (\beta_0 + \beta_b) + \gamma_b x_i, b = 1, \ldots, k.
# can mix and match time varying and time constant effects
mvtfit2 <- cumincglm(Surv(time, status) ~ tve(rx),
time = c(500, 1000, 1500, 2000, 2500),
data = colon)
##
## Call: cumincglm(formula = Surv(time, status) ~ tve(rx), time = c(500,
## 1000, 1500, 2000, 2500), data = colon)
##
##
## Model for the identity cumulative incidence at time 500 1000 1500 2000 2500
##
## Coefficients:
## (Intercept) factor(pseudo.time)1000
## 0.149290 0.178401
## factor(pseudo.time)1500 factor(pseudo.time)2000
## 0.289898 0.345379
## factor(pseudo.time)2500 factor(pseudo.time)500:rxLev
## 0.394161 -0.004138
## factor(pseudo.time)1000:rxLev factor(pseudo.time)1500:rxLev
## 0.017500 0.002754
## factor(pseudo.time)2000:rxLev factor(pseudo.time)2500:rxLev
## -0.006751 -0.029075
## factor(pseudo.time)500:rxLev+5FU factor(pseudo.time)1000:rxLev+5FU
## -0.027581 -0.074455
## factor(pseudo.time)1500:rxLev+5FU factor(pseudo.time)2000:rxLev+5FU
## -0.116689 -0.118203
## factor(pseudo.time)2500:rxLev+5FU
## -0.131758
##
## Degrees of Freedom: 4644 Total (i.e. Null); 4630 Residual
cumincglm(Surv(time, status) ~ rx + age, time = 2500,
model.censoring = "independent", data = colon)
cumincglm(Surv(time, status) ~ rx + age, time = 2500,
model.censoring = "stratified",
formula.censoring = ~ rx, data = colon)
The above are estimated using the Aalen-Johansen (AJ) estimator; the restricted mean versions are based on sums of the jackknifed AJ estimates.
cumincglm(Surv(time, status) ~ rx + age, time = 2500,
model.censoring = "coxph",
formula.censoring = ~ rx + age, data = colon)
cumincglm(Surv(time, status) ~ rx + age, time = 2500,
          model.censoring = "aareg",
          formula.censoring = ~ rx + age + I(age^2), data = colon)
With I_i = 1\{C_i \geq \min(T_i, t)\} and \tilde{V}_i = 1\{T_i < t\} if T_i < C_i and 0 otherwise, these are estimated using
\hat{\theta}^b = n^{-1}\sum_{i = 1}^n \frac{\tilde{V}_i I_i}{\hat{G}\{\min(\tilde{T}_i, t); \tilde{X}_i\}} \mbox{ or } \hat{\theta}^h = \frac{\sum_{i = 1}^n \tilde{V}_i w_i}{\sum_{i = 1}^n w_i}, \mbox{ where } w_i = \frac{I_i}{\hat{G}\{\min(\tilde{T}_i, t); \tilde{X}_i\}},
with ipcw.method = "binder" (the default) or ipcw.method = "hajek", respectively.
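To make the weighting concrete, here is a rough sketch of the Hajek-style estimator of P(T < t^*) in the lung data, with a Kaplan-Meier estimate of the censoring distribution G. This ignores ties between event and censoring times and evaluates \hat{G} at \min(\tilde{T}_i, t) rather than at its left limit; eventglm takes care of such details internally:

library(survival)
t0 <- 500
# Kaplan-Meier of the censoring distribution (censoring is the "event")
Gfit <- survfit(Surv(time, status == 1) ~ 1, data = lung)
Ghat <- stepfun(Gfit$time, c(1, Gfit$surv))
Ii <- with(lung, as.numeric(!(status == 1 & time < t0)))  # I_i
Vi <- with(lung, as.numeric(time < t0 & status == 2))     # V-tilde_i
w <- Ii / Ghat(pmin(lung$time, t0))
sum(Vi * w) / sum(w)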
Let T^a denote the potential outcome under intervention setting the exposure to a \in \{0, 1\}. Let A be the random variable indicating the observed exposure. We assume that T^a = T if A = a (consistency) and that T^a is independent of A.
In an ideal randomized experiment where the exposure to A is determined by a coin flip, P(T > t | A = a) = P(T^a > t | A = a) = P(T^a> t). We say there is a population causal effect of A on T if the distributions P(T^1 > t) and P(T^0 > t) differ.
A comparison of survival curves, cumulative incidence curves, restricted means, or almost any linear transformation and contrast of the quantities P(T^1 > t) and P(T^0 > t) therefore has a clear causal interpretation.
Can we measure the causal effect using a hazard ratio?
The hazard ratio we are estimating is \frac{\alpha(t, A = 1)}{\alpha(t, A = 0)} = \lim_{h \rightarrow 0} \frac{P(t \leq T^1 < t + h \mid T^1 \geq t)/h}{P(t \leq T^0 < t + h \mid T^0 \geq t)/h}, because for u > t, P(T > u \mid A = a, T \geq t) = P(T^a > u \mid T^a \geq t).
The two groups defined by T^1 \geq t and T^0 \geq t are not comparable, except in the special case where there is no treatment effect.
If there is a treatment effect, the groups become increasingly incomparable over time due to selection.
Refs:
Martinussen (2022) Causality and the Cox Regression Model
Aalen et al. (2015) Does Cox analysis of a randomized survival study yield a causal treatment effect?
Hernán MA (2010) The hazards of hazard ratios.
What if we knew and measured all prognostic covariates Z? Even then, the quantities P(t \leq T^1 < t + h \mid T^1 \geq t, Z) \mbox{ and } P(t \leq T^0 < t + h \mid T^0 \geq t, Z) are not comparable unless T^1 and T^0 are conditionally independent given Z (an untestable assumption).
Using hazard differences also does not help, as the same selection problem exists, but on a different scale.
Martinussen and others recommend using contrasts of the survival function, for example the difference P(T^1 > t) - P(T^0 > t) = P(T > t \mid A = 1) - P(T > t \mid A = 0). This can be estimated, and inference obtained, in a variety of ways: using the pseudo-observation approach with an identity link, more efficiently using semiparametric efficient estimators, or by fitting a Cox or parametric model with covariates and standardizing.
See the stdReg R package for details.
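A hedged sketch of the standardization approach, using the lung data and (for illustration only) sex as the exposure; stdCoxph fits a Cox model with covariates and averages the model-based survival curves over the covariate distribution:

library(survival)
library(stdReg)
lung2 <- transform(lung, event = as.numeric(status == 2),
                   female = as.numeric(sex == 2))
fit <- coxph(Surv(time, event) ~ female + age, data = lung2,
             method = "breslow")
fit.std <- stdCoxph(fit = fit, data = lung2, X = "female", t = c(365, 730))
# standardized survival difference at 1 and 2 years
summary(fit.std, contrast = "difference", reference = 0)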