Michael Sachs
The research interest is in the length of time elapsing between two events.
Examples include the time from birth to death, from surgery to recovery, or from entry into a study to first diagnosis of cancer.
The analysis of the time elapsing between two events is often referred to as survival analysis, failure time analysis, time to event analysis, or event history analysis.
A key feature of this outcome variable is that its observations may be censored (more on this later).
Survival analysis applies to both experimental and observational data.
Let \(T\) denote a random variable representing time to death, with \(T > 0\). Let’s pick a sample of \(n = 10\) survival times, expressed in days:
\[ t = 2, 4, 14, 21, 24, 27, 33, 51, 60, 72 \]
What is the probability of dying within a certain time \(t\)?
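With no censoring, one natural first answer is the proportion of the sample that has died by time \(t\). The sketch below computes that proportion for the ten survival times above; the function name `empirical_cdf` is a hypothetical helper chosen for illustration.

```python
# Survival times in days from the sample above (no censoring).
times = [2, 4, 14, 21, 24, 27, 33, 51, 60, 72]


def empirical_cdf(sample, t):
    """Proportion of the sample with an event at or before time t."""
    return sum(1 for x in sample if x <= t) / len(sample)


# Six of the ten deaths occur at or before day 30.
p_30 = empirical_cdf(times, 30)
print(p_30)
```

This simple proportion is only valid when every survival time is fully observed.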
The two most cited statistical papers are:
Kaplan and Meier, "Nonparametric Estimation from Incomplete Observations", with about 45,000 citations. This 1958 publication by US statisticians Edward Kaplan and Paul Meier helps researchers find survival patterns for a population, such as participants in clinical trials. It introduced what is now known as the Kaplan–Meier (KM) estimate.
Cox, "Regression Models and Life-Tables", with about 40,000 citations. This 1972 paper by the British statistician David Cox extended these survival analyses to regression.
Cox died in 2022 at age 97. If you haven’t already, I highly recommend reading his 1972 paper. His interview with Nancy Reid is also an interesting account of his life and work (both are posted on the course webpage).
Kaplan-Meier curves and Cox regression models are the statistical methods commonly used to analyze survival data.
Much of statistics is about studying estimation error, i.e., \[ \hat{\theta} - \theta. \]
We will study the estimation error of functions and processes relevant for the analysis of time-to-event data, i.e., \[ \hat{F}(t) - F(t), \quad t \geq 0, \] or \[ \int_0^s H \, d\hat{F} - \int_0^s H \, dF. \] Martingales are mathematical tools for studying such quantities, and to use those tools we need to view our estimators from that point of view.
A counting process \(\{N(t): t \geq 0\}\), with \(N(t) \in \{0, 1, \ldots\}\), counts the number of events that have occurred up to time \(t\).
A stochastic process \(X\) is a family of random variables \(\{X(t): t \in \mathcal{T}\}\), indexed by the set \(\mathcal{T}\) and all defined on the same probability space. \(\mathcal{T}\) will always represent time, and hence will be the positive real line \(\mathbb{R}^+\) or the positive integers.
Unlike random variables, which map from the sample space \(\Omega\) to the real line, stochastic processes are mappings from \(\mathcal{T} \times \Omega\) to the real line. For a fixed \(\omega \in \Omega\), \(X(\cdot, \omega)\) is a function of time called a sample path or trajectory.
This added complexity requires some new notions and redefinitions of things such as integration and conditional expectation, which we will cover now or introduce as needed later on.
Note that we will almost always suppress the dependence of \(X\) on values in \(\Omega\).
It is often of interest to look at expectations of the future of a process conditional on the past. In discrete time, we can write \[ E(X(t + 1) | X(0) = x_0, \ldots, X(t) = x_t). \] In continuous time, it may be tempting to write something similar. However, the sample paths are not exactly “events” that we can condition on, and hence we introduce the notion of a history of a process.
For a process \(X(t)\), let \(\mathcal{F}_t\) denote the smallest \(\sigma\)-algebra that makes all of the \(X(s), 0 \leq s \leq t\) measurable (i.e., random variables). The family of \(\sigma\)-algebras \(\{\mathcal{F}_t: t \geq 0\}\), which satisfies \(\mathcal{F}_s \subset \mathcal{F}_t\) for \(s \leq t\), is the history of the process \(X(t)\) up to \(t\).
The \(\sigma\)-algebra is the formalization of the set of all possible events that can be described by the process \(X(t)\). We may also allow this history to include auxiliary information, like covariates.
Recall: a \(\sigma\)-algebra on \(\Omega\) is a collection of subsets of \(\Omega\) that contains \(\Omega\) and is closed under complements and countable unions.
\(\mathcal{F}_n\) represents the information contained in \(X(t): t \leq n\) and we will write \(\mathcal{F}_{n-}\) to represent the information contained in \(X(t): t < n\) (up to just before \(n\)).
Note that \(\mathcal{F}_1 \subset \mathcal{F}_2 \subset \cdots \subset \mathcal{F}_n\), i.e., the information increases over time (why?). The family of nested sub-\(\sigma\)-algebras \(\{\mathcal{F}_t\}\) is called a filtration.
We will write \(X(t) \in \mathcal{F}_t\) if \(X(t)\) is measurable with respect to \(\mathcal{F}_t\) (i.e., \(X(t)\) is a random function), and \(X\) is adapted to the filtration \(\{\mathcal{F}_t\}\) if \(X(t) \in \mathcal{F}_t\) for all \(t\).
Let \(X(t)\) be a process with history \(\mathcal{F}_t\) and \(T\) be a positive integer-valued random variable.
\(T\) is a stopping time if for all \(t\) the event \(\{T \leq t\} \in \mathcal{F}_t\). The stopped process is \[ X^T(t) = X(t \wedge T) = \begin{cases} X(t) & \mbox{if } t \leq T, \\ X(T) & \mbox{if } t > T. \end{cases} \]
If \(T\) is the time that an event occurs, then it is a stopping time if the information in \(\mathcal{F}_t\) specifies whether the event has occurred by time \(t\).
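As a small illustration of a stopping time and a stopped process in discrete time, the sketch below uses a hypothetical sample path `x` and takes \(T\) to be the first time the path reaches a given level. Both helper names are made up for this example.

```python
def first_time_at_least(path, level):
    """First index t with path[t] >= level.

    Whether T <= t can be decided from path[0..t] alone, so T is a
    stopping time. If the level is never reached, return the last index.
    """
    for t, value in enumerate(path):
        if value >= level:
            return t
    return len(path) - 1


def stopped_path(path, stop_time):
    """Values of X(min(t, T)) for t = 0, 1, ..., len(path) - 1."""
    return [path[min(t, stop_time)] for t in range(len(path))]


# Hypothetical discrete-time sample path X(0), ..., X(6).
x = [0, 1, 3, 2, 5, 4, 6]

T = first_time_at_least(x, 3)
x_stopped = stopped_path(x, T)
print(T, x_stopped)
```

After time \(T\) the stopped path stays frozen at the value \(X(T)\).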
If \(N(t)\) is a counting process and \(f\) is some function of time (possibly random), then \(\int_s^t f(u) \, dN(u)\) or just \(\int_s^t f \, dN\) is the sum of the values of \(f\) evaluated at the jump times of \(N\) over the interval \((s, t]\).
In general, if \(Y\) is a right continuous process and \(X\) a real valued function then the properties of \(\int_s^t X \, dY\) follow from properties of integrals with respect to nondecreasing functions.
In particular \(\int_s^tX \, dY\) will be a random variable, and hence \(\{\int_s^tX \, dY: s \leq t < \infty\}\) is a stochastic process with index set \([s, \infty)\).
Usually \(s = 0\), and we will write \(\int X\, dY\) to denote the process whose value at time \(t\) is \(\int_0^tX\, dY\).
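The integral with respect to a counting process described above reduces to a finite sum, which can be sketched as follows. The helper `integrate_dN` is a hypothetical name, and the sketch assumes that every jump of \(N\) has size one (no ties).

```python
def integrate_dN(f, jump_times, s, t):
    """Sum of f(u) over the jump times u of N that fall in (s, t]."""
    return sum(f(u) for u in jump_times if s < u <= t)


# Jump times of N: the ten survival times used earlier, in days.
jumps = [2, 4, 14, 21, 24, 27, 33, 51, 60, 72]

# With f = 1 the integral is just the increment N(t) - N(s).
n_increment = integrate_dN(lambda u: 1, jumps, 0, 30)

# With f(u) = u the integral adds up the event times in (0, 30].
sum_of_times = integrate_dN(lambda u: u, jumps, 0, 30)

print(n_increment, sum_of_times)
```

Note that the interval is open on the left, so a jump exactly at \(s\) does not contribute.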
The conditional expectation \(E(Y | \mathcal{F}_s)\) is the expectation of \(Y\) given the history up to \(s\) and is defined as follows: suppose \(Y\) is a stochastic process with probability measure \(P\) and history \(\mathcal{F}_t\). Let \(s < t\) so that \(\mathcal{F}_s \subset \mathcal{F}_t\). If \(X \in \mathcal{F}_s\) and \[ \int_B Y \, dP = \int_B X \, dP \] for all sets \(B \in \mathcal{F}_s\), then \(X\) is the conditional expectation of \(Y\) given \(\mathcal{F}_s\) and is denoted as above. It has the following useful properties
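Among the standard ones, sketched here for \(s \leq t\) and assuming all the expectations involved exist (this selection is an illustrative assumption, not an exhaustive list):

\[ E(aY + bZ \mid \mathcal{F}_s) = a E(Y \mid \mathcal{F}_s) + b E(Z \mid \mathcal{F}_s) \mbox{ for constants } a, b, \]
\[ E\{E(Y \mid \mathcal{F}_s)\} = E(Y), \]
\[ E(XY \mid \mathcal{F}_s) = X E(Y \mid \mathcal{F}_s) \mbox{ if } X \in \mathcal{F}_s, \]
\[ E\{E(Y \mid \mathcal{F}_t) \mid \mathcal{F}_s\} = E(Y \mid \mathcal{F}_s). \]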
Let \(N(t)\) denote the counting process with history \(\mathcal{F}_t\). The intensity process is defined as \[ \lambda(t) = \frac{P(dN(t) = 1 | \mathcal{F}_{t-})}{dt}, \] and represents the rate of events at time \(t\) given the past. Given the definition of the hazard, we can write \(\lambda(t) = \alpha(t) (1 - N(t-))\) or \(= \alpha(t) I\{T \geq t\}\) if \(T\) is the event time.
Define \(Y(t) = (1 - N(t-))\) to be the at-risk process, the indicator that a failure has not yet occurred just before time \(t\).
The cumulative intensity process is \(\Lambda(t) = \int_0^t\lambda(s) \, ds\).
Note: Fleming and Harrington use \(\lambda\) and \(\Lambda\) to denote the hazard/cumulative hazard. We are following the notation of ABG.
Suppose \(T_1, \ldots, T_n\) are independent non-negative random variables representing failure times (no censoring).
The aggregated counting process \(N(t) = \sum_{i = 1}^n N_i(t)\), where \(N_i(t) = I\{T_i \leq t\}\), counts the number of events that have occurred by time \(t\), and likewise for \(Y(t)\).
Assume there are no ties. If \(\alpha_i(t) = \alpha(t)\) for all \(i\), then \(N(t)\) is a counting process with intensity \[ \lambda(t) = \alpha(t) Y(t), \] where \(Y(t) = \sum_{i = 1}^n Y_i(t)\). This is called the multiplicative intensity process.
The empirical survivor function is \(\hat{S}(t) = n^{-1} (n - N(t))\).
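As a sketch, the aggregated processes and the empirical survivor function can be evaluated directly on the ten uncensored survival times from the start of the lecture. The function names `N`, `Y`, and `S_hat` mirror the notation in the text, and the at-risk count uses \(Y_i(t) = I\{T_i \geq t\}\).

```python
# Uncensored survival times in days.
times = [2, 4, 14, 21, 24, 27, 33, 51, 60, 72]
n = len(times)


def N(t):
    """Aggregated counting process: number of events at or before t."""
    return sum(1 for x in times if x <= t)


def Y(t):
    """Aggregated at-risk process: number of subjects with T_i >= t."""
    return sum(1 for x in times if x >= t)


def S_hat(t):
    """Empirical survivor function (n - N(t)) / n."""
    return (n - N(t)) / n


print(N(30), Y(30), S_hat(30))
```

At an event time such as \(t = 27\), the subject who fails at that time is still counted in \(Y(27)\) but is already counted in \(N(27)\).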
Let \(T_1, \ldots, T_n\) be potential lifetimes and for each \(i\) we observe either \(T_i\) or \(C_i < T_i\), where \(C_i\) is the censoring time. The observed data are \((\tilde{T}_i, D_i)\), where \(D_i = 1\) if \(T_i = \tilde{T}_i\) and \(D_i = 0\) if \(\tilde{T}_i = C_i\), in which case we know that \(\tilde{T}_i < T_i\).
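To make the observed data concrete, the sketch below builds the pairs \((\tilde{T}_i, D_i)\) from potential lifetimes and censoring times. The censoring times are hypothetical values made up for illustration, and `observe` is a hypothetical helper name.

```python
def observe(lifetimes, censor_times):
    """Return (observed time, event indicator) pairs under right censoring."""
    observed = []
    for t, c in zip(lifetimes, censor_times):
        if t <= c:
            observed.append((t, 1))  # the event is seen: D_i = 1
        else:
            observed.append((c, 0))  # censored before the event: D_i = 0
    return observed


lifetimes = [2, 4, 14, 21, 24]      # potential lifetimes T_i
censor_times = [10, 3, 20, 15, 30]  # hypothetical censoring times C_i

data = observe(lifetimes, censor_times)
print(data)
```

In practice only `data` is available to the analyst; the lifetimes of the censored subjects are never seen.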
Recall that the goal is to infer something about \(P(T_i > t | \mathcal{F}_t)\) and related quantities. Under what conditions can this be done based on the censored process \(N(t)\)?
We will see next time that the following condition leads to some nice properties of our estimators: \[ P(t \leq \tilde{T}_i < t + dt, D_i = 1 | \tilde{T}_i \geq t, \mathcal{F}_{t-}) = P(t \leq T_i < t + dt | T_i \geq t). \]
The right hand side is \(\alpha_i(t) dt\) (hazard of the true times). The intensity of the observed process is \[ P(dN_i(t) = 1 | \mathcal{F}_{t-}) = \lambda_i(t) dt \] and under the above condition we have \[ \lambda_i(t) dt = \alpha_i(t) Y_i(t) dt. \] The condition is called “independent censoring”, even though it is possible to construct event times and censoring times that are dependent, but where the condition holds.
If subjects enter the observation period at a certain time, there is left truncation. Then the observation for the \(i\)th subject is \((V_i, \tilde{T}_i, D_i)\) where \(V_i\) is the entry time.
With the at-risk process defined as \(Y_i(t) = I\{V_i < t \leq \tilde{T}_i\}\) we have independent left truncation if the intensity process is as above.
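The at-risk indicator under left truncation can be sketched as follows. The three subjects are hypothetical, and `at_risk` and `Y` are illustrative helper names.

```python
def at_risk(t, entry, exit_time):
    """Y_i(t) = 1 if entry < t <= exit_time, else 0."""
    return 1 if entry < t <= exit_time else 0


# Hypothetical subjects as (V_i, observed time, D_i).
subjects = [(0, 5, 1), (3, 9, 0), (6, 12, 1)]


def Y(t):
    """Number of subjects under observation and at risk at time t."""
    return sum(at_risk(t, v, x) for v, x, _ in subjects)


print([Y(t) for t in (3, 4, 7, 13)])
```

A subject contributes to the risk set only after entry, so the second subject is not counted at \(t = 3\) even though that subject has not yet failed.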
Under the independent censoring assumption, the process
\[ M(t) = N(t) - \int_0^t\lambda(s) \, ds \]
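is a mean zero martingale. Written as \(N(t) = \int_0^t\lambda(s) \, ds + M(t)\), the counting process splits into a predictable component, the cumulative intensity, and a random noise component, the martingale \(M(t)\) (analogous to estimand plus estimation error, or signal plus noise).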
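The mean zero property can be checked numerically in a simple special case. The sketch below assumes a single subject with no censoring and a constant hazard \(\alpha\) (an exponential lifetime), so that \(\int_0^t \lambda(s) \, ds = \alpha \min(T, t)\). The values of `alpha`, `t`, the seed, and the number of draws are arbitrary choices for illustration.

```python
import random


def martingale_value(event_time, t, alpha):
    """M(t) = N(t) - alpha * min(T, t) for one subject with constant hazard."""
    n_t = 1 if event_time <= t else 0    # N(t) = I{T <= t}
    time_at_risk = min(event_time, t)    # integral of Y(s) over [0, t]
    return n_t - alpha * time_at_risk


random.seed(1)
alpha = 0.5   # assumed constant hazard
t = 2.0       # time at which M(t) is evaluated

# Simulate exponential lifetimes with rate alpha and average M(t).
draws = [random.expovariate(alpha) for _ in range(20000)]
mean_m = sum(martingale_value(x, t, alpha) for x in draws) / len(draws)
print(mean_m)
```

The simulated average of \(M(t)\) should be close to zero, up to Monte Carlo error.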
Martingale theory, which was developed independently of survival analysis, gives us some tools to study the statistical properties of these estimators, with some slightly different notions of consistency and asymptotic normality.
We will review these ideas and connect them to the estimators of interest in survival analysis, and see how the general theory applies.