Michael Sachs
Let \(T_1, \ldots, T_n\) be potential lifetimes and suppose that for each \(i\) we observe either \(T_i\) or \(C_i < T_i\), where \(C_i\) is the censoring time. The observed data are \((\tilde{T}_i, D_i)\), where \(\tilde{T}_i = T_i \wedge C_i\), \(D_i = 1\) if \(\tilde{T}_i = T_i\), and \(D_i = 0\) if \(\tilde{T}_i = C_i\), in which case we know only that \(T_i > \tilde{T}_i\).
Type I: for a prespecified \(c_i\), \(T_i\) is observed only if \(T_i < c_i\), for \(i = 1, \ldots, n\).
Type II: Observation continues until the time \(T_{(k)}\) at which \(k\) events have occurred. Then \(C_i = T_{(k)}\) for all \(i\).
Random censoring: Censoring at random times \(C_i\) that are independent of \(T_i\).
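To make the three mechanisms concrete, here is a minimal simulation sketch (not from the notes; the exponential distributions and parameter values are illustrative assumptions) that generates \((\tilde{T}_i, D_i)\) under each scheme.

```python
import numpy as np

rng = np.random.default_rng(2024)
n = 10
T = rng.exponential(scale=5.0, size=n)   # potential lifetimes T_i (assumed exponential)

# Type I: prespecified censoring times c_i (here all equal to 4.0)
c = np.full(n, 4.0)
T_obs1, D1 = np.minimum(T, c), (T <= c).astype(int)

# Type II: follow-up stops at the time of the k-th event, C_i = T_(k) for all i
k = 5
t_k = np.sort(T)[k - 1]
T_obs2, D2 = np.minimum(T, t_k), (T <= t_k).astype(int)

# Random censoring: C_i drawn independently of T_i
C = rng.exponential(scale=8.0, size=n)
T_obs3, D3 = np.minimum(T, C), (T <= C).astype(int)

print(D1.sum(), D2.sum(), D3.sum())            # observed events under each scheme
print(np.column_stack([T_obs3.round(2), D3]))  # (T~_i, D_i) under random censoring
```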
Often in epidemiology it is of interest to study mortality as a function of time since onset of some disease. For example, in a register that started data collection on a particular date \(V\), we want to study mortality among people with diabetes. The relevant distribution to study is \(T_i | T_i > V\). The entry time could also depend on \(i\), e.g., if the time scale is age.
In this case the at-risk process is \(Y_i(t) = I\{V_i < t \leq \tilde{T}_i\}\) and the counting process is \[ N_i(t) - N_i(t \wedge V_i). \]
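A small sketch (illustrative distributions, not from the notes) of the delayed-entry at-risk indicator: the risk set at time \(t\) only includes subjects who have already entered (\(V_i < t\)) and have not yet failed or been censored (\(t \leq \tilde{T}_i\)).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8
V = rng.uniform(0, 2, size=n)               # entry (left-truncation) times V_i
T_tilde = V + rng.exponential(3.0, size=n)  # observed times; by construction T~_i > V_i

def n_at_risk(t, V, T_tilde):
    """Number at risk at t with delayed entry: sum_i I{V_i < t <= T~_i}."""
    return int(np.sum((V < t) & (t <= T_tilde)))

print([n_at_risk(t, V, T_tilde) for t in np.linspace(0.5, 8.0, 4)])
```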
Left-censoring is unusual in epi, but here is an example from ABGK:
Troops of baboons sleep in trees in the Amboseli Reserve, Kenya, and descend during the day to forage. It is of interest to study the time to descent from the trees, and observers arrive at some time \(V_i\) during the day. If, when they arrive, they observe a baboon on the ground, they can only ascertain that the descent took place before \(V_i\).
Exercise: write down the at risk process and the counting process in this example.
Interval censoring is a combination of left and right censoring. It corresponds to observing \(N_i\) on a set of the form \(E_i = (V_i, U_i]\), or more generally, on a set of the form \[ E_i = \cup_{j = 1}^r (V_{ji}, U_{ji}] \] where \(0 \leq V_{1i} \leq U_{1i} \leq \cdots \leq V_{ri} \leq U_{ri} \leq t\).
This often happens if the event of interest depends on obtaining some measurement, e.g., time to diabetic nephropathy, which requires a measurement of kidney function.
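For instance, if kidney function is only measured at clinic visits, the event time is only known to lie between two consecutive visits. Here is a small sketch (hypothetical helper function, illustrative distributions) that returns the bracketing interval \((V, U]\) given scheduled measurement times.

```python
import numpy as np

rng = np.random.default_rng(7)

def censoring_interval(t_event, visit_times):
    """Return the pair (V, U] of consecutive visit times bracketing t_event
    (assumes t_event > 0); U = inf means right-censored after the last visit,
    V = 0 means left-censored at the first visit."""
    visits = np.concatenate(([0.0], np.sort(visit_times), [np.inf]))
    idx = np.searchsorted(visits, t_event, side="left")
    return visits[idx - 1], visits[idx]

t_event = rng.exponential(5.0)                     # true (unobserved) event time
visits = np.cumsum(rng.uniform(0.5, 2.0, size=6))  # scheduled measurement times
print(round(float(t_event), 2), censoring_interval(t_event, visits))
```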
ABG Figure 2.1 illustrates the three hypothetical models.
Under what conditions can we make valid inference about \(N^C(t)\) based on \(\mathcal{F}_t\)?
Answers to this question lead to different definitions and notions of independent censoring; a related but distinct concept is noninformative censoring.
It suffices to study the intensities of the counting processes. We define a general censoring process \(C_i(t) = I(t \in E_i)\) for observation sets \(E_i\) of the form above. The censored or filtered counting process is \[ N_i(t) = \int_0^t C_i(u) \, dN_i^C(u). \]
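To see what this filtering does in the simplest case, take a single lifetime, so that \(N_i^C(t) = I(T_i \leq t)\) jumps only at \(T_i\); then
\[
N_i(t) = \int_0^t C_i(u)\, dN_i^C(u) = I(T_i \leq t)\, C_i(T_i) = I(T_i \leq t,\ T_i \in E_i).
\]
In particular, under simple right censoring with \(E_i = (0, C_i]\) (writing \(C_i\) for the right-censoring time as before), this reduces to \(N_i(t) = I(\tilde{T}_i \leq t, D_i = 1)\), the familiar observed-event counting process.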
Consider a counting process \(N^C(t)\) adapted to two nested filtrations \(\mathcal{F}^C_t \subseteq \mathcal{G}_t\) for all \(t\). Suppose \(N^C(t)\) has intensity \(\lambda(t)\) with respect to \(\mathcal{G}_t\). The intensity with respect to \(\mathcal{F}^C_t\) will generally differ, but the innovation theorem says that \[ \tilde{\lambda}(t) = E(\lambda(t) | \mathcal{F}^C_{t-}) \] is the \(\mathcal{F}^C_t\) intensity of \(N^C(t)\). If \(N^C\) is adapted to \(\mathcal{F}^C_t\) and \(\lambda(t)\) is predictable with respect to \(\mathcal{F}^C_t\), then \(\tilde{\lambda}(t) = \lambda(t)\).
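A small worked example (not in the notes; a standard frailty-type illustration): suppose \(T\) has hazard \(\alpha(t \mid Z)\) given an unobserved variable \(Z\), let \(N^C(t) = I(T \leq t)\), \(\mathcal{F}^C_t = \sigma\{N^C(s): s \leq t\}\), and \(\mathcal{G}_t = \mathcal{F}^C_t \vee \sigma(Z)\). The \(\mathcal{G}_t\) intensity is \(\lambda(t) = I(T \geq t)\,\alpha(t \mid Z)\), and the innovation theorem gives
\[
\tilde{\lambda}(t) = E\{I(T \geq t)\,\alpha(t \mid Z) \mid \mathcal{F}^C_{t-}\} = I(T \geq t)\, E\{\alpha(t \mid Z) \mid T \geq t\},
\]
i.e., the population (marginal) hazard among those still at risk, which in general differs from \(\alpha(t \mid Z)\).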
What does this have to do with censoring?
Consider the joint model with history \(\mathcal{G}_t\) containing information about the processes \(N^C\) and \(C\). The innovation theorem tells us when \[ \lambda^\mathcal{G}(t) = \lambda^{\mathcal{F}^C}(t), \] which is called independent censoring.
In particular, \[ \lambda^\mathcal{G}(t) = \lambda^{\mathcal{F}^C}(t) \] holds if \(N^C\) is adapted to \(\mathcal{F}^C_t\) and \(\lambda^{\mathcal{G}}(t)\) is predictable with respect to \(\mathcal{F}^C_t\).
ABG assume the above equation, but it is also possible to frame it in other ways or to use the innovation theorem condition directly on a case-by-case basis.
Suppose there is a fixed constant \(d\) such that \(T_i\) is observed only if \(T_i < d\), for \(i = 1, \ldots, n\).
In the simple type I censoring example, the observed counting process is \(N^d = \int I_{[0,d]}\, dN^C\), counting events up to and including time \(d\). Because the constant \(d\) is trivially a stopping time, \(N^d\) has cumulative intensity process \(\int I_{[0,d]}\, d\Lambda\) with respect to \(\mathcal{F}^C_t\), where \(\Lambda\) is the \(\mathcal{F}^C_t\) cumulative intensity of \(N^C\). Now consider the reduced/observed history \(\mathcal{F}^C_{t \wedge d}\): \(N^d\) is adapted to this filtration and \(\int I_{[0,d]}\, d\Lambda\) is predictable with respect to it, hence the cumulative intensity is the same.
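A quick Monte Carlo sketch of this fact (assuming a constant hazard, purely illustrative): under type I censoring at \(d\), the difference \(N^d(t) - \int_0^{t} I_{[0,d]}(u)\, d\Lambda(u)\) should have mean zero.

```python
import numpy as np

rng = np.random.default_rng(42)
alpha, d, t = 0.3, 2.0, 5.0                 # constant hazard, censoring time, evaluation time
T = rng.exponential(1 / alpha, size=200_000)

s = min(t, d)
N_d = (T <= s).astype(float)                # N^d(t) = I(T <= t ^ d)
Lambda_d = alpha * np.minimum(T, s)         # int_0^{t^d} I(T >= u) alpha du = alpha * (T ^ t ^ d)
print(np.mean(N_d - Lambda_d))              # approximately zero
```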
In general, the independent censoring assumption preserves the form of the intensity process when moving from the complete history to the observed history. See page 60 of ABG.
Moving forward, we will assume that
\[ \lambda_i^{\mathcal{G}}(t) = \lambda_i^{\mathcal{F}^C}(t), \] regardless of the form of the joint filtration \(\mathcal{G}\). That is, the assumption applies to right censoring, left censoring, interval censoring, and left truncation.
You may have also heard the term noninformative censoring. For this we need to introduce the likelihood (which will come up again later).
Suppose \(N^C\) has cumulative intensity \(\Lambda^\theta\) with respect to a probability measure \(P_{\theta, \phi}\), where \(\theta\) and \(\phi\) may be infinite-dimensional (i.e., this is not necessarily a parametric model); \(\theta\) is of primary interest and \(\phi\) is a nuisance parameter. Under independent censoring, the \(\mathcal{F}^C_t\) likelihood may be factored as \[ L(\theta, \phi) = L^C(\theta) L'(\theta, \phi), \] where the first term collects the components relevant to the event of interest, and the second term those relevant to censoring, covariates, etc. The first term is called the partial likelihood for \(\theta\). If \(L'(\theta, \phi) = L'(\phi)\), then the censoring is noninformative, or more specifically uninformative for \(\theta\).
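To make the factorization concrete (a standard special case, not spelled out in the notes): with i.i.d. right-censored data, \(T_i \perp C_i\), hazard \(\alpha_\theta\) for \(T_i\), and censoring density and distribution function \(g_\phi\) and \(G_\phi\),
\[
L(\theta, \phi) = \underbrace{\prod_{i=1}^n \alpha_\theta(\tilde{T}_i)^{D_i} \exp\left\{-\int_0^{\tilde{T}_i} \alpha_\theta(u)\, du\right\}}_{L^C(\theta)} \times \underbrace{\prod_{i=1}^n g_\phi(\tilde{T}_i)^{1 - D_i}\, \{1 - G_\phi(\tilde{T}_i)\}^{D_i}}_{L'(\phi)}.
\]
Here the second factor is free of \(\theta\), so this censoring scheme is noninformative for \(\theta\).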
Note that inference based on \(L^C(\theta)\) alone is asymptotically valid (as we shall see later), i.e., consistent and asymptotically normal, but potentially inefficient if the censoring is informative.
Fleming and Harrington formulate independent censoring as follows.
Suppose \(T\) is an absolutely continuous failure time with hazard function \(\alpha(t)\), and \(U\) a right censoring time. Let \(X = T \wedge U\), \(\delta = I(T \leq U)\) and define \(N(t) = I(X \leq t, \delta = 1)\). Then \[ M(t) = N(t) - \int_0^t I(X \geq u) \alpha(u) \, du \] is a martingale if and only if \(\alpha(t) = \alpha^*(t)\), where \[ \alpha^*(t) = \lim_{h \rightarrow 0} \frac{1}{h}P(t \leq T < t + h | T \geq t, U \geq t). \]
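As a quick check (not in the notes), if \(U\) is independent of \(T\), then
\[
P(t \leq T < t + h \mid T \geq t, U \geq t) = \frac{P(t \leq T < t + h)\, P(U \geq t)}{P(T \geq t)\, P(U \geq t)} = P(t \leq T < t + h \mid T \geq t),
\]
so \(\alpha^*(t) = \alpha(t)\) and the martingale property holds. When \(T\) and \(U\) are dependent, the hazard among those still under observation, \(\alpha^*(t)\), can differ from \(\alpha(t)\).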
How would you incorporate covariates in our formulations here?