Generalized linear models for cumulative incidence

This function computes pseudo observations for the cumulative incidence and then fits a generalized linear model to them. The estimated parameters represent contrasts in the cumulative incidence across covariate values at a particular set of times (specified by the time argument). The link function can be "identity" for estimating differences in the cumulative incidence, "log" for estimating ratios, or any of the other link functions supported by quasi.

Usage

cumincglm(
  formula,
  time,
  cause = 1,
  link = "identity",
  model.censoring = "independent",
  formula.censoring = NULL,
  ipcw.method = "binder",
  data,
  survival = FALSE,
  weights,
  subset,
  na.action,
  offset,
  control = list(...),
  model = FALSE,
  x = TRUE,
  y = TRUE,
  singular.ok = TRUE,
  contrasts = NULL,
  id,
  ...
)

Arguments

formula: A formula specifying the model. The left hand side must be a Surv object specifying a right censored survival or competing risks outcome. The status indicator is normally 0=alive, 1=dead. Other choices are TRUE/FALSE (TRUE = death) or 1/2 (2=death). For competing risks, the event variable will be a factor, whose first level is treated as censoring. The right hand side is the usual linear combination of covariates. If there are multiple time points, the special term "tve(.)" can be used to specify that the effect of the variable inside the parentheses will be time varying. In the output this will be represented as the interaction between the time points and the variable.
time: Numeric vector specifying the times at which the cumulative incidence or survival probability effect estimates are desired.
cause: Numeric or character constant specifying the cause indicator of interest.
link: Link function for the cumulative incidence regression model.
model.censoring: Type of model for the censoring distribution. Options are "independent", "stratified", "aareg", and "coxph". If "independent", we assume completely independent censoring, i.e., that the time to event and covariates are independent of the censoring time. "stratified" computes the pseudo-observations stratified on a set of categorical covariates, "aareg" uses Aalen's additive hazards model, and "coxph" uses Cox's proportional hazards model. With these three options, we assume that the time to event and event indicator are conditionally independent of the censoring time, and that the censoring model is correctly specified. Can also be a custom function; see Details and the "Extending eventglm" vignette.
formula.censoring: A one sided formula (e.g., ~ x1 + x2) specifying the model for the censoring distribution. If NULL, uses the same mean model as for the outcome. Missing values in any covariates for the censoring model will cause an error.
ipcw.method: Which method to use for calculation of inverse probability of censoring weighted pseudo observations. "binder", the default, uses the number of observations as the denominator, while the "hajek" method uses the sum of the weights as the denominator.
data: Data frame in which all variables of formula can be interpreted.
survival: Set to TRUE to use survival (one minus the cumulative incidence) as the outcome. Not available for competing risks models.
weights: an optional vector of 'prior weights' to be used in the fitting process. Should be NULL or a numeric vector found in data.
subset: an optional vector specifying a subset of observations to be used in the fitting process.
na.action: a function which indicates what should happen when the data contain NAs. The default is set by the na.action setting of options, and is na.fail if that is unset. The 'factory-fresh' default is na.omit. Another possible value is NULL, no action. Value na.exclude can be useful.
offset: this can be used to specify an a priori known component to be included in the linear predictor during fitting. This should be NULL or a numeric vector of length equal to the number of cases. One or more offset terms can be included in the formula instead or as well, and if more than one is specified their sum is used. See model.offset. If length(time) > 1, then any offset terms must appear in the formula.
control: a list of parameters for controlling the fitting process. This is passed to glm.control.
model: a logical value indicating whether model frame should be included as a component of the returned value.
x: logical value indicating whether the model matrix used in the fitting process should be returned as a component of the returned value.
y: logical value indicating whether the response vector (pseudo-observations) used in the fitting process should be returned as a component of the returned value.
singular.ok: logical; if FALSE a singular fit is an error.
contrasts: an optional list. See the contrasts.arg of model.matrix.default.
id: An optional integer vector of subject identifiers, found in data. If this is present, then generalized estimating equations will be used to fit the model. This can be used, for example, if there are multiple observations per individual represented as multiple rows in data.
...: Other arguments passed to glm.fit.
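
To make the ipcw.method distinction concrete, the following toy sketch contrasts the two denominators on made-up numbers. The vectors y and w are hypothetical (an indicator of the event of interest by the chosen time, and inverse probability of censoring weights). This is not the package's internal code, only an illustration of the two normalizations described for that argument.

```r
# Toy illustration only: y and w are made-up values, not output of eventglm.
y <- c(1, 0, 0, 1, 0)            # hypothetical indicators of the event by time t
w <- c(1.0, 1.2, 0.0, 1.5, 1.1)  # hypothetical IPC weights (0 = censored before t)

n <- length(y)

# "binder"-style normalization: divide by the number of observations
binder_est <- sum(w * y) / n

# "hajek"-style normalization: divide by the sum of the weights
hajek_est <- sum(w * y) / sum(w)

c(binder = binder_est, hajek = hajek_est)
```

The two estimates coincide only when the weights happen to sum to the number of observations.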

Value

A pseudoglm object, with its own methods for print, summary, and vcov. It inherits from glm, so predict and other glm methods are supported.

Details

The argument "model.censoring" determines how the pseudo observations are calculated. This can be the name of a function or the function itself, which must have arguments "formula", "time", "cause", "data", "type", "formula.censoring", and "ipcw.method". If it is the name of a function, this code will look for a function with the prefix "pseudo_" first, to avoid clashes with related methods such as coxph. The function must then return a vector of pseudo observations, one for each subject in data, which are used in subsequent calculations. For examples of the implementation, see the "pseudo-modules.R" file, or the vignette "Extending eventglm".
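
As a rough sketch of what such a function could look like, the block below computes leave-one-out jackknife pseudo observations from the Kaplan-Meier estimator. The name pseudo_jackknife is hypothetical, and the sketch assumes a single time point, an ordinary (non competing risks) outcome, and complete data. It ignores cause, formula.censoring, and ipcw.method, which are kept only to match the required argument list. It is not one of the modules shipped with the package.

```r
library(survival)

# Hypothetical module: naive jackknife pseudo observations based on Kaplan-Meier.
# Assumptions: one time point, two-state survival outcome, no missing data.
pseudo_jackknife <- function(formula, time, cause = 1, data,
                             type = c("cuminc", "survival", "rmean"),
                             formula.censoring = NULL, ipcw.method = NULL) {
  type <- match.arg(type)
  stopifnot(length(time) == 1, type != "rmean")

  # Marginal model: keep the Surv() response, drop all covariates
  marginal <- update(formula, . ~ 1)

  # Kaplan-Meier survival probability at `time` for a given data set
  km_at <- function(d) {
    fit <- survfit(marginal, data = d)
    summary(fit, times = time, extend = TRUE)$surv
  }

  n <- nrow(data)
  full <- km_at(data)
  loo <- vapply(seq_len(n),
                function(i) km_at(data[-i, , drop = FALSE]),
                numeric(1))

  # Pseudo observation for subject i: n * theta - (n - 1) * theta_(-i)
  surv_po <- n * full - (n - 1) * loo
  if (type == "survival") surv_po else 1 - surv_po
}

# Toy data without censoring: the pseudo observations reduce to the
# indicators of an event by time 2.5.
toy <- data.frame(t_obs = 1:5, status = rep(1, 5))
po <- pseudo_jackknife(Surv(t_obs, status) ~ 1, time = 2.5, data = toy)
po
```

Under the lookup rule described above, such a function would presumably be selected with model.censoring = "jackknife", or by passing the function itself.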

Examples

# completely independent censoring
cumincipcw <- cumincglm(Surv(etime, event) ~ age + sex,
                        time = 200, cause = "pcm", link = "identity",
                        model.censoring = "independent", data = mgus2)
# stratified on only the categorical covariate
cumincipcw2 <- cumincglm(Surv(etime, event) ~ age + sex,
                         time = 200, cause = "pcm", link = "identity",
                         model.censoring = "stratified",
                         formula.censoring = ~ sex, data = mgus2)
# multiple time points
cuminct2 <- cumincglm(Surv(etime, event) ~ age + sex,
                      time = c(50, 100, 200), cause = "pcm", link = "identity",
                      model.censoring = "independent", data = mgus2)
# multiple time points, with a time varying effect of sex
cuminct3 <- cumincglm(Surv(etime, event) ~ age + tve(sex),
                      time = c(50, 100, 200), cause = "pcm", link = "identity",
                      model.censoring = "independent", data = mgus2)
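
Since the returned object has summary and vcov methods and inherits from glm, a fitted model can be inspected in the usual way. The following sketch refits the first model and builds Wald-type 95% intervals by hand. The normal approximation and the hand-rolled interval are assumptions made for illustration, not something the package prescribes, and the sketch assumes that mgus2 with the etime and event columns is available as in the examples above.

```r
library(survival)  # for Surv(); assumed to be installed
library(eventglm)

fit <- cumincglm(Surv(etime, event) ~ age + sex,
                 time = 200, cause = "pcm", link = "identity",
                 model.censoring = "independent", data = mgus2)

# Coefficient table from the pseudoglm summary method
summary(fit)

# With the identity link, coefficients are differences in cumulative incidence
est <- coef(fit)
se <- sqrt(diag(vcov(fit)))

# Wald-type 95% intervals (normal approximation assumed)
wald_ci <- cbind(estimate = est,
                 lower = est - qnorm(0.975) * se,
                 upper = est + qnorm(0.975) * se)
wald_ci
```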