Get regression standardized doubly-robust estimates from a glm

Usage

standardize_glm_dr(
  formula_outcome,
  formula_exposure,
  data,
  values,
  ci_level = 0.95,
  ci_type = "plain",
  contrasts = NULL,
  family_outcome = "gaussian",
  family_exposure = "binomial",
  reference = NULL,
  transforms = NULL
)

Arguments

formula_outcome

The formula which is used to fit the glm model for the outcome.

formula_exposure

The formula which is used to fit the glm model for the exposure. The fitted exposure model is combined with the outcome model to form a doubly robust estimator of the standardized mean.

data

A data.frame containing the variables used in formula_outcome and formula_exposure.

values

A named list or data.frame specifying the variables and values at which marginal means of the outcome will be estimated.

ci_level

Coverage probability of confidence intervals.

ci_type

A string, indicating the type of confidence intervals. Either "plain", which gives untransformed intervals, or "log", which gives log-transformed intervals.

contrasts

A vector of contrasts in the following format: If set to "difference" or "ratio", then \(\psi(x)-\psi(x_0)\) or \(\psi(x) / \psi(x_0)\) are constructed, where \(x_0\) is a reference level specified by the reference argument. Has to be NULL if no references are specified.

family_outcome

The family argument which is used to fit the glm model for the outcome.

family_exposure

The family argument which is used to fit the glm model for the exposure.

reference

A vector of reference levels in the following format: If contrasts is not NULL, the desired reference level(s). This must be a vector or list the same length as contrasts, and if not named, it is assumed that the order is as specified in contrasts.

transforms

A vector of transforms in the following format: If set to "log", "logit", or "odds", the standardized mean \(\theta(x)\) is transformed into \(\psi(x)=\log\{\theta(x)\}\), \(\psi(x)=\log[\theta(x)/\{1-\theta(x)\}]\), or \(\psi(x)=\theta(x)/\{1-\theta(x)\}\), respectively. If the vector is NULL, then \(\psi(x)=\theta(x)\).
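The transforms and contrasts described above can be sketched in base R. The values in `theta` are made up for illustration and are not output from the package, and the helper `psi` is hypothetical.

```r
# Hypothetical standardized means theta(x) for exposure levels 0 and 1
theta <- c("0" = 0.20, "1" = 0.35)

# psi(x) for each supported transform; NULL means the identity
psi <- function(theta, transform = NULL) {
  if (is.null(transform)) return(theta)
  switch(transform,
    log   = log(theta),
    logit = log(theta / (1 - theta)),
    odds  = theta / (1 - theta),
    stop("unknown transform: ", transform)
  )
}

psi_odds <- psi(theta, "odds")

# Contrasts against the reference level x0 = 0
contrast_difference <- psi_odds["1"] - psi_odds["0"]
contrast_ratio      <- psi_odds["1"] / psi_odds["0"]
```

With the "odds" transform, the "ratio" contrast is an odds ratio and the "difference" contrast is a difference in odds.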

Value

An object of class std_glm. This is a list with the components described below. To obtain the numeric results in a data frame, use the tidy.std_glm function.

res_contrast

An unnamed list with one element for each of the requested contrasts. Each element is itself a list with the elements:

estimates

Estimated counterfactual means and standard errors for each exposure level

covariance

Estimated covariance matrix of counterfactual means

fit_outcome

The estimated regression model for the outcome

fit_exposure

The estimated exposure model

exposure_names

A character vector of the exposure variable names

est_table

Data.frame of the estimates of the contrast with inference

transform

The transform argument used for this contrast

contrast

The requested contrast type

reference

The reference level of the exposure

ci_type

Confidence interval type

ci_level

Confidence interval level

res

A named list with the elements:

estimates

Estimated counterfactual means and standard errors for each exposure level

covariance

Estimated covariance matrix of counterfactual means

fit_outcome

The estimated regression model for the outcome

fit_exposure

The estimated exposure model

exposure_names

A character vector of the exposure variable names
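To illustrate how the estimates and covariance components relate to contrast inference, the following base R sketch computes a difference contrast and Wald-type intervals from made-up numbers. The delta-method form of the log-transformed interval is an assumption about what ci_type = "log" means, not a statement of the package's internal code.

```r
# Made-up counterfactual means and covariance matrix (not package output)
est <- c(0.20, 0.35)
V <- matrix(c(0.0010, 0.0002,
              0.0002, 0.0016), nrow = 2)
ci_level <- 0.95
z <- qnorm(1 - (1 - ci_level) / 2)

# Difference contrast: Var(a - b) = Var(a) + Var(b) - 2 Cov(a, b)
diff_est <- est[2] - est[1]
diff_se <- sqrt(V[2, 2] + V[1, 1] - 2 * V[1, 2])
ci_plain <- diff_est + c(-1, 1) * z * diff_se

# Log-transformed interval for a single positive mean (delta method):
# se(log(est)) is approximately se(est) / est
se1 <- sqrt(V[2, 2])
ci_log <- exp(log(est[2]) + c(-1, 1) * z * se1 / est[2])
```

The plain interval is symmetric around the estimate, while the log-transformed interval is asymmetric and always stays above zero.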

Details

standardize_glm_dr performs regression standardization in generalized linear models; see, e.g., the documentation for standardize_glm. Specifically, this version uses a doubly robust estimator for standardization, meaning inference is valid when either the outcome regression or the exposure model is correctly specified and there is no unmeasured confounding.
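One doubly robust construction, in the spirit of the reference below, can be sketched in base R: fit the exposure model, form inverse probability of treatment weights, fit the outcome glm with those weights, and average the predictions with the exposure set to each level. This is a minimal sketch on simulated data, not the package's internal code; the variable names are illustrative, and it gives point estimates only, without the package's standard errors.

```r
set.seed(1)
n <- 5000
Z <- rnorm(n)
X <- rbinom(n, 1, prob = plogis(Z))   # exposure depends on the confounder Z
Y <- 1 + 2 * X + Z + rnorm(n)         # true mean difference is 2
dd <- data.frame(Z, X, Y)

# Step 1: exposure model and inverse probability of treatment weights
fit_exposure <- glm(X ~ Z, family = binomial, data = dd)
p <- predict(fit_exposure, type = "response")
dd$w <- ifelse(dd$X == 1, 1 / p, 1 / (1 - p))

# Step 2: outcome model fitted with those weights
fit_outcome <- glm(Y ~ X + Z, family = gaussian, data = dd, weights = w)

# Step 3: standardize by predicting for everyone under X = 0 and X = 1
mean0 <- mean(predict(fit_outcome, newdata = transform(dd, X = 0), type = "response"))
mean1 <- mean(predict(fit_outcome, newdata = transform(dd, X = 1), type = "response"))
dr_difference <- mean1 - mean0
```

In this simulation both models are correctly specified, so dr_difference should be close to the true value of 2.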

References

Gabriel E.E., Sachs, M.C., Martinussen T., Waernbaum I., Goetghebeur E., Vansteelandt S., Sjölander A. (2024), Inverse probability of treatment weighting with generalized linear outcome models for doubly robust estimation. Statistics in Medicine, 43(3):534–547.

Examples


# doubly robust estimator
# either the outcome model or the exposure model must be correctly
# specified to adjust for confounding
# NOTE: only works with binary exposures
data <- AF::clslowbwt
x <- standardize_glm_dr(
  formula_outcome = bwt ~ smoker * (race + age + lwt) + I(age^2) + I(lwt^2),
  formula_exposure = smoker ~ race * age * lwt + I(age^2) + I(lwt^2),
  family_outcome = "gaussian",
  family_exposure = "binomial",
  data = data,
  values = list(smoker = c(0, 1)), contrasts = "difference", reference = 0
)

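# simulated data with a binary outcome; the standardized means are
# transformed to odds before taking the difference
set.seed(6)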
n <- 100
Z <- rnorm(n)
X <- rbinom(n, 1, prob = (1 + exp(Z))^(-1))
Y <- rbinom(n, 1, prob = (1 + exp(as.numeric(X) + Z))^(-1))
dd <- data.frame(Z, X, Y)
x <- standardize_glm_dr(
  formula_outcome = Y ~ X * Z, formula_exposure = X ~ Z,
  family_outcome = "binomial",
  data = dd,
  values = list(X = 0:1), reference = 0,
  contrasts = c("difference"), transforms = c("odds")
)