Get regression standardized estimates from a glm

Usage

standardize_glm(
  formula,
  data,
  values,
  clusterid,
  matched_density_cases,
  matched_density_controls,
  matching_variable,
  p_population,
  case_control = FALSE,
  ci_level = 0.95,
  ci_type = "plain",
  contrasts = NULL,
  family = "gaussian",
  reference = NULL,
  transforms = NULL
)

Arguments

formula: The formula which is used to fit the model for the outcome.
data: The data.
values: A named list or data.frame specifying the variables and values at which marginal means of the outcome will be estimated.
clusterid: An optional string containing the name of a cluster identification variable when data are clustered.
matched_density_cases: A function of the matching variable. The probability (or density) of the matched variable among the cases.
matched_density_controls: A function of the matching variable. The probability (or density) of the matched variable among the controls.
matching_variable: The matching variable extracted from the data set.
p_population: Specifies the incidence in the population when case_control=TRUE.
case_control: Whether the data comes from a case-control study.
ci_level: Coverage probability of confidence intervals.
ci_type: A string, indicating the type of confidence intervals. Either "plain", which gives untransformed intervals, or "log", which gives log-transformed intervals.
contrasts: A vector of contrasts in the following format: If set to "difference" or "ratio", then \(\psi(x)-\psi(x_0)\) or \(\psi(x) / \psi(x_0)\) are constructed, where \(x_0\) is a reference level specified by the reference argument. Has to be NULL if no references are specified.
family: The family argument which is used to fit the glm model for the outcome.
reference: A vector of reference levels in the following format: If contrasts is not NULL, the desired reference level(s). This must be a vector or list the same length as contrasts, and if not named, it is assumed that the order is as specified in contrasts.
transforms: A vector of transforms in the following format: If set to "log", "logit", or "odds", the standardized mean \(\theta(x)\) is transformed into \(\psi(x)=\log\{\theta(x)\}\), \(\psi(x)=\log[\theta(x)/\{1-\theta(x)\}]\), or \(\psi(x)=\theta(x)/\{1-\theta(x)\}\), respectively. If the vector is NULL, then \(\psi(x)=\theta(x)\).

Value

An object of class std_glm. This is basically a list with components estimates and covariance estimates in res. Results for transformations, contrasts, references are stored in res_contrasts. Obtain numeric results in a data frame with the tidy function.

Details

standardize_glm performs regression standardization in generalized linear models, at specified values of the exposure, over the sample covariate distribution. Let \(Y\), \(X\), and \(Z\) be the outcome, the exposure, and a vector of covariates, respectively. standardize_glm uses a fitted generalized linear model to estimate the standardized mean \(\theta(x)=E\{E(Y|X=x,Z)\}\), where \(x\) is a specific value of \(X\), and the outer expectation is over the marginal distribution of \(Z\).

References

Rothman K.J., Greenland S., Lash T.L. (2008). Modern Epidemiology, 3rd edition. Lippincott, Williams & Wilkins.

Sjölander A. (2016). Regression standardization with the R-package stdReg. European Journal of Epidemiology 31(6), 563-574.

Sjölander A. (2016). Estimation of causal effect measures with the R-package stdReg. European Journal of Epidemiology 33(9), 847-858.

Examples


# basic example
# needs to correctly specify the outcome model and no unmeasered confounders
# (+ standard causal assunmptions)
set.seed(6)
n <- 100
Z <- rnorm(n)
X <- rnorm(n, mean = Z)
Y <- rbinom(n, 1, prob = (1 + exp(X + Z))^(-1))
dd <- data.frame(Z, X, Y)
x <- standardize_glm(
  formula = Y ~ X * Z,
  family = "binomial",
  data = dd,
  values = list(X = 0:1),
  contrasts = c("difference", "ratio"),
  reference = 0
)
x
#> Outcome formula: Y ~ X * Z
#> <environment: 0x63e9486c66c0>
#> Outcome family: quasibinomial 
#> Outcome link function: logit 
#> Exposure:  X 
#> 
#> Tables: 
#>   X Estimate Std.Error lower.0.95 upper.0.95
#> 1 0    0.519    0.0615      0.399      0.640
#> 2 1    0.391    0.0882      0.218      0.563
#> 
#> Reference level:  X = 0 
#> Contrast:  difference 
#>   X Estimate Std.Error lower.0.95 upper.0.95
#> 1 0    0.000    0.0000      0.000    0.00000
#> 2 1   -0.129    0.0638     -0.254   -0.00353
#> 
#> Reference level:  X = 0 
#> Contrast:  ratio 
#>   X Estimate Std.Error lower.0.95 upper.0.95
#> 1 0    1.000     0.000      1.000      1.000
#> 2 1    0.752     0.126      0.505      0.999
#> 
# different transformations of causal effects

# example from Sjölander (2016) with case-control data
# here the matching variable needs to be passed as an argument
singapore <- AF::singapore
#> Registered S3 methods overwritten by 'stdReg':
#>   method                   from   
#>   summary.parfrailty       stdReg2
#>   print.summary.parfrailty stdReg2
Mi <- singapore$Age
m <- mean(Mi)
s <- sd(Mi)
d <- 5
standardize_glm(
  formula = Oesophagealcancer ~ (Everhotbev + Age + Dial + Samsu + Cigs)^2,
  family = binomial, data = singapore,
  values = list(Everhotbev = 0:1), clusterid = "Set",
  case_control = TRUE,
  matched_density_cases = function(x) dnorm(x, m, s),
  matched_density_controls = function(x) dnorm(x, m - d, s),
  matching_variable = Mi,
  p_population = 19.3 / 100000
)
#> Warning: case_control = TRUE may not give reasonable results for the variance with clustering
#> Outcome formula: Oesophagealcancer ~ (Everhotbev + Age + Dial + Samsu + Cigs)^2
#> <environment: 0x63e9486c66c0>
#> Outcome family: quasibinomial 
#> Outcome link function: logit 
#> Exposure:  Everhotbev 
#> 
#> Tables: 
#>   Everhotbev Estimate Std.Error lower.0.95 upper.0.95
#> 1          0 0.000128  1.69e-05   9.48e-05   0.000161
#> 2          1 0.000570  2.19e-04   1.41e-04   0.000999
#> 

# multiple exposures
set.seed(7)
n <- 100
Z <- rnorm(n)
X1 <- rnorm(n, mean = Z)
X2 <- rnorm(n)
Y <- rbinom(n, 1, prob = (1 + exp(X1 + X2 + Z))^(-1))
dd <- data.frame(Z, X1, X2, Y)
x <- standardize_glm(
  formula = Y ~ X1 + X2 + Z,
  family = "binomial",
  data = dd, values = list(X1 = 0:1, X2 = 0:1),
  contrasts = c("difference", "ratio"),
  reference = c(X1 = 0, X2 = 0)
)
x
#> Outcome formula: Y ~ X1 + X2 + Z
#> <environment: 0x63e9486c66c0>
#> Outcome family: quasibinomial 
#> Outcome link function: logit 
#> Exposure:  X1, X2 
#> 
#> Tables: 
#>   X1 X2 Estimate Std.Error lower.0.95 upper.0.95
#> 1  0  0    0.419    0.0576     0.3058      0.532
#> 2  1  0    0.252    0.0740     0.1074      0.397
#> 3  0  1    0.273    0.0690     0.1382      0.409
#> 4  1  1    0.146    0.0631     0.0221      0.269
#> 
#> Reference level:  X1 = 0; X2 = 0 
#> Contrast:  difference 
#>   X1 X2 Estimate Std.Error lower.0.95 upper.0.95
#> 1  0  0    0.000    0.0000      0.000     0.0000
#> 2  1  0   -0.166    0.0590     -0.282    -0.0509
#> 3  0  1   -0.145    0.0458     -0.235    -0.0557
#> 4  1  1   -0.273    0.0581     -0.387    -0.1593
#> 
#> Reference level:  X1 = 0; X2 = 0 
#> Contrast:  ratio 
#>   X1 X2 Estimate Std.Error lower.0.95 upper.0.95
#> 1  0  0    1.000     0.000      1.000      1.000
#> 2  1  0    0.603     0.141      0.327      0.878
#> 3  0  1    0.653     0.114      0.430      0.876
#> 4  1  1    0.348     0.131      0.091      0.605
#> 
tidy(x)
#>    X1 X2   Estimate  Std.Error  lower.0.95  upper.0.95   contrast transform
#> 1   0  0  0.4187976 0.05763610  0.30583295  0.53176230       none  identity
#> 2   1  0  0.2523918 0.07395458  0.10744346  0.39734008       none  identity
#> 3   0  1  0.2733894 0.06899362  0.13816443  0.40861445       none  identity
#> 4   1  1  0.1456809 0.06306346  0.02207877  0.26928300       none  identity
#> 5   0  0  0.0000000 0.00000000  0.00000000  0.00000000 difference  identity
#> 6   1  0 -0.1664059 0.05895745 -0.28196033 -0.05085138 difference  identity
#> 7   0  1 -0.1454082 0.04579033 -0.23515558 -0.05566080 difference  identity
#> 8   1  1 -0.2731167 0.05806770 -0.38692735 -0.15930614 difference  identity
#> 9   0  0  1.0000000 0.00000000  1.00000000  1.00000000      ratio  identity
#> 10  1  0  0.6026581 0.14070801  0.32687543  0.87844071      ratio  identity
#> 11  0  1  0.6527961 0.11372894  0.42989144  0.87570068      ratio  identity
#> 12  1  1  0.3478551 0.13106574  0.09097094  0.60473922      ratio  identity

# continuous exposure
set.seed(2)
n <- 100
Z <- rnorm(n)
X <- rnorm(n, mean = Z)
Y <- rnorm(n, mean = X + Z + 0.1 * X^2)
dd <- data.frame(Z, X, Y)
x <- standardize_glm(
  formula = Y ~ X * Z,
  family = "gaussian",
  data = dd,
  values = list(X = seq(-1, 1, 0.1))
)

# plot standardized mean as a function of x
plot(x)

# plot standardized mean - standardized mean at x = 0 as a function of x
plot(x, contrast = "difference", reference = 0)