Skip to contents

Get regression standardized estimates from a glm

Usage

standardize_glm(
  formula,
  data,
  values,
  clusterid,
  matched_density_cases,
  matched_density_controls,
  matching_variable,
  p_population,
  case_control = FALSE,
  ci_level = 0.95,
  ci_type = "plain",
  contrasts = NULL,
  family = "gaussian",
  reference = NULL,
  transforms = NULL
)

Arguments

formula

The formula which is used to fit the model for the outcome.

data

The data.

values

A named list or data.frame specifying the variables and values at which marginal means of the outcome will be estimated.

clusterid

An optional string containing the name of a cluster identification variable when data are clustered.

matched_density_cases

A function of the matching variable. The probability (or density) of the matched variable among the cases.

matched_density_controls

A function of the matching variable. The probability (or density) of the matched variable among the controls.

matching_variable

The matching variable extracted from the data set.

p_population

Specifies the incidence in the population when case_control=TRUE.

case_control

Whether the data comes from a case-control study.

ci_level

Coverage probability of confidence intervals.

ci_type

A string, indicating the type of confidence intervals. Either "plain", which gives untransformed intervals, or "log", which gives log-transformed intervals.

contrasts

A vector of contrasts in the following format: If set to "difference" or "ratio", then \(\psi(x)-\psi(x_0)\) or \(\psi(x) / \psi(x_0)\) are constructed, where \(x_0\) is a reference level specified by the reference argument. Has to be NULL if no references are specified.

family

The family argument which is used to fit the glm model for the outcome.

reference

A vector of reference levels in the following format: If contrasts is not NULL, the desired reference level(s). This must be a vector or list the same length as contrasts, and if not named, it is assumed that the order is as specified in contrasts.

transforms

A vector of transforms in the following format: If set to "log", "logit", or "odds", the standardized mean \(\theta(x)\) is transformed into \(\psi(x)=\log\{\theta(x)\}\), \(\psi(x)=\log[\theta(x)/\{1-\theta(x)\}]\), or \(\psi(x)=\theta(x)/\{1-\theta(x)\}\), respectively. If the vector is NULL, then \(\psi(x)=\theta(x)\).

Value

An object of class std_glm. This is basically a list with components estimates and covariance estimates in res. Results for transformations, contrasts, references are stored in res_contrasts. Obtain numeric results in a data frame with the tidy function.

Details

standardize_glm performs regression standardization in generalized linear models, at specified values of the exposure, over the sample covariate distribution. Let \(Y\), \(X\), and \(Z\) be the outcome, the exposure, and a vector of covariates, respectively. standardize_glm uses a fitted generalized linear model to estimate the standardized mean \(\theta(x)=E\{E(Y|X=x,Z)\}\), where \(x\) is a specific value of \(X\), and the outer expectation is over the marginal distribution of \(Z\).

References

Rothman K.J., Greenland S., Lash T.L. (2008). Modern Epidemiology, 3rd edition. Lippincott, Williams & Wilkins.

Sjölander A. (2016). Regression standardization with the R-package stdReg. European Journal of Epidemiology 31(6), 563-574.

Sjölander A. (2016). Estimation of causal effect measures with the R-package stdReg. European Journal of Epidemiology 33(9), 847-858.

Examples


# basic example
# needs to correctly specify the outcome model and no unmeasered confounders
# (+ standard causal assunmptions)
set.seed(6)
n <- 100
Z <- rnorm(n)
X <- rnorm(n, mean = Z)
Y <- rbinom(n, 1, prob = (1 + exp(X + Z))^(-1))
dd <- data.frame(Z, X, Y)
x <- standardize_glm(
  formula = Y ~ X * Z,
  family = "binomial",
  data = dd,
  values = list(X = 0:1),
  contrasts = c("difference", "ratio"),
  reference = 0
)
x
#> Outcome formula: Y ~ X * Z
#> <environment: 0x63e9486c66c0>
#> Outcome family: quasibinomial 
#> Outcome link function: logit 
#> Exposure:  X 
#> 
#> Tables: 
#>   X Estimate Std.Error lower.0.95 upper.0.95
#> 1 0    0.519    0.0615      0.399      0.640
#> 2 1    0.391    0.0882      0.218      0.563
#> 
#> Reference level:  X = 0 
#> Contrast:  difference 
#>   X Estimate Std.Error lower.0.95 upper.0.95
#> 1 0    0.000    0.0000      0.000    0.00000
#> 2 1   -0.129    0.0638     -0.254   -0.00353
#> 
#> Reference level:  X = 0 
#> Contrast:  ratio 
#>   X Estimate Std.Error lower.0.95 upper.0.95
#> 1 0    1.000     0.000      1.000      1.000
#> 2 1    0.752     0.126      0.505      0.999
#> 
# different transformations of causal effects

# example from Sjölander (2016) with case-control data
# here the matching variable needs to be passed as an argument
singapore <- AF::singapore
#> Registered S3 methods overwritten by 'stdReg':
#>   method                   from   
#>   summary.parfrailty       stdReg2
#>   print.summary.parfrailty stdReg2
Mi <- singapore$Age
m <- mean(Mi)
s <- sd(Mi)
d <- 5
standardize_glm(
  formula = Oesophagealcancer ~ (Everhotbev + Age + Dial + Samsu + Cigs)^2,
  family = binomial, data = singapore,
  values = list(Everhotbev = 0:1), clusterid = "Set",
  case_control = TRUE,
  matched_density_cases = function(x) dnorm(x, m, s),
  matched_density_controls = function(x) dnorm(x, m - d, s),
  matching_variable = Mi,
  p_population = 19.3 / 100000
)
#> Warning: case_control = TRUE may not give reasonable results for the variance with clustering
#> Outcome formula: Oesophagealcancer ~ (Everhotbev + Age + Dial + Samsu + Cigs)^2
#> <environment: 0x63e9486c66c0>
#> Outcome family: quasibinomial 
#> Outcome link function: logit 
#> Exposure:  Everhotbev 
#> 
#> Tables: 
#>   Everhotbev Estimate Std.Error lower.0.95 upper.0.95
#> 1          0 0.000128  1.69e-05   9.48e-05   0.000161
#> 2          1 0.000570  2.19e-04   1.41e-04   0.000999
#> 

# multiple exposures
set.seed(7)
n <- 100
Z <- rnorm(n)
X1 <- rnorm(n, mean = Z)
X2 <- rnorm(n)
Y <- rbinom(n, 1, prob = (1 + exp(X1 + X2 + Z))^(-1))
dd <- data.frame(Z, X1, X2, Y)
x <- standardize_glm(
  formula = Y ~ X1 + X2 + Z,
  family = "binomial",
  data = dd, values = list(X1 = 0:1, X2 = 0:1),
  contrasts = c("difference", "ratio"),
  reference = c(X1 = 0, X2 = 0)
)
x
#> Outcome formula: Y ~ X1 + X2 + Z
#> <environment: 0x63e9486c66c0>
#> Outcome family: quasibinomial 
#> Outcome link function: logit 
#> Exposure:  X1, X2 
#> 
#> Tables: 
#>   X1 X2 Estimate Std.Error lower.0.95 upper.0.95
#> 1  0  0    0.419    0.0576     0.3058      0.532
#> 2  1  0    0.252    0.0740     0.1074      0.397
#> 3  0  1    0.273    0.0690     0.1382      0.409
#> 4  1  1    0.146    0.0631     0.0221      0.269
#> 
#> Reference level:  X1 = 0; X2 = 0 
#> Contrast:  difference 
#>   X1 X2 Estimate Std.Error lower.0.95 upper.0.95
#> 1  0  0    0.000    0.0000      0.000     0.0000
#> 2  1  0   -0.166    0.0590     -0.282    -0.0509
#> 3  0  1   -0.145    0.0458     -0.235    -0.0557
#> 4  1  1   -0.273    0.0581     -0.387    -0.1593
#> 
#> Reference level:  X1 = 0; X2 = 0 
#> Contrast:  ratio 
#>   X1 X2 Estimate Std.Error lower.0.95 upper.0.95
#> 1  0  0    1.000     0.000      1.000      1.000
#> 2  1  0    0.603     0.141      0.327      0.878
#> 3  0  1    0.653     0.114      0.430      0.876
#> 4  1  1    0.348     0.131      0.091      0.605
#> 
tidy(x)
#>    X1 X2   Estimate  Std.Error  lower.0.95  upper.0.95   contrast transform
#> 1   0  0  0.4187976 0.05763610  0.30583295  0.53176230       none  identity
#> 2   1  0  0.2523918 0.07395458  0.10744346  0.39734008       none  identity
#> 3   0  1  0.2733894 0.06899362  0.13816443  0.40861445       none  identity
#> 4   1  1  0.1456809 0.06306346  0.02207877  0.26928300       none  identity
#> 5   0  0  0.0000000 0.00000000  0.00000000  0.00000000 difference  identity
#> 6   1  0 -0.1664059 0.05895745 -0.28196033 -0.05085138 difference  identity
#> 7   0  1 -0.1454082 0.04579033 -0.23515558 -0.05566080 difference  identity
#> 8   1  1 -0.2731167 0.05806770 -0.38692735 -0.15930614 difference  identity
#> 9   0  0  1.0000000 0.00000000  1.00000000  1.00000000      ratio  identity
#> 10  1  0  0.6026581 0.14070801  0.32687543  0.87844071      ratio  identity
#> 11  0  1  0.6527961 0.11372894  0.42989144  0.87570068      ratio  identity
#> 12  1  1  0.3478551 0.13106574  0.09097094  0.60473922      ratio  identity

# continuous exposure
set.seed(2)
n <- 100
Z <- rnorm(n)
X <- rnorm(n, mean = Z)
Y <- rnorm(n, mean = X + Z + 0.1 * X^2)
dd <- data.frame(Z, X, Y)
x <- standardize_glm(
  formula = Y ~ X * Z,
  family = "gaussian",
  data = dd,
  values = list(X = seq(-1, 1, 0.1))
)

# plot standardized mean as a function of x
plot(x)

# plot standardized mean - standardized mean at x = 0 as a function of x
plot(x, contrast = "difference", reference = 0)