Causal inference series
This approach serves you poorly because it does not reflect how we expect you to do research.
I’d like to briefly review 1-4 and focus mainly on number 5, the “statistical gap”. I hope this helps answer the question “what model should I use?”
If you are interested, a lot of my own research is about 6.
A directed acyclic graph (or DAG or just graph) conveys our assumptions about the mechanisms that gave rise to the observations, e.g.,
This is a functional causal model \(\{F_V:pa(V)\times U_V\to V\mid V\in\mathcal{V}\}\), e.g., \(y = F_Y(x, u, \varepsilon_Y)\).
The potential outcome \(Y(1)\) is the variable \(Y\) if \(X\) were intervened upon to have value 1.
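As a minimal sketch of a functional causal model, the Python below simulates a hypothetical system in which an unobserved \(U\) affects both \(X\) and \(Y\), and \(X\) affects \(Y\). The functions `f_x` and `f_y`, the noise distributions, and the thresholds are illustrative assumptions, not taken from the slides. Intervening on \(X\) amounts to evaluating `f_y` at a fixed `x` while keeping the same `u` and noise term.

```python
import random


def f_x(u, eps_x):
    """Hypothetical structural equation for X: x = F_X(u, eps_X)."""
    return 1 if u + eps_x > 0.5 else 0


def f_y(x, u, eps_y):
    """Hypothetical structural equation for Y: y = F_Y(x, u, eps_Y)."""
    return 1 if 0.5 * x + u + eps_y > 1.0 else 0


def simulate(n, seed=0):
    """Draw n units and record observed and potential outcomes."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        u = rng.random()            # unobserved common cause
        eps_x = rng.gauss(0, 0.3)   # exogenous noise for X
        eps_y = rng.gauss(0, 0.3)   # exogenous noise for Y
        x = f_x(u, eps_x)           # X as it arises naturally
        rows.append({
            "u": u,
            "x": x,
            "y": f_y(x, u, eps_y),   # observed outcome
            "y1": f_y(1, u, eps_y),  # Y if X were set to 1
            "y0": f_y(0, u, eps_y),  # Y if X were set to 0
        })
    return rows


rows = simulate(1000)
mean_y1 = sum(r["y1"] for r in rows) / len(rows)
mean_y0 = sum(r["y0"] for r in rows) / len(rows)
print("P{Y(1) = 1} approx.", mean_y1)
print("P{Y(0) = 1} approx.", mean_y0)
```

In real data only `x` and `y` are observed; `y1` and `y0` are available here only because the mechanism is simulated.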
What causal parameter did you choose?
Some options:
\[ \frac{P\{Y(1) = 1\}/P\{Y(1) = 0\}}{P\{Y(0) = 1\}/P\{Y(0) = 0\}} \]
The answer is complicated; see Colnet et al. (2023), “Risk ratio, odds ratio, risk difference… Which causal measure is easier to generalize?”
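To make the options concrete, here is a small sketch that computes three common contrasts (risk difference, risk ratio, and the odds ratio displayed above) from the two potential-outcome risks \(P\{Y(1) = 1\}\) and \(P\{Y(0) = 1\}\). The function name `causal_contrasts` and the example risks 0.2 and 0.1 are hypothetical.

```python
def causal_contrasts(p1, p0):
    """Contrasts between p1 = P{Y(1) = 1} and p0 = P{Y(0) = 1}."""
    if not (0 < p1 < 1 and 0 < p0 < 1):
        raise ValueError("both risks must lie strictly between 0 and 1")
    odds1 = p1 / (1 - p1)  # P{Y(1) = 1} / P{Y(1) = 0}
    odds0 = p0 / (1 - p0)  # P{Y(0) = 1} / P{Y(0) = 0}
    return {
        "risk_difference": p1 - p0,
        "risk_ratio": p1 / p0,
        "odds_ratio": odds1 / odds0,
    }


contrasts = causal_contrasts(0.2, 0.1)
for name, value in contrasts.items():
    print(name, round(value, 3))
```

The same pair of risks gives three different numbers, which is one reason the choice of parameter should be made before the analysis.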
Some key tips:
Now consider the real-world data. Suppose we use the conscription register in Sweden to answer this question. The observed data include
Draw the DAG for the data-generating mechanism
Is the effect of interest identifiable from these data?
Rules of thumb:
Fear not: there are algorithms that, given a DAG and an effect, determine whether the effect is identified and, if so, give the statistical estimand (Tian and Pearl 2002 -)
If the effect is identifiable, the estimand is some variation on the g-formula:
\[ E\{Y(a)\} = \sum_x E(Y | A = a, X = x) P(X = x). \]
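As a rough illustration of the g-formula, the sketch below standardizes stratum-specific outcome means over the marginal distribution of a single discrete covariate \(X\). The counts are made up for illustration, and the function names `g_formula` and `naive_mean` are hypothetical. It assumes that \(X\) is sufficient to control confounding and that every stratum of \(X\) contains both exposure levels.

```python
from collections import defaultdict

# Hypothetical aggregated data: (x, a) -> (number of subjects, number with Y = 1).
counts = {
    (0, 0): (40, 4),
    (0, 1): (10, 2),
    (1, 0): (10, 4),
    (1, 1): (40, 24),
}


def g_formula(counts, a):
    """Sum over x of E(Y | A = a, X = x) * P(X = x)."""
    n_total = sum(n for n, _ in counts.values())
    n_x = defaultdict(int)  # stratum sizes, ignoring exposure
    for (x, _), (n, _) in counts.items():
        n_x[x] += n
    total = 0.0
    for x, nx in n_x.items():
        n, events = counts[(x, a)]
        total += (events / n) * (nx / n_total)
    return total


def naive_mean(counts, a):
    """Unadjusted E(Y | A = a), for comparison."""
    n = sum(n for (_, a_), (n, _) in counts.items() if a_ == a)
    events = sum(e for (_, a_), (_, e) in counts.items() if a_ == a)
    return events / n


for a in (0, 1):
    print("a =", a, "g-formula:", g_formula(counts, a), "naive:", naive_mean(counts, a))
```

In this toy table the exposed are concentrated in the high-risk stratum, so the unadjusted means differ from the standardized ones. With many or continuous covariates the stratum means can no longer be read off a table and have to be modeled.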
This is the statistical gap
True or false?