Skip to contents
library(igraph)
#> 
#> Attaching package: 'igraph'
#> The following objects are masked from 'package:stats':
#> 
#>     decompose, spectrum
#> The following object is masked from 'package:base':
#> 
#>     union
library(causaloptim)
library(rcdd)
#> If you want correct answers, use rational arithmetic.
#> See the Warnings sections in help pages for
#>     functions that do computational geometry.

Introduction

When a causal estimand is not identifiable, one can compute sharp bounds by optimizing over all latent distributions compatible with the observed data. Sachs et al. (2023) reparameterize the causal model via response functions, yielding a local encoding: a linear program (LP) in |R||R| variables. Shridharan and Iyengar (2023) propose an alternative hyperarc encoding with |H||R||H| \le |R| variables that gives the same sharp bounds.

This vignette accompanies Markham et al. (2026) and works through the running example from that paper: the canonical instrumental variable DAG with VL={X1}V_L = \{X_1\}, VR={X2,X3}V_R = \{X_2, X_3\}, and an internal edge X2X3X_2 \to X_3. We enumerate HH directly—in topological order, without ever materialising HfullH_\text{full}—display all 16 candidate hyperarcs (marking the 4 invalid ones), and verify that both encodings give the same sharp bounds.

Setup

g <- graph_from_literal(X1 -+ X2, X2 -+ X3, Ur -+ X2:X3) |> initialize_graph()

left_vars  <- names(V(g)[leftside == 1 & latent == 0])
right_vars <- names(V(g)[leftside == 0 & latent == 0])

The running example is the confounded DAG below, with VL={X1}V_L = \{X_1\} (the instrument), VR={X2,X3}V_R = \{X_2, X_3\} (outcomes), a latent common cause U2U_2 of X2X_2 and X3X_3, and a latent parent U1U_1 of X1X_1.

Confounded DAG. Dashed nodes ($U_1$, $U_2$) are latent.

Confounded DAG. Dashed nodes (U1U_1, U2U_2) are latent.

Local encoding: |R|=16|R| = 16

card_R <- prod(sapply(right_vars, function(j) {
  nv <- V(g)[j]$nvals
  pa <- setdiff(names(neighbors(g, j, mode = "in")), "Ur")
  if (length(pa) == 0) nv else nv ^ prod(V(g)[pa]$nvals)
}))
cat("|R| =", card_R, "\n")
#> |R| = 16

## or

cmg <- create_causalmodel(g, prob.form = list(out = c("X2", "X3"), cond = "X1"))
ncol(cmg$counterfactual_constraints$numeric$R)
#> [1] 16

Direct enumeration of valid hyperarcs

A valid hyperarc is a map h:𝒳VL𝒳VRh : \mathcal{X}_{V_L} \to \mathcal{X}_{V_R} whose output at each node jVRj \in V_R is constant within every equivalence class of left-side configurations that share the same values of jj’s observed parents. We enumerate HH by processing VRV_R in topological order: at each node we determine those classes from the assignments already made, then independently choose a constant value per class.

enumerate_hyperarcs <- function(graph) {
  obs   <- V(graph)[latent == 0]
  left  <- obs[obs$leftside == 1]
  right <- obs[obs$leftside == 0]

  lvects <- as.matrix(do.call(expand.grid,
    lapply(left$nvals, function(n) 0:(n - 1))
  ))
  colnames(lvects) <- names(left)
  nl <- nrow(lvects)

  right_topo <- names(topo_sort(graph))[
    names(topo_sort(graph)) %in% names(right)
  ]

  H <- list(matrix(nrow = nl, ncol = 0, dimnames = list(NULL, character(0))))

  for (j in right_topo) {
    nv_j   <- V(graph)[j]$nvals
    pa_obs <- names(neighbors(graph, j, mode = "in")) |>
                intersect(c(names(left), names(right)))

    H_new <- list()
    for (h in H) {
      if (length(pa_obs) == 0) {
        groups <- rep(1L, nl)
      } else {
        pmat <- sapply(pa_obs, function(p)
          if (p %in% names(left)) lvects[, p] else h[, p]
        )
        if (!is.matrix(pmat)) pmat <- matrix(pmat, ncol = 1)
        groups <- as.integer(factor(apply(pmat, 1, paste, collapse = "_")))
      }
      combos <- as.matrix(expand.grid(rep(list(0:(nv_j - 1)), max(groups))))
      for (i in seq_len(nrow(combos))) {
        vals  <- as.integer(combos[i, ][groups])
        h_new <- cbind(h, matrix(vals, ncol = 1, dimnames = list(NULL, j)))
        H_new <- c(H_new, list(h_new))
      }
    }
    H <- H_new
  }
  lapply(H, function(h) cbind(as.data.frame(lvects), as.data.frame(h)))
}
H <- enumerate_hyperarcs(g)
cat("|H| =", length(H), "\n")
#> |H| = 12

All 16 candidate hyperarcs (elements of HfullH_\text{full}), with validity indicated, are:

All 16 candidate hyperarcs. The 4 marked ✗ are invalid (X2 constant but X3 disagrees).
h X2(X1=0) X2(X1=1) X3(X1=0) X3(X1=1) Valid
1 0 0 0 0
2 1 0 0 0
3 0 0 1 0
4 1 0 1 0
5 0 1 0 0
6 1 1 0 0
7 0 1 1 0
8 1 1 1 0
9 0 0 0 1
10 1 0 0 1
11 0 0 1 1
12 1 0 1 1
13 0 1 0 1
14 1 1 0 1
15 0 1 1 1
16 1 1 1 1

The 4 invalid candidates (rows 3, 8, 9, 14) are exactly those in which X2X_2 takes the same value for both X1X_1 inputs yet X3X_3 differs. When X2X_2 is constant, the response function value at the unrealized X2X_2 input is observationally irrelevant: distinct elements of RR that disagree only there produce identical observable distributions. The local encoding still counts them as separate response functions (contributing 4 redundant elements to RR), while the hyperarc encoding omits them, giving |H|=12<|R|=16|H| = 12 < |R| = 16.

Note on constructing HH. Shridharan and Iyengar (2023) define a hyperarc hh as valid if RhR_h \neq \emptyset (their Definition 4) and give a locally checkable condition (their Theorem 5) for validity that avoids iterating over RR. HH can be constructed by iterating over all |𝒳VR||𝒳VL|=42=16|\mathcal{X}_{V_R}|^{|\mathcal{X}_{V_L}|} = 4^2 = 16 candidate maps and retain those satisfying the condition. In general, |𝒳VR||𝒳VL||\mathcal{X}_{V_R}|^{|\mathcal{X}_{V_L}|} can be much larger than |R||R| (when right-side nodes depend on only a subset of 𝒳VL\mathcal{X}_{V_L}), so iterating over the candidate set can be costly. The topological enumeration above avoids this: valid hyperarcs are built node by node in topological order, so no invalid candidate is ever constructed.

Verifying identical sharp bounds

Both RR and HH describe the same observable polytope, so they yield the same sharp bounds for any causal query. We verify this for P(X3=1do(X2=1))P(X_3 = 1 \mid \mathrm{do}(X_2 = 1)).

Local encoding bounds via causaloptim

mod          <- analyze_graph(g, constraints = NULL,
                              effectt = "p{X3(X2 = 1) = 1}")
local_result <- optimize_effect_2(mod)
local_result
#> lower bound =  
#> MAX {
#>   1 - p00_0 - p10_0 - p01_0,
#>   1 - p00_1 - p10_0 - p10_1 - p01_0,
#>   1 - p00_1 - p10_1 - p01_1,
#>   1 - p00_0 - p10_0 - p10_1 - p01_1
#> }
#> ----------------------------------------
#> upper bound =  
#> MIN {
#>   1 - p10_0,
#>   2 - p00_1 - p10_0 - p10_1 - p01_0,
#>   2 - p00_0 - p10_0 - p10_1 - p01_1,
#>   1 - p10_1
#> }
#> 
#> A function to compute the bounds for a given set of probabilities is available in the x$bounds_function element.

Hyperarc bounds

We build the constraint matrix mapping hyperarc weights to the observable conditional probabilities p(X2,X3X1)p(X_2, X_3 \mid X_1), choose the objective vectors for the query, and call rcdd::scdd for vertex enumeration.

The lower-bound objective selects hyperarcs under which X3=1X_3 = 1 is certain when X2=1X_2 = 1 (i.e., some X1X_1 value maps to (X2=1,X3=1)(X_2=1, X_3=1)). The upper-bound objective selects hyperarcs under which X3=0X_3 = 0 is impossible when X2=1X_2 = 1. These are the extreme valid coefficient assignments and yield sharp bounds.

rpaste <- function(x) apply(as.matrix(x), 1, paste, collapse = "_")

# All observable (X1, X2, X3) configurations
obs_configs <- as.matrix(do.call(expand.grid,
  lapply(V(g)[latent == 0]$nvals, function(n) 0:(n - 1))
))
colnames(obs_configs) <- names(V(g)[latent == 0])
Pc <- rpaste(obs_configs)

# p-parameter names p{X2}{X3}_{X1}
p_names <- apply(obs_configs, 1, function(r)
  sprintf("p%s_%s",
    paste(r[right_vars], collapse = ""),
    paste(r[left_vars],  collapse = "")))

# Constraint matrix: rows = (normalisation, observables), cols = hyperarcs
# Row 1: sum q_h = 1; Row obs+1: sum_{h: h(x1)=(x2,x3)} q_h = p(x2,x3|x1)
nl        <- nrow(H[[1]])
H_long    <- do.call(rbind, H)
H_strings <- matrix(rpaste(H_long), nrow = length(H), ncol = nl, byrow = TRUE)

Rmat <- 1.0 * Reduce(`|`, lapply(seq_len(nl), function(k)
  outer(Pc, H_strings[, k], FUN = "==")))
Rmat <- rbind(rep(1, length(H)), Rmat)

# Objective: 1 for hyperarcs consistent with X3=1 when X2=1
obj_lower <- sapply(H, function(h)  any(h$X2 == 1 & h$X3 == 1))
obj_upper <- sapply(H, function(h) !any(h$X2 == 1 & h$X3 == 0))

eval_objective <- function(p, y) {
  # Inlined from causaloptim's internal evaluate_objective (c1_num = rbind(1)):
  # y[1] is the constant term; y[-1] are coefficients for the p-parameters.
  const <- as.character(y[1])
  terms <- mapply(function(v, nm) {
    if (v ==  0)  ""
    else if (v ==  1) paste0(" + ", nm)
    else if (v == -1) paste0(" - ", nm)
    else if (v  >  0) paste0(" + ", v, nm)
    else              paste0(" - ", abs(v), nm)
  }, y[-1], p)
  expr <- paste0(const, paste(terms, collapse = ""))
  if (y[1] == 0 && nchar(expr) > 1)
    expr <- trimws(substr(expr, 4, nchar(expr)))
  trimws(expr)
}

enumerate_vertices <- function(Rmat, obj, p) {
  hrep  <- makeH(a1 = t(Rmat), b1 = 1.0 * obj)
  vrep  <- scdd(input = hrep)
  verts <- vrep$output[vrep$output[, 1] == 0 & vrep$output[, 2] == 1,
                       -c(1, 2), drop = FALSE]
  apply(verts, 1, function(y) eval_objective(p, y))
}

hyp_lower <- enumerate_vertices( Rmat,        obj_lower, p_names)
hyp_upper <- enumerate_vertices(-Rmat, -1.0 * obj_upper, p_names)

Symbolic comparison

The two encodings produce the same observable polytope, so vertex enumeration returns the same set of linear forms in the observed probabilities. optimize_effect_2 returns the individual vertex expressions in its expressions field, so we extract them directly for element-wise comparison.

local_lb <- sort(trimws(local_result$expressions$lower))
local_ub <- sort(trimws(local_result$expressions$upper))

hyp_lb   <- sort(trimws(hyp_lower))
hyp_ub   <- sort(trimws(hyp_upper))

cat("Lower-bound expressions identical:", setequal(local_lb, hyp_lb), "\n")
#> Lower-bound expressions identical: TRUE
cat("Upper-bound expressions identical:", setequal(local_ub, hyp_ub), "\n")
#> Upper-bound expressions identical: TRUE

The shared lower-bound expressions are:

cat(paste(local_lb, collapse = "\n"), "\n")
#> 1 - p00_0 - p10_0 - p01_0
#> 1 - p00_0 - p10_0 - p10_1 - p01_1
#> 1 - p00_1 - p10_0 - p10_1 - p01_0
#> 1 - p00_1 - p10_1 - p01_1

That is, the sharp lower bound on P(X3=1do(X2=1))P(X_3=1\mid\mathrm{do}(X_2=1)) is the maximum of these linear forms over the observed distribution (px2x3x1)(p_{x_2 x_3 \mid x_1}), and the two encodings produce exactly this same set.

References

  • Markham, A., Gabriel, E. E., and Sachs, M. C. (2026). Computational and algebraic questions in causal inference. In International Congress on Mathematical Software (ICMS), Lecture Notes in Computer Science. TODO: add DOI/URL

  • Sachs, M. C., Jonzon, G., Sjölander, A., and Gabriel, E. E. (2023). A general method for deriving tight symbolic bounds on causal effects. Journal of Computational and Graphical Statistics, 32(2), 567–576. https://doi.org/10.1080/10618600.2022.2071905

  • Shridharan, M. and Iyengar, G. (2023). Scalable computation of causal bounds. Journal of Machine Learning Research, 24(237), 1–35. http://jmlr.org/papers/v24/22-1081.html