Making Sense of Sensitivity: Extending Omitted Variable Bias
Author
Vincent Arel-Bundock
Published
January 22, 2021
Cinelli and Hazlett (2020) ask: How strong do omitted confounders have to be in order to change our coefficient estimate by q%? To answer this question, we start with a familiar setup, where we want to estimate the effect of treatment \(D\) on outcome \(Y\) in the presence of an unobserved confounder \(Z\), with \(X\) denoting the observed covariates. The true model is:
\[
Y = \tau D + X\beta + \gamma Z + \varepsilon
\]
The omitted variable bias formula is well known in the ordinary least squares context:
\[
\hat{\tau}_{res} = \hat{\tau} + \hat{\gamma}\hat{\delta}
\]
where \(\hat{\tau}_{res}\) is the observed estimate; \(\hat{\tau}\) is the desired estimate; \(\hat{\gamma}\) is a measure of the association between the omitted \(Z\) and \(Y\) (“impact”); and \(\hat{\delta}\) is a measure of the association between the omitted \(Z\) and the treatment \(D\) (“imbalance”).
The equivalence above can be seen in a simple simulation:
library(modelsummary)
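# Simulate a binary confounder Z that lowers the probability of treatment D
N = 10000
Z = rbinom(N, 1, prob = .5)
D = rbinom(N, 1, prob = .8 - .6 * Z)
Y = 1 * D + 3 * Z + rnorm(N)
# Fit the correct model, the confounded model, and the auxiliary regression
mod = list(
  "Correct" = lm(Y ~ D + Z),
  "Confounded" = lm(Y ~ D),
  "Auxiliary" = lm(Z ~ D))
coef(mod$Confounded)["D"]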
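The two sides of the omitted variable bias identity can be compared directly. The sketch below reruns the same data-generating process and checks that the confounded coefficient equals the adjusted coefficient plus impact times imbalance. The seed and the variable names (`tau_res`, `tau_hat`, `gamma_hat`, `delta_hat`, `ovb`) are assumptions added for illustration.

```r
# Same data-generating process as in the text; the seed is an added assumption.
set.seed(2021)
N = 10000
Z = rbinom(N, 1, prob = 0.5)
D = rbinom(N, 1, prob = 0.8 - 0.6 * Z)
Y = 1 * D + 3 * Z + rnorm(N)

tau_res   = coef(lm(Y ~ D))["D"]       # observed (confounded) estimate
tau_hat   = coef(lm(Y ~ D + Z))["D"]   # estimate after adjusting for Z
gamma_hat = coef(lm(Y ~ D + Z))["Z"]   # impact: association of Z with Y
delta_hat = coef(lm(Z ~ D))["D"]       # imbalance: association of Z with D

# Right-hand side of the omitted variable bias formula
ovb = unname(tau_hat + gamma_hat * delta_hat)

c(confounded = unname(tau_res), reconstructed = ovb)
```

In ordinary least squares the identity holds exactly in sample, so the two numbers should agree up to floating-point error.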
Knowing the signs of the bias components can be useful, but the magnitudes of \(\gamma\) and \(\delta\) obviously matter as well. Moreover, the omitted variable bias formula with one confounder is of limited use when several confounders act together, possibly in non-linear fashion.
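To assess how strong all omitted confounders need to be to overturn our conclusions, Cinelli and Hazlett (2020) recommend reparameterizing the bias in terms of partial \(R^2\). They show that our simple expression of the bias can be generalized and expressed as:
\[
|\widehat{bias}| = \sqrt{\frac{R^2_{Y\sim Z|D,X} \, R^2_{D\sim Z|X}}{1 - R^2_{D\sim Z|X}}} \; \frac{sd(Y^{\perp X,D})}{sd(D^{\perp X})}
\]
where \(R^2_{Y\sim Z|D,X}\) and \(R^2_{D\sim Z|X}\) are partial \(R^2\) values, and \(sd(Y^{\perp X,D})\) and \(sd(D^{\perp X})\) are the standard deviations of the outcome and of the treatment after partialling out the variables in their superscripts.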
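As a rough check of the partial \(R^2\) parameterization, the sketch below uses the simulated setup with a single confounder and no covariates \(X\), so that \(sd(D^{\perp X})\) reduces to `sd(D)`. It assumes the bias takes the form \(\sqrt{R^2_{Y\sim Z|D} R^2_{D\sim Z} / (1 - R^2_{D\sim Z})}\) times the ratio of residual standard deviations. The seed and all variable names are illustrative assumptions.

```r
# Same data-generating process as in the text; the seed is an added assumption.
set.seed(2021)
N = 10000
Z = rbinom(N, 1, prob = 0.5)
D = rbinom(N, 1, prob = 0.8 - 0.6 * Z)
Y = 1 * D + 3 * Z + rnorm(N)

# Bias observed in the simulation: confounded minus adjusted coefficient
bias_observed = unname(abs(coef(lm(Y ~ D))["D"] - coef(lm(Y ~ D + Z))["D"]))

# Partial out D from both Y and Z
Y_perp_D = resid(lm(Y ~ D))
Z_perp_D = resid(lm(Z ~ D))

# Partial R2 of Z with Y given D, and R2 of Z with D (no covariates X here)
r2_yz_d = cor(Y_perp_D, Z_perp_D)^2
r2_dz   = cor(D, Z)^2

# Bias implied by the partial R2 parameterization
bias_formula = sqrt(r2_yz_d * r2_dz / (1 - r2_dz)) * sd(Y_perp_D) / sd(D)

c(observed = bias_observed, formula = bias_formula)
```

With one linear confounder the two quantities coincide, which is why the partial \(R^2\) version can be read as a reparameterization of impact times imbalance.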
The above equation would be sufficient to conduct a full sensitivity analysis, but it is often convenient to report a single “Robustness Value”. To simplify the interpretation, the authors consider a critical case where the strength of the impact (association of \(Z\) with \(Y\)) and the strength of the imbalance (association of \(Z\) with \(D\)) are equal. This allows them to define a simple Robustness Value:
\[
RV_q = \frac{1}{2}\left(\sqrt{f_q^4 + 4 f_q^2} - f_q^2\right)
\]
In this equation, \(f_q:=q|f_{Y\sim D|X}|\), where \(f_{Y\sim D|X}\) represents the partial Cohen’s \(f\) of the treatment with the outcome (see footnote 1), and \(q\) is “the proportion of reduction q on the treatment coefficient which would be deemed problematic.”
The interpretation is quite straightforward:
“Confounders that explain \(RV_q\)% both of the treatment and of the outcome are sufficiently strong to change the point estimate in problematic ways, whereas confounders with neither association greater than \(RV_q\)% are not.”
If \(RV_q\) is close to 1, the estimate can sustain strong confounding: the confounders would need to explain nearly 100% of the residual variance of both the treatment and the outcome to be problematic. In contrast, if \(RV_q\) is close to 0, even weak confounders could change the estimate in problematic ways.
library(sensemakr)
s = sensemakr(model = mod$Confounded, treatment = "D", q = 0.25)
s
Sensitivity Analysis to Unobserved Confounding
Model Formula: Y ~ D
Null hypothesis: q = 0.25 and reduce = TRUE
Unadjusted Estimates of ' D ':
Coef. estimate: -0.84238
Standard Error: 0.03087
t-value: -6.82137
Sensitivity Statistics:
Partial R2 of treatment with outcome: 0.0693
Robustness Value, q = 0.25 : 0.06593
Robustness Value, q = 0.25 alpha = 0.05 : 0.04745
For more information, check summary.
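The Robustness Value in this printout can be approximately reproduced by hand from the coefficient and its standard error. The sketch below assumes the closed form \(RV_q = \frac{1}{2}(\sqrt{f_q^4 + 4 f_q^2} - f_q^2)\) with \(f_q = q|t|/\sqrt{df}\), and takes the residual degrees of freedom to be \(N - 2\) for `lm(Y ~ D)`. The function name `robustness_value` is hypothetical and is not part of `sensemakr`.

```r
# Hypothetical helper: Robustness Value from a t-statistic and residual df.
robustness_value = function(t_value, df, q = 1) {
  f_q = q * abs(t_value) / sqrt(df)   # q times the partial Cohen's f
  0.5 * (sqrt(f_q^4 + 4 * f_q^2) - f_q^2)
}

# Rounded values copied from the printout above
estimate  = -0.84238
std_error = 0.03087
df        = 10000 - 2                 # assumption: N = 10000, two coefficients

# t-statistic for the usual null of no effect (not the q = 0.25 null printed above)
t_value = estimate / std_error

rv = robustness_value(t_value, df, q = 0.25)
round(rv, 3)
```

Because the inputs are rounded, the result should agree with the reported 0.06593 only to about four decimal places. Note that the t-value shown in the printout refers to the null hypothesis of a 25% reduction, which is why it differs from `estimate / std_error`.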
As stated above, \(RV_q\) characterizes the special, critical case where the impact and imbalance are equal. To see what happens for different combinations of impact and imbalance, the sensemakr package allows us to draw nice contour plots. The red line shows us the combinations of confounding magnitude that would allow a change of 25% in the estimate (as determined by the q argument in the sensemakr call above).
plot(s)
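The surface behind such a contour plot can be sketched in base R. The example below assumes that the bias can be written as \(se \cdot \sqrt{df} \cdot \sqrt{R^2_{Y\sim Z|D,X} R^2_{D\sim Z|X} / (1 - R^2_{D\sim Z|X})}\), and reuses the rounded numbers from the printout. The helpers `bias_bound` and `adjusted_estimate` are hypothetical names, not `sensemakr` functions, and `df = N - 2` is an assumption.

```r
# Rounded values copied from the sensemakr printout
estimate  = -0.84238
std_error = 0.03087
df        = 10000 - 2   # assumption: residual df of lm(Y ~ D) with N = 10000
rv        = 0.06593     # reported Robustness Value for q = 0.25

# Hypothetical helper: bias implied by a confounder with the given partial R2 values
bias_bound = function(r2dz, r2yz, se, df) {
  se * sqrt(df) * sqrt(r2yz * r2dz / (1 - r2dz))
}

# Adjusted estimate when confounding shrinks the coefficient toward zero
adjusted_estimate = function(r2dz, r2yz) {
  sign(estimate) * (abs(estimate) - bias_bound(r2dz, r2yz, std_error, df))
}

# Grid of hypothetical confounder strengths, as on the axes of the contour plot
r2_grid = seq(0, 0.4, by = 0.05)
adjusted_grid = outer(r2_grid, r2_grid, adjusted_estimate)

# On the diagonal, a confounder of strength RV should give a 25% reduction
adjusted_at_rv = adjusted_estimate(rv, rv)
c(at_rv = adjusted_at_rv, target = (1 - 0.25) * estimate)
```

Passing `r2_grid`, `r2_grid`, and `adjusted_grid` to base R's `contour()` would draw a rough analogue of the plot, and the point where both partial \(R^2\) values equal \(RV_q\) lies on the 25% reduction contour.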
References
Cinelli, Carlos, and Chad Hazlett. 2020. “Making Sense of Sensitivity: Extending Omitted Variable Bias.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 82 (1): 39–67. https://doi.org/10.1111/rssb.12348.
Footnotes
1. This can be computed by dividing the coefficient's t-value by \(\sqrt{df}\), where \(df\) is the residual degrees of freedom.