Sensitivity Analysis for Unobserved Confounding
Outline
- Introduction
- Problem Setup
  - 2.1. Causal Graph
  - 2.2. Model With and Without Z
  - 2.3. Strength of Z as a Confounder
- Sensitivity Analysis
  - 3.1. Goal
  - 3.2. Robustness Value
- PySensemakr
- Conclusion
- Acknowledgements
- References
1. Introduction
The specter of unobserved confounding (aka omitted variable bias) is a notorious problem in observational studies. In most observational studies, unless we can reasonably assume that treatment assignment is as-if random, as in a natural experiment, we can never be truly certain that we controlled for all possible confounders in our model. As a result, our model estimates can be severely biased if we fail to control for an important confounder, and we wouldn't even know it, since the unobserved confounder is, well, unobserved!
Given this problem, it is important to assess how sensitive our estimates are to possible sources of unobserved confounding. In other words, it is a helpful exercise to ask ourselves: how much unobserved confounding would there have to be for our estimates to drastically change (e.g., treatment effect no longer statistically significant)? Sensitivity analysis for unobserved confounding is an active area of research, and there are several approaches to tackling this problem. In this post, I will cover a simple linear method [1] based on the concept of partial R² that is widely applicable to a large spectrum of cases.
2. Problem Setup
2.1. Causal Graph
Let us assume that we have four variables:
- Y: outcome
- D: treatment
- X: observed confounder(s)
- Z: unobserved confounder(s)
This is a common setting in many observational studies where the researcher is interested in knowing whether the treatment of interest has an effect on the outcome after controlling for possible treatment-outcome confounders.
In our hypothetical setting, the relationships between these variables are such that X and Z both affect D and Y, but D has no effect on Y. In other words, we are describing a scenario where the true treatment effect is null. As will become clear in the next section, the purpose of sensitivity analysis is to be able to reason about this treatment effect when we have no access to Z, as we normally won't, since it's unobserved. Figure 1 visualizes our setup.
Figure 1: Problem Setup

2.2. Model With and Without Z
To demonstrate the problem that our unobserved Z can cause, I simulated some data in line with the problem setup described above. You can refer to this notebook for the details of the simulation.
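The linked notebook has the full details; for illustration, here is a minimal sketch of a data-generating process consistent with Figure 1. The sample size and coefficient values below are my own assumptions, not the notebook's exact numbers:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 5_000

# X and Z: independent confounders (Z will be treated as unobserved)
X = rng.normal(size=n)
Z = rng.normal(size=n)

# Both confounders affect the treatment D...
D = 0.5 * X + 0.5 * Z + rng.normal(size=n)

# ...and both affect the outcome Y, but D itself has NO effect on Y
# (the true treatment effect is zero by construction)
Y = 0.5 * X + 0.5 * Z + rng.normal(size=n)

df = pd.DataFrame({"Y": Y, "D": D, "X": X, "Z": Z})
```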
Since Z would be unobserved in real life, the only model we can normally fit to the data is `Y ~ D + X`. Let us see what results we get if we run that regression.
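With statsmodels, that regression might look like the sketch below. The data-generating coefficients here are illustrative assumptions (so the exact estimates will differ from the ones reported in the text), but the qualitative pattern is the same: omitting Z produces a spuriously "significant" D coefficient.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Re-create simulated data consistent with Figure 1
# (illustrative coefficients; the true D effect is zero)
rng = np.random.default_rng(42)
n = 5_000
X = rng.normal(size=n)
Z = rng.normal(size=n)
D = 0.5 * X + 0.5 * Z + rng.normal(size=n)
Y = 0.5 * X + 0.5 * Z + rng.normal(size=n)
df = pd.DataFrame({"Y": Y, "D": D, "X": X, "Z": Z})

# The model we can actually fit in practice: Z is "unobserved", so it is left out
biased = smf.ols("Y ~ D + X", data=df).fit()
print(biased.summary().tables[1])  # D's coefficient is biased away from zero
```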
Based on these results, it seems like a one-unit change in D has a statistically significant effect of 0.2686 (p < 0.001) on Y, which we know isn't true given how we generated the data (no D effect).
Now, let's see what happens to our D estimate when we control for Z as well. (In real life, we of course won't be able to run this additional regression since Z is unobserved, but our simulation setting allows us to peek behind the curtain into the true data-generating process.)
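Continuing the illustrative simulation from above, the "oracle" regression that includes Z is only possible because we generated the data ourselves:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Same illustrative data-generating process as before (true D effect is zero)
rng = np.random.default_rng(42)
n = 5_000
X = rng.normal(size=n)
Z = rng.normal(size=n)
D = 0.5 * X + 0.5 * Z + rng.normal(size=n)
Y = 0.5 * X + 0.5 * Z + rng.normal(size=n)
df = pd.DataFrame({"Y": Y, "D": D, "X": X, "Z": Z})

# The "oracle" model: only possible here because we simulated Z ourselves
full = smf.ols("Y ~ D + X + Z", data=df).fit()
print(full.summary().tables[1])  # D's coefficient shrinks toward zero
```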
As expected, controlling for Z correctly removes the D effect by shrinking the estimate towards zero and giving us a p-value that is no longer statistically significant at the conventional 0.05 level.