3 Instrumental Variables

Want to solve your endogeneity problem with with IV? Well, IV comes with its own problems too. For one, it’s biased (although that can be fixed with some assumptions). It’s also a very sensitive approach. IV used to be the darling of empirical work, but not so much anymore.

I’m assuming you have some instrument(s) in mind already, at least as many as you have endogenous variables.

3.1 Does IV do anything?

It would be great if we could test for whether we need IV or not. While we can’t really do that, we can at least see how different our IV results might be relative to OLS (assuming we have some decent instruments already).

An easy way to assess the need for IV is to simply test whether your IV results are sufficiently different from OLS. That’s the spirit of the Hausman test. The original test introduced in Hausman (1978) is not specific to endogeneity…it’s a more general misspecification test, comparing the estimates from one estimator (that is efficient under the null) to that of another estimator that is consistent but inefficient under the null. The test in the context of IV is also referred to as the Durbin-Wu-Hausman test, due to the series of papers pre-dating Hausman (1978), including Durbin (1954), Wu (1973), and Wu (1974).

This test is easily implemented as an “artificial” or “augmented” regression. Denoting our outcome by \(y\), our instruments by \(z\), our endogeous variables by \(x_{1}\), and other exogenous variables by \(x_{2}\), we first regress each of the variables in \(x_{1}\) on \(x_{2}\) and \(z\). Then we take the residuals from those regressions, denoted \(\hat{v}\), and include them in the standard OLS regression of \(y\) on \(x_{1}\), \(x_{2}\), and \(\hat{v}\).

The biggest barrier to this test in practice is that it assumes we have a valid and strong set of instruments, \(z\). Since that’s usually the biggest barrier to causal inference with IV, it becomes a major practical problem. For example, if you reject the null and conclude that estimates from OLS and IV are statistically different, can you be sure that the difference is “real” and not a statistical artifact of weak or invalid instruments? The whole process becomes pretty circular.

3.2 IV Flowchart

Most of the nodes in the diagram below are clickable, which will take you to another page with much more detail on that specific issue. In practice, empirical work is not so linear, and there is typically a lot of recirculation among these steps. For example, you may have 2 endogenous variables and 4 instruments. You may find that 1 or 2 of those instruments are weak, and as you learn this information, you are constantly recirculating in stage 1. You may settle on 3 instruments that seem to work well. Then, in Stage 2, you may find that your results are very sensitive with those instruments, but less sensitive when relying on only 2 of those 3 instruments. So now you go back to step 1, evaluate just those 2 instruments, etc.

The point is that empirical work in practice is messy. Ideally, we could set out our plan in advance and proceed accordingly, but there are some things we just can’t know until we see the data. All we can do is work through the process in good faith, assessing the quality of our empirical work based on sound statistics and econometrics.

One final note. If you’re accessing this on an android mobile device, the flowcharts are going to look a little odd (probably huge). This is a known issue in rendering these types of diagrams. See this closed issue on GitHub and these unanswered posts on StackOverflow. If anyone has any suggestions for how to have this render on an android mobile browser, please let me know. Otherwise, happy instrumenting!