4 Difference-in-Differences
4.1 What is Panel Data?
Panel data describes the setting in which we have repeated observations over time for the same units (e.g., people, firms, counties, etc.). Such data often present an opportunity to estimate causal effects more convincingly than in a purely cross-sectional setting, although that’s certainly not always the case. I’d much rather read a strong analysis of “lesser” data than a poor analysis of “better” data. But all else equal, panel data tend to contain more information and more dimensions of variation than we see in cross-sectional data, and thus more opportunities for causal inference.
Slightly more formally, we observe some outcome \(y\) for units \(i=1,...,N\) and over time periods \(t=1,...,T\). We denote the outcome for a given unit and time by \(y_{it}\). Keeping with our potential outcomes framework and notation, let’s assume that some units receive treatment at some point in time. We record treatment status with the indicator \(D_{it}\), which equals 1 if unit \(i\) is treated in period \(t\) and 0 otherwise. We also observe some other time-varying characteristics for each unit, denoted \(W_{it}\).
4.2 What is Difference-in-Differences?
Difference-in-differences (DD) is an identification strategy that essentially attempts to predict the counterfactual for the treated group using the change in outcomes among the control group. Intuitively, we assume that, in the absence of treatment, the outcomes for units that ultimately received treatment would have evolved (on average) just as the outcomes did for units that never received treatment.
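To make the counterfactual logic concrete, here is a minimal sketch of the 2x2 DD calculation from four group-by-period means. The numbers and the helper name `did_2x2` are hypothetical and chosen purely for illustration; they do not come from any dataset in these notes.

```python
# Hypothetical mean outcomes by group and period (made-up numbers).
y_treat_pre, y_treat_post = 10.0, 15.0
y_ctrl_pre, y_ctrl_post = 8.0, 10.0


def did_2x2(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Return the 2x2 DD estimate and the implied treated-group counterfactual."""
    # Change in outcomes among the control group.
    control_change = ctrl_post - ctrl_pre
    # Predicted post-period outcome for the treated group had it not been
    # treated: its own pre-period mean plus the control group's change.
    counterfactual = treat_pre + control_change
    # The DD estimate is the observed treated outcome minus that prediction.
    return treat_post - counterfactual, counterfactual


dd, cf = did_2x2(y_treat_pre, y_treat_post, y_ctrl_pre, y_ctrl_post)
print(f"counterfactual = {cf}, DD estimate = {dd}")
```

With these made-up means, the control group's outcome rises by 2, so the predicted untreated outcome for the treated group is 12, and the DD estimate is 15 minus 12, or 3.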
4.3 TWFE and Event Studies
Two-way fixed effects (TWFE) is essentially just a shorthand for a specific and very common regression specification that includes fixed effects for units \(i=1,...,N\) and time \(t=1,...,T\). I’ll denote these by \(\gamma_{i}\) and \(\gamma_{t}\), respectively. TWFE is more general than a 2x2 DD specification because it allows for intercept shifts for each unit and for each time period, as opposed to shifts only for the treated/control groups and the pre/post periods. However, this extra generality doesn’t really do much in the case of a single treated group and a single treatment time, since the overall group and overall treatment period dummies in the 2x2 case are effectively an average of the individual unit and individual time period fixed effects, respectively.
More formally, recall our original DD regression specification,
\[y_{it} = \alpha + \beta \times 1(Post) + \lambda \times 1(Treat) + \delta \times 1(Post) \times 1(Treat) + \varepsilon_{it}.\]
Here, we have a single treated group and a single treatment time, so we capture treatment time with a post-period indicator and the treated group with a treatment indicator, and the coefficient on the interaction between the two, \(\delta\), is our treatment effect of interest. But we can generalize this to allow for dummies for each unit (rather than for the treated/control groups) and similarly for each time period (rather than for the pre/post periods). With a single treated group, a single treatment time, and a balanced panel, this doesn’t change our treatment effect estimate, but it is typically “more efficient”: the unit and time dummies absorb more of the variation in the outcome, so standard errors tend to be smaller. Doing so yields the following regression specification: \[y_{it} = \alpha + \delta D_{it} + \gamma_{i} + \gamma_{t} + \varepsilon_{it},\]
where \(\gamma_{i}\) and \(\gamma_{t}\) denote a set of unit \(i\) and time period \(t\) dummy variables (or fixed effects), and where \(\delta\) captures our treatment effect of interest just as before.
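The equivalence between the two specifications can be checked numerically. The sketch below simulates a balanced panel with one treated group and one treatment date, then estimates \(\delta\) from both the 2x2 regression and the TWFE regression by ordinary least squares. Everything here is an assumption made for illustration: the panel dimensions, the treatment date `t0`, the true effect `true_delta`, the noise scale, and the helper `ols`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical design: N units, T periods, treatment begins in period t0.
N, T, t0, true_delta = 40, 6, 3, 2.0

unit = np.repeat(np.arange(N), T)        # unit index i for each row
time = np.tile(np.arange(T), N)          # time index t for each row
treat = (unit < N // 2).astype(float)    # first half of units is treated
post = (time >= t0).astype(float)        # post-period indicator
D = treat * post                         # D_it = 1(Treat) x 1(Post)

# Simulated outcome with unit effects, time effects, and noise.
unit_fe = rng.normal(size=N)[unit]
time_fe = rng.normal(size=T)[time]
y = 1.0 + unit_fe + time_fe + true_delta * D + rng.normal(scale=0.5, size=N * T)


def ols(X, y):
    """Least-squares coefficients for the regression of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


# 2x2 DD regression: intercept, Post, Treat, and their interaction.
X_dd = np.column_stack([np.ones(N * T), post, treat, D])
delta_dd = ols(X_dd, y)[3]

# TWFE regression: D plus a full set of unit dummies and T-1 time dummies.
# The unit dummies absorb the intercept; one time dummy is dropped to avoid
# perfect collinearity.
unit_dummies = (unit[:, None] == np.arange(N)).astype(float)
time_dummies = (time[:, None] == np.arange(1, T)).astype(float)
X_twfe = np.column_stack([D, unit_dummies, time_dummies])
delta_twfe = ols(X_twfe, y)[0]

print(f"2x2 DD estimate: {delta_dd:.4f}")
print(f"TWFE estimate:   {delta_twfe:.4f}")
```

In this balanced, single-treatment-date design the two point estimates should agree up to floating-point error. That agreement is specific to this setup and should not be expected once treatment timing varies across units or the panel is unbalanced.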