Staggered Difference-in-Differences with Two-Way Fixed Effects and Interactions
09-30-2021
Recently, Wooldridge shared a working paper titled Two-Way Fixed Effects, the Two-Way Mundlak Regression, and Difference-in-Differences Estimators. He also shared a video, slides, and code to accompany the paper on his Twitter account.
The paper does a ton of stuff, and I will not attempt to go through or summarize it all. Instead, I will focus on just one part: the equivalence between an “extended” Two-Way Fixed Effects estimator and some recent alternative strategies for analyzing policy interventions in a Difference-in-Differences (DiD) framework.
As I noted in a previous post, there has been a lot of recent development in econometrics on the analysis of policy interventions using panel data. The “standard” approach for many analysts has been to use Two-Way Fixed Effects (TWFE). However, several authors have pointed out that in the presence of treatment effect heterogeneity (when the treatment effect varies over time or by treatment “cohort”), TWFE can do weird stuff: with staggered adoption, the TWFE coefficient can be a weighted average of group-time effects in which some weights are negative, partly because already-treated units end up serving as controls for later-treated ones. In recent years, many papers have been written to explain exactly what quantities get estimated, and to propose alternative strategies [@Goo2021; @BorJarSpi2021; @CalSan2021; @Str2018; @Liu_Wang_Xu_2020].
In his paper, Wooldridge acknowledges that TWFE can produce weird results in such settings, but counters that
there is nothing inherently wrong with TWFE, which is an estimation method. The problem with how TWFE is implemented in DiD settings is that it is applied to a restrictive model.
He then goes on to describe a couple of equivalent ways to make the model more flexible and account for heterogeneity, using Mundlak devices or an “extended” TWFE.
The extended TWFE approach is the one I focus on below. It can be very simple: Interact the treatment indicator with time and/or group-time dummies.
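A sketch of that specification in equation form, using notation chosen here rather than copied from the paper: let $G_i$ be the year in which unit $i$ is first treated (its cohort), $\alpha_i$ and $\gamma_t$ unit and year fixed effects, and $D_{it} = \mathbb{1}[t \ge G_i]$ the treatment indicator. Interacting the treatment indicator with cohort-by-year dummies gives one coefficient per treated cohort-year cell:

```latex
y_{it} = \alpha_i + \gamma_t
       + \sum_{g} \sum_{s \ge g} \tau_{gs} \, D_{it} \, \mathbb{1}[G_i = g] \, \mathbb{1}[t = s]
       + \varepsilon_{it}
```

Each $\tau_{gs}$ plays the role of the group-time ATT for cohort $g$ in year $s$. The usual, restrictive TWFE model is the special case in which every $\tau_{gs}$ is forced to equal a single $\tau$: since exactly one interaction term equals 1 for each treated observation, the double sum then collapses to $\tau D_{it}$.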
The rest of this notebook gives a “Proof by R” that TWFE with interactions can produce estimates of the group-time average treatment effect on the treated (ATT) that are very similar to those produced by the did package of @CalSan2021.
Simulation
I begin by simulating data using a data generating process similar to (identical to?) “Simulation 6” from @BakLarWan2021. The simulated data have a few interesting features:
- 1000 firms, located in 50 states, observed every year from 1980 to 2015.
- Staggered treatment cohorts: Firms from states 1-17 are treated in 1989. Firms from states 18-34 are treated in 1998. Firms from states 35-50 are treated in 2007.
- Treatment effects are heterogeneous across treatment cohorts: Effects are strongest in the 1989 cohort and smallest in the 2007 cohort.
- Treatment effects are heterogeneous across time: Effects increase cumulatively from the year of treatment, as each treated year adds another increment (the simulation computes this with cumsum).
[Code: simulate the data described above and store it in dat.]
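A minimal base-R sketch of a data generating process with the features listed above. It is not the post's simulation code: the fixed-effect distributions, the per-cohort mean increments (0.5, 0.3, 0.1), the noise level, and the function name simulate_staggered are hypothetical choices made for illustration.

```r
# Hypothetical sketch of a staggered-adoption panel (base R only).
# ASSUMPTIONS: standard normal unit and year fixed effects, state-level yearly
# increments with means 0.5 / 0.3 / 0.1 for the 1989 / 1998 / 2007 cohorts,
# and Gaussian noise with sd 0.5. None of these values come from the post.
simulate_staggered <- function(n_firms = 1000, n_states = 50,
                               years = 1980:2015, seed = 1) {
  set.seed(seed)
  states <- seq_len(n_states)
  # states 1-17 -> 1989, 18-34 -> 1998, 35-50 -> 2007
  cohort_of_state <- ifelse(states <= 17, 1989, ifelse(states <= 34, 1998, 2007))
  tau_mean <- c("1989" = 0.5, "1998" = 0.3, "2007" = 0.1)
  # one yearly increment per state, drawn around its cohort's mean
  tau_state <- rnorm(n_states, mean = tau_mean[as.character(cohort_of_state)], sd = 0.02)
  state_of_firm <- sample(states, n_firms, replace = TRUE)

  # full interaction of firm x year, sorted by firm then year
  dat <- expand.grid(firm = seq_len(n_firms), year = years)
  dat <- dat[order(dat$firm, dat$year), ]
  dat$state  <- state_of_firm[dat$firm]
  dat$cohort <- cohort_of_state[dat$state]
  dat$treat  <- as.integer(dat$year >= dat$cohort)

  # cumulative effect: running sum (within firm) of the yearly increment since treatment
  dat$att <- ave(dat$treat * tau_state[dat$state], dat$firm, FUN = cumsum)

  unit_fe <- rnorm(n_firms)
  year_fe <- rnorm(length(years))
  dat$y <- unit_fe[dat$firm] + year_fe[match(dat$year, years)] +
    dat$att + rnorm(nrow(dat), sd = 0.5)
  rownames(dat) <- NULL
  dat
}

dat <- simulate_staggered()
head(dat[dat$cohort == 1989 & dat$year >= 1988, ])
```

Computing the effect with a within-firm cumsum, rather than a closed-form expression, mirrors the description above: untreated years contribute zero, and each treated year adds that state's increment to the running total.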
Estimation
With the dataset in hand, we use the fixest package to estimate a TWFE model in which the treatment indicator is interacted with both time-to-treatment dummies and cohort dummies:
[Code: estimate the interacted TWFE model with fixest and clean the results into etwfe.]
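As a small, self-contained check of the idea behind this estimator, here is a base-R sketch that uses lm() instead of fixest. It is not the post's code: the tiny noise-free panel (30 units observed 2001 to 2010, cohorts treated in 2004 and 2007, plus a never-treated group), the effect sizes, and the variable names are hypothetical. Because there is no noise and trends are parallel by construction, each cohort-by-year coefficient should reproduce the true effect for that cell.

```r
# Hypothetical sketch of "extended" TWFE with base R (not the post's fixest code):
# regress y on unit FE, year FE, and one dummy per treated cohort-year cell.
set.seed(42)
years <- 2001:2010
panel <- expand.grid(unit = 1:30, year = years)
panel$cohort  <- rep(c(2004, 2007, Inf), each = 10)[panel$unit]   # Inf = never treated
panel$treated <- panel$year >= panel$cohort

# heterogeneous effects: cohort-specific slope times (years since treatment + 1)
slope <- ifelse(panel$cohort == 2004, 0.5, 0.2)
panel$att <- ifelse(panel$treated, slope * (panel$year - panel$cohort + 1), 0)
# outcome = unit effect + year effect + treatment effect (no noise)
panel$y <- rnorm(30)[panel$unit] + rnorm(length(years))[panel$year - 2000] + panel$att

# treatment indicator interacted with cohort-by-year dummies; untreated cells are the reference
panel$gt <- ifelse(panel$treated, paste(panel$cohort, panel$year, sep = "_"), "none")
panel$gt <- relevel(factor(panel$gt), ref = "none")

fit <- lm(y ~ factor(unit) + factor(year) + gt, data = panel)
est <- coef(fit)[startsWith(names(coef(fit)), "gt")]   # keep only the interaction terms
names(est) <- sub("^gt", "", names(est))

# true effect in each treated cohort-year cell
truth <- tapply(panel$att[panel$treated], as.character(panel$gt[panel$treated]), mean)
print(round(cbind(estimate = est[names(truth)], truth = as.vector(truth)), 6))
```

With real data (noise, no never-treated group, many units) a dedicated fixed-effects routine such as fixest is the practical choice; lm() is used here only so the sketch runs without extra packages.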
We use the did package to apply the Callaway and Sant’Anna (2021) strategy to estimate group-time ATTs:
[Code: estimate the group-time ATTs with did and clean the results into csa.]
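For comparison, the basic building block behind the Callaway and Sant’Anna approach is a 2×2 comparison for each cohort g and year t: the change in outcomes from the last pre-treatment year (g − 1) to year t for cohort g, minus the same change for a comparison group. The sketch below is a simplified, unconditional version of that idea that uses never-treated units as the comparison group. It is not the did package's implementation (which also handles covariates and inference), and the toy panel and the function name att_gt_2x2 are hypothetical.

```r
# Hypothetical sketch: unconditional 2x2 group-time ATT with never-treated controls.
# ATT(g, t) = [mean Y_t - mean Y_(g-1)] in cohort g  -  [same change] among never-treated.
# Difference of means equals mean of differences here because the panel is balanced.
att_gt_2x2 <- function(panel, g, t) {
  base <- g - 1                                   # last pre-treatment year for cohort g
  m <- function(rows, yr) mean(panel$y[rows & panel$year == yr])
  in_g  <- panel$cohort == g
  never <- is.infinite(panel$cohort)
  (m(in_g, t) - m(in_g, base)) - (m(never, t) - m(never, base))
}

# A toy panel: noise-free, parallel trends by construction, with a never-treated group.
set.seed(42)
years <- 2001:2010
panel <- expand.grid(unit = 1:30, year = years)
panel$cohort  <- rep(c(2004, 2007, Inf), each = 10)[panel$unit]   # Inf = never treated
panel$treated <- panel$year >= panel$cohort
slope <- ifelse(panel$cohort == 2004, 0.5, 0.2)
panel$att <- ifelse(panel$treated, slope * (panel$year - panel$cohort + 1), 0)
panel$y <- rnorm(30)[panel$unit] + rnorm(length(years))[panel$year - 2000] + panel$att

# estimate every treated cohort-year cell and compare with the true effect
cells <- unique(panel[panel$treated, c("cohort", "year")])
cells$att_hat <- mapply(function(g, t) att_gt_2x2(panel, g, t), cells$cohort, cells$year)
cells$truth   <- ifelse(cells$cohort == 2004, 0.5, 0.2) * (cells$year - cells$cohort + 1)
print(cells[order(cells$cohort, cells$year), ], row.names = FALSE)
```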
Finally, we merge the results from those two strategies and plot group-time ATT estimates across time for two cohorts:
[Code: merge the TWFE and CSA estimates into a single table, results, printed below.]
Cohort  Year  TWFE w/ interactions  CSA (2021)
1989    1989  0.4766363             0.4913805
1989    1990  0.9995402             1.0142844
1989    1991  1.5775240             1.5922682
1989    1992  2.0053051             2.0200494
1989    1993  2.4780599             2.4928041
1989    1994  3.0387271             3.0534714
1989    1995  3.5264422             3.5411865
1989    1996  4.0375482             4.0522924
1989    1997  4.5915735             4.6063177
1989    1998  5.0393995             5.0239187
1989    1999  5.4969626             5.4814818
1989    2000  6.0344999             6.0190191
1989    2001  6.4645668             6.4490860
1989    2002  6.9576426             6.9421618
1989    2003  7.5589843             7.5435035
1989    2004  7.9299329             7.9144521
1989    2005  8.4867174             8.4712366
1989    2006  8.9533914             8.9379106
1998    1998  0.2423307             0.2464964
1998    1999  0.5898258             0.5939915
1998    2000  0.9730009             0.9771666
1998    2001  1.2332469             1.2374127
1998    2002  1.4561652             1.4603309
1998    2003  1.8340160             1.8381818
1998    2004  1.9912194             1.9953851
1998    2005  2.4007741             2.4049398
1998    2006  2.6075146             2.6116803
[Figure: group-time ATT estimates over time for the 1989 and 1998 cohorts, TWFE with interactions vs. CSA (2021).]
The estimates are so similar that the two lines are hard to tell apart. We can see how close the two sets of results really are by plotting them against each other:
[Figure: TWFE-with-interactions estimates plotted against CSA (2021) estimates.]
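A numeric companion to that scatter plot: the sketch below copies the 27 estimates from the table above and summarizes how far apart the two columns are. The variable names are chosen here for illustration; only the numbers come from the table.

```r
# Values copied from the table of TWFE-with-interactions and CSA (2021) estimates.
cohort <- rep(c(1989, 1998), times = c(18, 9))
twfe <- c(0.4766363, 0.9995402, 1.5775240, 2.0053051, 2.4780599, 3.0387271,
          3.5264422, 4.0375482, 4.5915735, 5.0393995, 5.4969626, 6.0344999,
          6.4645668, 6.9576426, 7.5589843, 7.9299329, 8.4867174, 8.9533914,
          0.2423307, 0.5898258, 0.9730009, 1.2332469, 1.4561652, 1.8340160,
          1.9912194, 2.4007741, 2.6075146)
csa <- c(0.4913805, 1.0142844, 1.5922682, 2.0200494, 2.4928041, 3.0534714,
         3.5411865, 4.0522924, 4.6063177, 5.0239187, 5.4814818, 6.0190191,
         6.4490860, 6.9421618, 7.5435035, 7.9144521, 8.4712366, 8.9379106,
         0.2464964, 0.5939915, 0.9771666, 1.2374127, 1.4603309, 1.8381818,
         1.9953851, 2.4049398, 2.6116803)

gap <- twfe - csa                                   # TWFE minus CSA, cell by cell
cat("correlation:", round(cor(twfe, csa), 6), "\n")
cat("largest absolute gap:", round(max(abs(gap)), 7), "\n")
print(tapply(gap, cohort, range))                  # smallest and largest gap per cohort
```

The largest absolute gap is about 0.015, while the estimates themselves range from about 0.24 to 8.95.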
That’s all I have to show you today. To reiterate, there is a bunch more stuff in the Wooldridge paper and, frankly, I haven’t digested all of it yet. Make sure you click on the links above to learn more, and get in touch with me if you want to share a different interpretation, or if you feel that other points should have been emphasized.
More on the Baker simulation
Simulated data
Adapted from Baker via Chabé-Ferret:
https://chabefer.github.io/STCI/NE.html#difference-in-differences-with-instrumental-variables
scale_colour_discrete <- scale_colour_okabe_ito
scale_fill_discrete <- scale_fill_okabe_ito
The simulation proceeds in these steps:
- set the seed;
- draw unit fixed effects and year fixed effects;
- put the states into treatment groups (the “trend break” step, stored in treat_taus);
- build the main dataset as the full interaction of unit × year;
- draw the error term and compute the treatment indicators and treatment effects;
- calculate cumulative treatment effects;
- calculate the dependent variable.
Model
Adapted from Wooldridge (2021)
[Code: estimate the extended TWFE model and store it in mod.]
Results
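The results are extracted from mod, the year dummies are ignored, and the remaining coefficients are cleaned up by cohort. Because ATTs are available for a cohort only until the next cohort gets treated, later years are dropped. The estimates are then plotted against the true effects from the Baker simulation, in which tau is a random variable with mean 0.3 and the true ATT is cumsum(tau).

[Figure: estimated group-time ATTs by cohort and year, compared with the true cumulative effects from the Baker simulation.]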
Notes:
- The ATTs for the final cohort (2004) are not identified.
- We only get Group-Time ATTs for group j for the years before the next cohort gets treated. This explains why there are more estimates for certain cohorts in the Figure above.
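The rule in the second note can be written as a small helper that lists which cohort-year cells receive an estimate. This is a hypothetical sketch (identified_cells is not from the post), and the cohort years in the example are made up to show how uneven gaps between cohorts produce different numbers of estimates.

```r
# Hypothetical sketch of the rule in the notes: cohort g gets group-time ATTs only
# for years g, g + 1, ..., (next cohort's treatment year - 1); the final cohort gets none.
identified_cells <- function(cohorts) {
  cohorts <- sort(unique(cohorts))
  if (length(cohorts) < 2) return(data.frame(cohort = numeric(0), year = numeric(0)))
  out <- lapply(seq_len(length(cohorts) - 1), function(j) {
    data.frame(cohort = cohorts[j], year = seq(cohorts[j], cohorts[j + 1] - 1))
  })
  do.call(rbind, out)
}

# made-up, unevenly spaced cohorts: the shorter first gap yields fewer estimates
cells <- identified_cells(c(1990, 1995, 2004))
table(cells$cohort)
```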