Our research question is: what is the effect of treatment D on outcome y? DiD allows us to estimate what would have happened to the treatment group if the intervention had not occurred. This counterfactual scenario is essential to understanding the true effect of the treatment. Many analyses revolve around similar questions about the effect of interventions, policy changes, or treatments in various fields. In economics, DiD is used to evaluate the impact of tax cuts on economic growth, while in public policy it is used to evaluate the effects of new traffic laws on accident rates. In marketing, it is used to analyze the influence of advertising campaigns on sales.
For example, in the diagram above, our sample contains data on a population. We divide the data into a treatment group, which received the intervention, and a control group, which did not. For both groups we observe the outcome before (pre) and after (post) the intervention.
Simple treatment/control difference estimator
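The DiD estimator is the change over time in the treatment group's average outcome minus the change over time in the control group's average outcome:

$$ \text{DiD} = (\bar{y}_{\text{treatment, post}} - \bar{y}_{\text{treatment, pre}}) - (\bar{y}_{\text{control, post}} - \bar{y}_{\text{control, pre}}) $$

This equation calculates the treatment effect by comparing the changes in the outcome over time between the treatment and control groups.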
I've created a mock example to help understand the math.
The DiD coefficient would be 9 using the formula mentioned above.
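To make the arithmetic concrete, here is a minimal Python sketch of the same calculation. The four group means are invented for illustration (they are not the values from the mock example) and were chosen only so that the result also comes out to 9; the function name `did_estimate` is likewise hypothetical.

```python
def did_estimate(treat_pre, treat_post, control_pre, control_post):
    """Difference-in-differences computed from four group means."""
    treat_change = treat_post - treat_pre        # change over time, treatment group
    control_change = control_post - control_pre  # change over time, control group
    return treat_change - control_change


# Hypothetical group means, made up for this sketch
means = {
    "treat_pre": 20.0,
    "treat_post": 35.0,
    "control_pre": 18.0,
    "control_post": 24.0,
}

did = did_estimate(**means)
print(did)  # (35 - 20) - (24 - 18) = 9.0
```

The control group's change (6) stands in for what would have happened to the treatment group without the intervention, so only the remaining 9 is attributed to the treatment.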
DiD estimator: calculation using regression
DiD helps control for time-invariant characteristics that could bias the estimate of treatment effects. This means that it eliminates the influence of variables that are constant over time (e.g. geographic location, gender, ethnicity, innate ability, etc.). It can do so because these characteristics affect the pre- and post-treatment periods equally within each group, so they cancel out when we take the difference over time.
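The core equation for a basic DiD model can be written as:

$$ y_{ijt} = \alpha + \gamma \, \text{After}_t + \delta \, \text{Treatment}_j + \beta \, (\text{After}_t \times \text{Treatment}_j) + \varepsilon_{ijt} $$

where:
- y_{ijt} is the outcome variable for individual i in group j at time t.
- After is a dummy variable equal to 1 if the observation is in the post-treatment period.
- Treatment is a dummy variable equal to 1 if the observation belongs to the treatment group.
- After × Treatment is the interaction term, with the coefficient β capturing the DiD estimate.
- α is the intercept, γ and δ are the coefficients on the two dummies, and ε_{ijt} is the error term.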
The coefficient on the interaction term is the DiD estimate of the effect on y. Regression is the most popular approach among researchers because it produces standard errors and makes it easy to control for additional variables.
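As a rough illustration of the regression approach, the sketch below simulates a small dataset and fits the model with ordinary least squares using NumPy. Everything about the data is an assumption made for the example: the sample size, the coefficients used to generate y, and the noise level. The standard error shown is the classical (homoskedastic) one; applied work often uses clustered or robust standard errors instead.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200             # observations per group-period cell (hypothetical)
true_effect = 9.0   # effect built into the simulated data (hypothetical)

# Dummies for the four cells: control/pre, control/post, treatment/pre, treatment/post
after = np.repeat([0, 1, 0, 1], n)
treatment = np.repeat([0, 0, 1, 1], n)

# Simulated outcome: intercept + time effect + group effect + treatment effect + noise
y = (18.0 + 6.0 * after + 2.0 * treatment
     + true_effect * after * treatment
     + rng.normal(0.0, 1.0, size=4 * n))

# Design matrix: constant, After, Treatment, After x Treatment
X = np.column_stack([np.ones_like(y), after, treatment, after * treatment])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
beta = coef[3]  # coefficient on the interaction term = DiD estimate

# Classical OLS standard error for the interaction coefficient
resid = y - X @ coef
sigma2 = resid @ resid / (len(y) - X.shape[1])
se_beta = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[3, 3])


def cell_mean(a, t):
    """Average outcome for a given (After, Treatment) cell."""
    return y[(after == a) & (treatment == t)].mean()


# The same estimate from the four cell means
did_means = (cell_mean(1, 1) - cell_mean(0, 1)) - (cell_mean(1, 0) - cell_mean(0, 0))

print(f"regression DiD: {beta:.3f} (se {se_beta:.3f})")
print(f"difference of means DiD: {did_means:.3f}")
```

With only these two dummies and their interaction, the regression coefficient and the difference of means agree up to floating-point error. The regression becomes more useful once additional control variables are added to the design matrix.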
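The parallel trends assumption is one of the key assumptions in DiD. It is based on the idea that, in the absence of treatment, the difference between the treatment and control groups would remain constant over time. In other words, in the absence of treatment, β (the DiD estimate) = 0.
Formally, writing y^0 for the outcome that would be observed without treatment, this means:

$$ E[y^0_{\text{post}} - y^0_{\text{pre}} \mid \text{Treatment} = 1] = E[y^0_{\text{post}} - y^0_{\text{pre}} \mid \text{Treatment} = 0] $$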
Another way to think about this is that the difference between the two groups would have remained the same over time without the policy change. If trends are not parallel before treatment, DiD estimates may be biased.
How to check this assumption
The validity of the parallel trends assumption can be assessed through graphical analysis and placebo tests.
It is assumed that, in the absence of treatment, the treatment group (orange line) and the control group (blue dashed line) would follow parallel paths over time. The intervention (vertical line) marks the point at which the treatment is applied, allowing differences in trends between the two groups before and after the intervention to be compared to estimate the treatment effect.
Examples that violate the assumption of parallel trends
In simple terms, we look for two things in the treatment group's trend:
1. Slope change
In the two cases above, the parallel trends assumption is not met. The treatment group's outcome is growing faster (part a) or more slowly (part b) than the control group's outcome. Mathematically:
DiD = true effect + differential trend (for DiD to be valid, the differential trend must be 0)
The differential trend could be positive (part a) or negative (part b).
When it is not zero, DiD cannot isolate the impact of the intervention (the true effect), because the estimate also includes the differential trend.
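The decomposition above can be checked with a few lines of Python. This is a toy two-period sketch with made-up baseline levels and trends; `did_with_trend` is a hypothetical helper, not part of any library.

```python
def did_with_trend(true_effect, differential_trend, common_trend=2.0):
    """Two-period DiD when the treatment group has an extra trend of its own."""
    # Control group: only the common trend moves the outcome
    control_pre = 10.0
    control_post = control_pre + common_trend

    # Treatment group: common trend + its own differential trend + true effect
    treat_pre = 12.0
    treat_post = treat_pre + common_trend + differential_trend + true_effect

    return (treat_post - treat_pre) - (control_post - control_pre)


for trend in (0.0, 1.5, -1.5):
    estimate = did_with_trend(true_effect=5.0, differential_trend=trend)
    print(f"differential trend {trend:+.1f} -> DiD estimate {estimate:.1f}")
```

With a true effect of 5, the estimate is 5.0 only when the differential trend is 0. A positive differential trend (part a) pushes the estimate up to 6.5, and a negative one (part b) pulls it down to 3.5.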
2. Jump in the treatment line (either up or down) after the intervention.
In the image above, the treatment group's trend changed differently from the control group's trend, when it should have remained constant without the intervention. A jump like this is not allowed in a DiD study.
Placebo tests are used to check whether the observed treatment effects are actually due to the treatment and not other confounding factors. They involve applying the same analysis to a period or group where no treatment effect is expected. If a significant effect is found in these placebo tests, it suggests that the original results may be false.
For example, in 2019 an intervention study was conducted on providing tablets to secondary schools. We can do a placebo test, which means we can create a fake intervention year, say 2017, in which we know that no policy change occurred. If applying the treatment effect analysis to the placebo date (2017) does not show any significant change, it will suggest that the effect observed in 2019 (if any) is likely due to the actual policy intervention.
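A placebo test along these lines might look like the following sketch. The yearly averages are invented purely for illustration, and the helper `did_at` is hypothetical. The sketch only compares point estimates; a real placebo test would also need standard errors to judge significance.

```python
from statistics import mean

# Hypothetical yearly average outcomes (made-up numbers for illustration)
treated = {2015: 60.0, 2016: 61.0, 2017: 62.0, 2018: 63.0, 2019: 69.0, 2020: 70.0}
control = {2015: 55.0, 2016: 56.0, 2017: 57.0, 2018: 58.0, 2019: 59.0, 2020: 60.0}


def did_at(cutoff, years):
    """DiD comparing years before `cutoff` with years from `cutoff` onward."""
    pre = [y for y in years if y < cutoff]
    post = [y for y in years if y >= cutoff]
    treat_change = mean(treated[y] for y in post) - mean(treated[y] for y in pre)
    control_change = mean(control[y] for y in post) - mean(control[y] for y in pre)
    return treat_change - control_change


# Actual intervention year, using all years
actual = did_at(2019, range(2015, 2021))

# Placebo: pretend the intervention happened in 2017,
# using only years before the real intervention (2015 to 2018)
placebo = did_at(2017, range(2015, 2019))

print(f"DiD at the real date (2019): {actual}")
print(f"DiD at the placebo date (2017): {placebo}")
```

In this made-up data the placebo estimate is 0 while the estimate at the real date is 5, which is the pattern we would hope to see. Restricting the placebo run to pre-2019 data matters: otherwise the real effect would leak into the fake "post" period.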
- DiD event study: Estimates treatment effects by specific year, which is useful for assessing the timing of treatment effects and checking pre-treatment trends. The model allows the treatment effect to vary by year, so we can study the effect at times t+1, t+2, …, t+n.
- Synthetic control method (SCM): SCM constructs a synthetic control group by weighting multiple untreated units to create a composite that approximates the characteristics of the treated unit before the intervention. This method is particularly useful when a single treated unit is compared to a set of untreated units. It provides a more credible counterfactual by combining information from multiple units.
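To give a feel for the event-study idea, here is a simplified sketch. With one treatment group, one control group, and yearly averages, the event-study coefficient for a given year reduces to the treatment-control gap in that year minus the gap in a reference year (here the last pre-treatment year). The data and the `event_study` helper are hypothetical, and a real event study would be estimated by regression with standard errors.

```python
# Hypothetical yearly average outcomes (made-up numbers for illustration)
treated = {2015: 60.0, 2016: 61.0, 2017: 62.0, 2018: 63.0, 2019: 69.0, 2020: 70.0}
control = {2015: 55.0, 2016: 56.0, 2017: 57.0, 2018: 58.0, 2019: 59.0, 2020: 60.0}


def event_study(treated, control, reference_year):
    """Year-by-year gap between groups, relative to the gap in the reference year."""
    base_gap = treated[reference_year] - control[reference_year]
    return {
        year: (treated[year] - control[year]) - base_gap
        for year in sorted(treated)
    }


# Intervention assumed to start in 2019, so 2018 (t-1) is the reference year
effects = event_study(treated, control, reference_year=2018)

for year, effect in effects.items():
    label = "pre " if year < 2019 else "post"
    print(f"{year} ({label}): {effect:+.1f}")
```

Estimates close to zero in the pre-treatment years are consistent with parallel trends, while the post-treatment values trace out how the effect evolves over time.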
There are many more, but I will limit myself to just two. Maybe I'll write a post later explaining everything else in detail.
In this post, I discussed the difference-in-differences (DiD) estimator, a popular method for estimating average treatment effects. DiD is widely used to study policy effects by comparing changes over time between treatment and control groups. The key advantage of DiD is its ability to control for unobserved confounders that remain constant over time, thus isolating the true impact of an intervention.
I also explored key concepts such as the parallel trends assumption, the importance of pre-treatment data, and how to check for violations of the assumption using graphical analysis and placebo tests. Additionally, I discussed DiD extensions and variations, such as the DiD event study and the synthetic control method, which offer more insight and robustness in different scenarios.
Thank you for reading! If you liked this post and want to see more, consider following me. You can also follow me on LinkedIn. I plan to blog about causal inference and data analysis, always with the goal of keeping things simple.
A small disclaimer: I write to learn, so errors may occur despite my best efforts. If you spot any errors, please let me know. I also welcome suggestions for new topics!