OPINION
Much of contemporary data science answers the question “What is happening?” At my company, for example, we often try to measure how well a company is performing and how its performance indicators correlate with one another.
A more powerful question is “Why is this happening?” For example, if we detect a significant correlation between the presence of women in management and a company's income, which is the cause and which is the effect? Or, if people complete a training program, will it improve their performance? Or are top performers simply more likely to sign up for training, so that the effect we observe is only the result of selection bias?
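To make the selection-bias worry concrete, here is a minimal Python simulation under invented assumptions: prior skill drives both who signs up for training and later performance, and the training itself has no effect at all. The function names (`simulate_training`, `naive_difference`) and all parameter values are hypothetical, chosen only for illustration.

```python
import numpy as np

def simulate_training(n=10_000, true_effect=0.0, seed=0):
    """Hypothetical data: prior skill drives both enrolment and later performance."""
    rng = np.random.default_rng(seed)
    skill = rng.normal(0.0, 1.0, n)                  # prior ability of each employee
    # Self-selection: higher-skill employees are more likely to sign up
    p_enrol = 1.0 / (1.0 + np.exp(-2.0 * skill))
    trained = rng.random(n) < p_enrol
    # Performance = skill + (possibly zero) training effect + noise
    performance = skill + true_effect * trained + rng.normal(0.0, 0.5, n)
    return skill, trained, performance

def naive_difference(trained, performance):
    """Mean performance of trained minus untrained employees."""
    return performance[trained].mean() - performance[~trained].mean()

skill, trained, perf = simulate_training(true_effect=0.0)
print(f"naive 'effect' of training: {naive_difference(trained, perf):.2f}")
```

Even though the true effect is zero, the naive comparison shows a clearly positive gap, because the trained group was more skilled to begin with.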
There are several approaches to identifying causal relationships in data science. Propensity score matching (PSM) is one of the older ones, having emerged about 40 years ago. Other methods, such as structural equation modeling, gained popularity around the same time, while approaches such as instrumental variables emerged several decades earlier. Causal inference remains a very active field, and many new methods are still being developed.
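As a rough illustration of the idea behind PSM, the sketch below estimates each unit's probability of treatment from observed covariates, matches every treated unit to the control unit with the closest score, and averages the outcome differences. It is a simplified sketch, not a reference implementation: the helper names (`fit_logistic`, `psm_att`), the hand-rolled logistic regression, 1-nearest-neighbour matching with replacement, and the simulated data are all assumptions for illustration. Note that it can only correct for confounders that are actually observed, here prior skill.

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Logistic regression by plain gradient ascent (a stand-in for any library model)."""
    Xb = np.column_stack([np.ones(len(X)), X])     # prepend an intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w += lr * Xb.T @ (y - p) / len(y)           # gradient of the mean log-likelihood
    return w

def psm_att(X, treated, outcome):
    """Average treatment effect on the treated via 1-nearest-neighbour
    matching on the estimated propensity score, with replacement."""
    w = fit_logistic(X, treated.astype(float))
    Xb = np.column_stack([np.ones(len(X)), X])
    ps = 1.0 / (1.0 + np.exp(-Xb @ w))              # estimated P(treated | X)
    t_idx = np.flatnonzero(treated)
    c_idx = np.flatnonzero(~treated)
    # Distance from every treated unit's score to every control unit's score
    dist = np.abs(ps[t_idx][:, None] - ps[c_idx][None, :])
    matches = c_idx[dist.argmin(axis=1)]            # closest control for each treated unit
    return np.mean(outcome[t_idx] - outcome[matches])

# Hypothetical data: prior skill (observed) drives both enrolment and performance;
# the true training effect is set to 0.5.
rng = np.random.default_rng(1)
n = 2_000
skill = rng.normal(0.0, 1.0, n)
trained = rng.random(n) < 1.0 / (1.0 + np.exp(-2.0 * skill))
performance = skill + 0.5 * trained + rng.normal(0.0, 0.5, n)

naive = performance[trained].mean() - performance[~trained].mean()
att = psm_att(skill, trained, performance)
print(f"naive difference: {naive:.2f}, PSM estimate: {att:.2f}")
```

On this simulated data the naive difference mixes the training effect with the skill gap between the groups, while the matched estimate should land close to the true value of 0.5. With real data, a library model (for example scikit-learn's `LogisticRegression`) and a check of covariate balance after matching would normally replace the hand-written pieces.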
A key advantage of PSM is that it allows researchers to work with real-world data. In particular,…