Causal analysis is the process of identifying and addressing the causes and effects of a problem. Rather than treating the symptoms of a problem, it seeks the root cause so that the symptoms can be reduced at their source. As an example, consider a scenario in which airline tickets are becoming prohibitively expensive. The first step is to determine what drives airfare fluctuations, in order to find a possible macroeconomic measure for reducing airfares. One key variable that significantly affects airfares is the price of crude oil. If oil prices go up, airfares rise to cover the airlines' higher fuel costs. On the other hand, if airlines raise their fares independently of any change in oil prices, that increase should not affect oil prices. It is therefore reasonable to conclude that oil prices influence airfares, but not the other way around.
This example illustrates the core idea of causal analysis: intervening on one variable and forecasting the impact on another. Using only historical data, causal analysis can help researchers discover such cause-effect relationships automatically. It can also provide a numerical estimate of how much the value of a variable changes when its causal predecessors are altered. Although the crude oil and airfare example is reasonably straightforward, causal analysis can be a difficult task in a multivariate system.
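As a minimal sketch of such a numerical estimate, the snippet below simulates the oil and airfare example and forecasts the change in airfare under an intervention on the oil price. The linear relationship, the coefficient 0.8, the noise levels, and all function names are assumptions made up for illustration; none of this is the CausalAI library's API. The fitted slope can be read as a causal effect here only because the direction (oil to airfare) is assumed known and the simulation contains no confounders.

```python
import random

def simulate(n, seed=0):
    """Generate observational (oil, airfare) pairs from an assumed linear model."""
    rng = random.Random(seed)
    oil = [rng.gauss(80.0, 10.0) for _ in range(n)]
    # Assumed structural equation: airfare = 100 + 0.8 * oil + noise
    fare = [100.0 + 0.8 * o + rng.gauss(0.0, 2.0) for o in oil]
    return oil, fare

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def predict_fare(intercept, slope, oil_price):
    """Forecast airfare when the oil price is set to a given value."""
    return intercept + slope * oil_price

oil, fare = simulate(5000)
intercept, slope = fit_line(oil, fare)

# Estimated change in airfare if the oil price is moved from 80 to 90.
effect = predict_fare(intercept, slope, 90.0) - predict_fare(intercept, slope, 80.0)
print(f"estimated slope: {slope:.3f}")
print(f"estimated change in airfare: {effect:.2f}")
```

In a real multivariate system the direction of each edge and the presence of confounders are not known in advance, which is what makes the problem hard.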
To make it easier for researchers to perform causal analysis, Salesforce researchers recently introduced the CausalAI library, an open-source library for causal analysis using observational data. The library provides algorithms that can handle linear and nonlinear causal relationships between variables, and it supports tabular and time series data of different types (discrete and continuous). Salesforce’s CausalAI library is intended to offer a comprehensive solution for the many additional requirements of causal analysis, ranging from data generation to multi-processing for speed. In addition, the researchers provide a code-free user interface that allows users to perform causal analyses. The main goal of the library is to offer a quick and easy-to-use solution to various problems related to causation.
The Salesforce CausalAI library is intended to address two problems: causal discovery and causal inference. Using observational data, causal discovery aims to answer questions such as which variable in a multivariate system affects which other variable. To put it another way, the goal of causal discovery is to recover the directed causal graph underlying the observational data, where the variables are the nodes and the edges are initially unknown. Causal inference, on the other hand, involves calculating a numerical estimate of how one set of variables influences another variable. Unlike inference in machine learning models, which is based on correlation, causal inference traverses the causal graph to determine how changes in one variable affect the target variable. This distinction matters because two or more variables can be correlated without any causal link between them, in which case changing one of them may have no impact on the other.
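The gap between correlation and causation can be shown with a small simulation. In the sketch below, a hidden common cause Z drives both X and Y, so X and Y are strongly correlated in observational data, yet forcing X to different values leaves Y unchanged. The graph, coefficients, and function names are illustrative assumptions and do not come from the CausalAI library.

```python
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def sample(n, rng, do_x=None):
    """Draw n samples from the assumed graph Z -> X, Z -> Y.

    If do_x is given, X is forced to that value (an intervention),
    which cuts the edge from Z to X.
    """
    xs, ys = [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        x = do_x if do_x is not None else z + rng.gauss(0.0, 0.5)
        y = 2.0 * z + rng.gauss(0.0, 0.5)  # Y depends on Z only, never on X
        xs.append(x)
        ys.append(y)
    return xs, ys

rng = random.Random(1)

# Observational data: X and Y look strongly related.
xs, ys = sample(20000, rng)
corr = pearson(xs, ys)

# Interventional data: setting X low or high does not move the mean of Y.
_, y_low = sample(20000, rng, do_x=-2.0)
_, y_high = sample(20000, rng, do_x=2.0)
mean_shift = sum(y_high) / len(y_high) - sum(y_low) / len(y_low)

print(f"observational correlation of X and Y: {corr:.3f}")
print(f"shift in mean of Y under intervention on X: {mean_shift:.3f}")
```

A correlation-based predictor would wrongly expect Y to move with X, whereas an estimate based on the causal graph would not.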
The library’s causal discovery module takes an observational data object and an optional prior knowledge object as input and produces a causal graph as output. The causal inference module takes a causal graph, which can be provided directly by the user or estimated by the causal discovery module, along with user-defined interventions, and produces the estimated effect on a target variable.
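The inference step described above (a causal graph plus interventions in, an estimated effect on a target out) can be sketched in plain Python. This is a simplified illustration, not the library's implementation: the three-node chain A -> B -> C, the linear coefficients, and every name below are hypothetical, and each node is limited to a single parent to keep the regression simple.

```python
import random

# Hypothetical causal graph: node -> its single parent (None marks a root).
GRAPH = {"A": None, "B": "A", "C": "B"}
ORDER = ["A", "B", "C"]  # topological order, parents before children

def generate(n, seed=0):
    """Synthetic data from an assumed linear structural equation model."""
    rng = random.Random(seed)
    data = {name: [] for name in ORDER}
    for _ in range(n):
        a = rng.gauss(0.0, 1.0)
        b = 1.5 * a + rng.gauss(0.0, 0.3)
        c = -2.0 * b + rng.gauss(0.0, 0.3)
        for name, value in zip(ORDER, (a, b, c)):
            data[name].append(value)
    return data

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def fit_conditionals(data, graph):
    """Learn one conditional model (node given its parent) per non-root node."""
    return {node: fit_line(data[parent], data[node])
            for node, parent in graph.items() if parent is not None}

def expected_value(models, data, interventions, target):
    """Walk the graph in topological order; intervened nodes ignore their parent."""
    values = {}
    for node in ORDER:
        parent = GRAPH[node]
        if node in interventions:
            values[node] = interventions[node]
        elif parent is None:
            values[node] = sum(data[node]) / len(data[node])
        else:
            intercept, slope = models[node]
            values[node] = intercept + slope * values[parent]
    return values[target]

data = generate(5000)
models = fit_conditionals(data, GRAPH)

# Estimated effect on C of moving A from 0 to 1 by intervention.
effect_on_c = (expected_value(models, data, {"A": 1.0}, "C")
               - expected_value(models, data, {"A": 0.0}, "C"))
print(f"estimated effect of do(A=1) vs do(A=0) on C: {effect_on_c:.3f}")
```

Propagating expected values node by node is valid here because the assumed models are linear; nonlinear conditional models would generally require sampling instead.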
Beyond key features such as support for different data types, synthetic data generation using structural equation models, and distributed computing, the library offers several other capabilities. Targeted causal discovery is one of them: here the user is only interested in the causes of a single variable of interest rather than in the entire causal graph. Users can also incorporate partial background knowledge and visualize tabular and time series causal graphs. As for the supported causal discovery algorithms, the PC algorithm, Granger causality, and VARLiNGAM are supported for time series data, and the PC algorithm is supported for tabular data. For causal inference, conditional models based on the causal graph are learned in order to mimic the data generation process.
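The PC algorithm prunes candidate edges by running conditional independence tests. The sketch below shows a single such test using partial correlation, which is appropriate for linear-Gaussian data; it is neither the full PC algorithm nor the library's implementation. The chain X -> Z -> Y, the sample size, and the function names are assumptions chosen for illustration.

```python
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear influence of z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / ((1 - r_xz ** 2) * (1 - r_yz ** 2)) ** 0.5

# Assumed ground truth: X -> Z -> Y, with no direct edge between X and Y.
rng = random.Random(2)
x = [rng.gauss(0.0, 1.0) for _ in range(20000)]
z = [xi + rng.gauss(0.0, 1.0) for xi in x]
y = [zi + rng.gauss(0.0, 1.0) for zi in z]

marginal = pearson(x, y)            # X and Y are dependent on their own
conditional = partial_corr(x, y, z)  # but nearly independent given Z

print(f"corr(X, Y)     = {marginal:.3f}")
print(f"corr(X, Y | Z) = {conditional:.3f}")
```

Because the partial correlation is close to zero, a PC-style procedure would remove the direct edge between X and Y while keeping the edges that involve Z.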
The library's parallelization support and user-friendly interface are presented as its main advantages over other libraries for causal analysis. The Salesforce team continues to develop the library. In future work, the researchers aim to expand its set of algorithms for causal discovery and inference. Other goals include support for latent variables, GPU-based computing, and heterogeneous data types (mixed continuous and discrete). More details about the Salesforce CausalAI library can be found below.
Check out the GitHub and Paper. All credit for this research goes to the researchers on this project. Also, don’t forget to join our 13k+ ML SubReddit, Discord channel, and email newsletter, where we share the latest AI research news, exciting AI projects, and more.
Khushboo Gupta is a consulting intern at MarktechPost. She is currently pursuing her B.Tech at the Indian Institute of Technology (IIT), Goa. She is passionate about machine learning, natural language processing, and web development, and she enjoys learning more about the technical field by participating in various challenges.