Causal AI: exploring the integration of causal reasoning into machine learning
Welcome to my series on Causal AI, where we explore how to integrate causal reasoning into machine learning models. Expect a number of practical applications across different business contexts.
In the last article we covered optimizing non-linear treatment effects in pricing and promotions. This time we will cover measuring the intrinsic causal influence of your marketing campaigns.
If you missed the last article on non-linear treatment effects in pricing and promotions, check it out here:
In this article I will help you understand how you can measure the intrinsic causal influence of your marketing campaigns.
The following aspects will be covered:
- What are the challenges when it comes to marketing measurement?
- What is intrinsic causal influence and how does it work?
- A worked case study in Python showing how we can use intrinsic causal influence to give marketing campaigns the credit they deserve.
The complete notebook can be found here:
What are the different types of marketing campaigns?
Organizations use marketing to grow their business by acquiring new customers and retaining existing ones. Marketing campaigns are usually divided into 3 main categories:
- Brand
- Performance
- Retention
Each has its own unique challenges when it comes to measurement. Understanding these challenges is crucial.
Brand campaigns
The goal of brand campaigns is to make your brand known to new audiences. They are often run on television and social networks, where they usually take the form of a video. They usually don't have a direct call to action; a typical message is “our product will last you a lifetime.”
The challenge of measuring television is immediately obvious: we can't track who has watched a television ad! But we face similar challenges with social media: if I watch a video on Facebook and then organically visit the website and purchase the product the next day, it is highly unlikely that we will be able to link these two events.
There is also the secondary challenge of a delayed effect. When raising awareness with new audiences, it may take days, weeks, or months for them to reach the point where they consider purchasing your product.
There is a case to be made that brand campaigns do all the hard work; however, when it comes to marketing measurement, they are often undervalued because of the challenges highlighted above.
Performance campaigns
Generally, performance campaigns are aimed at customers who are in the market for your product. They are found on paid search channels, social networks and affiliates. They usually have a call to action, for example, “click now to get 5% off your first purchase.”
When it comes to performance campaigns, it's not immediately obvious why they are difficult to measure. We can most likely link a customer's click on a performance campaign to that customer's purchase the same day.
But would they have clicked if they weren't already familiar with the brand? How did they become familiar with the brand? If we hadn't shown them the campaign, would they have bought organically anyway? These are difficult questions to answer from a data science perspective!
Retention campaigns
The final category of campaigns is retention. This is marketing aimed at retaining existing customers. We can usually run A/B tests to measure these campaigns.
Acquisition Marketing Chart
It is common to refer to brand and performance campaigns together as acquisition marketing. As I mentioned above, measuring brand and performance campaigns is challenging: we often undervalue brand campaigns and overvalue performance campaigns.
The following graphic is a motivating (but simplified) example of how acquisition marketing works:
How can we (fairly) estimate how much each node contributed to the revenue? This is where intrinsic causal influence comes into the picture. Let's dive into what it is in the next section!
Where does the concept come from?
The concept was originally proposed in a 2020 paper:
It is implemented in the GCM module within the DoWhy Python package:
Personally, I found the concept quite difficult to understand at first, so in the next section we will break it down step by step.
Causal Graph Summary
Before attempting to understand intrinsic causal influence, it is important to understand causal graphs, structural causal models (SCM), and additive noise models (ANM). My previous article in the series should help you get up to speed:
As a reminder, each node in a causal graph can be viewed as the target in a model where its direct parents are used as features. It is common to use an additive noise model for each non-root node:
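In its standard form, an additive noise model writes each non-root node as a function of its direct parents plus an independent noise term. The symbols below ($X_i$ for the node, $\mathrm{PA}_i$ for its direct parents, $f_i$ for the mechanism and $N_i$ for the noise) are notation chosen for this sketch rather than taken from the article:

```latex
X_i = f_i(\mathrm{PA}_i) + N_i, \qquad N_i \perp\!\!\!\perp \mathrm{PA}_i
```

For a root node there are no parents, so the expression reduces to $X_i = N_i$.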
What really is intrinsic causal influence?
Now that we've recapped causal graphs, let's begin to understand what intrinsic causal influence really is…
The dictionary definition of intrinsic is “to belong naturally.” In my head I think of a funnel, where the things at the top of the funnel are doing the heavy lifting. We want to give them the causal influence they deserve.
Let's take the following example graph to help us unpack intrinsic causal influence further:
- A, B and C are root nodes.
- D is a non-root node, which we can model using its direct parents (A, B, C) and a noise term.
- E is a non-root node that, like D, we can model using its direct parents (A, B, C) and a noise term.
- F is our target node, which we can model using its direct parents (D, E) and a noise term.
Let's focus on node D. It inherits part of its influence on node F from nodes A, B and C. The intrinsic part of its influence on node F comes from its noise term. Therefore, we say that the noise term of each node can be used to estimate its intrinsic causal influence on a target node. It is worth noting that root nodes are made up of noise only.
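To make the split between inherited and intrinsic parts concrete, here is a minimal sketch for node D alone. The linear mechanism, its coefficients and the noise scales are all hypothetical values picked for illustration; they are not part of the example graph above.

```python
import numpy as np

rng = np.random.default_rng(42)
n_samples = 10_000

# Root nodes A, B and C are pure noise (unit variance, chosen for illustration)
A = rng.normal(0.0, 1.0, n_samples)
B = rng.normal(0.0, 1.0, n_samples)
C = rng.normal(0.0, 1.0, n_samples)

# Node D = inherited part (hypothetical linear mechanism) + its own noise term
inherited_D = 1.0 * A + 0.5 * B - 0.5 * C
noise_D = rng.normal(0.0, 0.5, n_samples)
D = inherited_D + noise_D

# Share of D's variance that is inherited from its parents vs. intrinsic to D
inherited_share_D = inherited_D.var() / D.var()
intrinsic_share_D = noise_D.var() / D.var()

print(f"Inherited share of Var(D): {inherited_share_D:.3f}")
print(f"Intrinsic share of Var(D): {intrinsic_share_D:.3f}")
```

Only the intrinsic part is new information introduced at D; everything else is already accounted for by the noise of A, B and C further up the graph.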
In the case study, we will delve into how exactly to calculate the intrinsic causal influence.
How can it help us measure our marketing campaigns?
Hopefully, you can now see the link between the acquisition marketing example and intrinsic causal influence. Can intrinsic causal influence help us stop undervaluing brand campaigns and overvaluing performance campaigns? Let's find out in the case study!
Background
The end of the year is approaching and the Marketing Director is under pressure from the Finance team to justify why she plans to spend so much on marketing next year. The Finance team uses a last-click model, where revenue is attributed to the last thing a customer clicked on. They question why they need to spend anything on TV when every customer arrives through organic or social channels!
The data science team is tasked with estimating the intrinsic causal influence of each marketing channel.
Configure the graph (DAG)
We start by setting up a DAG using expert domain knowledge, reusing the marketing acquisition example from above:
import numpy as np

# Create node lookup for channels
node_lookup = {0: 'Demand',
               1: 'TV spend',
               2: 'Social spend',
               3: 'Organic clicks',
               4: 'Social clicks',
               5: 'Revenue'
               }

total_nodes = len(node_lookup)

# Create adjacency matrix - this is the base for our graph
graph_actual = np.zeros((total_nodes, total_nodes))

# Create graph using expert domain knowledge (row = parent, column = child)
graph_actual[0, 3] = 1.0 # Demand -> Organic clicks
graph_actual[0, 4] = 1.0 # Demand -> Social clicks
graph_actual[1, 3] = 1.0 # TV spend -> Organic clicks
graph_actual[2, 3] = 1.0 # Social spend -> Organic clicks
graph_actual[1, 4] = 1.0 # TV spend -> Social clicks
graph_actual[2, 4] = 1.0 # Social spend -> Social clicks
graph_actual[3, 5] = 1.0 # Organic clicks -> Revenue
graph_actual[4, 5] = 1.0 # Social clicks -> Revenue
At its core, the last-click model used by the Finance team only credits the nodes that feed directly into revenue (organic clicks and social clicks) when measuring marketing.
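The gap between what last-click sees and what actually sits upstream can be read straight off the adjacency matrix. The sketch below rebuilds the same graph with NumPy only; the variable names (`direct_parents`, `upstream_only`) are illustrative and not part of the original notebook.

```python
import numpy as np

node_lookup = {0: 'Demand', 1: 'TV spend', 2: 'Social spend',
               3: 'Organic clicks', 4: 'Social clicks', 5: 'Revenue'}
n_nodes = len(node_lookup)

# Same edges as the DAG above (row = parent, column = child)
edges = [(0, 3), (0, 4), (1, 3), (2, 3), (1, 4), (2, 4), (3, 5), (4, 5)]
adjacency = np.zeros((n_nodes, n_nodes))
for parent, child in edges:
    adjacency[parent, child] = 1.0

# A graph with n nodes is acyclic if it has no directed walk of length n
is_acyclic = not np.linalg.matrix_power(adjacency, n_nodes).any()

target = 5  # Revenue

# Direct parents: the only nodes a last-click view would credit
direct_parents = [node_lookup[i] for i in np.flatnonzero(adjacency[:, target])]

# Ancestors: any node with a directed path to the target (sum of matrix powers)
reachability = np.zeros_like(adjacency)
power = np.eye(n_nodes)
for _ in range(n_nodes - 1):
    power = power @ adjacency
    reachability += power
ancestors = [node_lookup[i] for i in np.flatnonzero(reachability[:, target])]

# Nodes that influence Revenue only through other nodes
upstream_only = [name for name in ancestors if name not in direct_parents]

print("Acyclic:", is_acyclic)
print("Direct parents of Revenue:", direct_parents)
print("Upstream-only nodes:", upstream_only)
```

The upstream-only nodes are exactly the ones a last-click model never sees, which is why the spend channels end up with no credit.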
Data generation process
We create some data samples following the data generation process implied by the DAG:
- 3 root nodes made up of noise terms: demand, TV spend and social spend.
- 2 non-root nodes, both inheriting the influence of the 3 root nodes plus their own noise terms: organic clicks and social clicks.
- 1 target node, which inherits the influence of the 2 non-root nodes plus a noise term: revenue.
import pandas as pd

# Create dataframe with 1 column per node
df = pd.DataFrame(columns=list(node_lookup.values()))

# Setup data generating process
df[node_lookup[0]] = np.random.normal(100000, 25000, size=(20000)) # Demand
df[node_lookup[1]] = np.random.normal(100000, 20000, size=(20000)) # TV spend
df[node_lookup[2]] = np.random.normal(100000, 25000, size=(20000)) # Social spend
df[node_lookup[3]] = 0.75 * df[node_lookup[0]] + 0.50 * df[node_lookup[1]] + 0.25 * df[node_lookup[2]] + np.random.normal(loc=0, scale=2000, size=20000) # Organic clicks
df[node_lookup[4]] = 0.30 * df[node_lookup[0]] + 0.50 * df[node_lookup[1]] + 0.70 * df[node_lookup[2]] + np.random.normal(100000, 25000, size=(20000)) # Social clicks
df[node_lookup[5]] = df[node_lookup[3]] + df[node_lookup[4]] + np.random.normal(loc=0, scale=2000, size=20000) # Revenue
Training the SCM
Now we can train the SCM using the GCM module from the DoWhy Python package. We set up the data generation process with linear relationships, so we can use ridge regression as the causal mechanism for each non-root node:
import networkx as nx
from dowhy import gcm

# Setup graph
graph = nx.from_numpy_array(graph_actual, create_using=nx.DiGraph)
graph = nx.relabel_nodes(graph, node_lookup)

# Create SCM: empirical distributions for root nodes, additive noise models otherwise
causal_model = gcm.InvertibleStructuralCausalModel(graph)
causal_model.set_causal_mechanism('Demand', gcm.EmpiricalDistribution()) # Demand
causal_model.set_causal_mechanism('TV spend', gcm.EmpiricalDistribution()) # TV spend
causal_model.set_causal_mechanism('Social spend', gcm.EmpiricalDistribution()) # Social spend
causal_model.set_causal_mechanism('Organic clicks', gcm.AdditiveNoiseModel(gcm.ml.create_ridge_regressor())) # Organic clicks
causal_model.set_causal_mechanism('Social clicks', gcm.AdditiveNoiseModel(gcm.ml.create_ridge_regressor())) # Social clicks
causal_model.set_causal_mechanism('Revenue', gcm.AdditiveNoiseModel(gcm.ml.create_ridge_regressor())) # Revenue
gcm.fit(causal_model, df)
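As a sanity check on the choice of a linear mechanism, the sketch below regenerates the organic clicks node and fits it with plain NumPy. Note that it swaps in ordinary least squares for the ridge regressor and uses its own seeded random generator, so it illustrates the idea rather than reproducing the DoWhy fit.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 20_000

# Regenerate the parents and the organic clicks node as in the data generation step
demand = rng.normal(100_000, 25_000, n_samples)
tv_spend = rng.normal(100_000, 20_000, n_samples)
social_spend = rng.normal(100_000, 25_000, n_samples)
organic_clicks = (0.75 * demand + 0.50 * tv_spend + 0.25 * social_spend
                  + rng.normal(0, 2_000, n_samples))

# Fit organic clicks on its direct parents (plus an intercept column)
features = np.column_stack([demand, tv_spend, social_spend, np.ones(n_samples)])
coefficients, *_ = np.linalg.lstsq(features, organic_clicks, rcond=None)

# In an additive noise model, the residual is the recovered noise term of the node
residuals = organic_clicks - features @ coefficients

print("Estimated coefficients:", np.round(coefficients[:3], 3))
print("Residual standard deviation:", round(residuals.std(), 1))
```

The residuals play the role of the node's noise term, which is the ingredient intrinsic causal influence needs from each fitted mechanism.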
Intrinsic causal influence
We can easily calculate the intrinsic causal influence using the GCM module. We then convert the contributions to percentages:
# Calculate intrinsic causal influence
ici = gcm.intrinsic_causal_influence(causal_model, target_node='Revenue')

def convert_to_percentage(value_dictionary):
    # Express each absolute contribution as a share of the total
    total_absolute_sum = np.sum([abs(v) for v in value_dictionary.values()])
    return {k: round(abs(v) / total_absolute_sum * 100, 1) for k, v in value_dictionary.items()}

convert_to_percentage(ici)
Let's show them on a bar chart:
import matplotlib.pyplot as plt
import seaborn as sns

# Convert dictionary to DataFrame (separate name so the training data in df is kept)
ici_df = pd.DataFrame(list(ici.items()), columns=['Node', 'Intrinsic Causal Influence'])

# Create a bar plot
plt.figure(figsize=(10, 6))
sns.barplot(x='Node', y='Intrinsic Causal Influence', data=ici_df)
# Rotate x labels for better readability
plt.xticks(rotation=45)
plt.title('Intrinsic causal influence on Revenue')
plt.show()
Are our results intuitive? If you take a look at the code of the data generation process, you will see that they are. Pay close attention to what each non-root node inherits and what additional noise is added.
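One way to check the intuition is to work out what the shares should be from the data generation code itself. The sketch below assumes that the variance of revenue is the quantity being attributed and that all noise terms are independent; in that linear setting each noise term's share reduces to its own variance contribution. The dictionary and variable names are illustrative.

```python
# Total (direct + indirect) linear effect of each noise term on Revenue and its
# standard deviation, read off the data generation code:
# Revenue = Organic clicks + Social clicks + noise
noise_terms = {
    'Demand': (0.75 + 0.30, 25_000),
    'TV spend': (0.50 + 0.50, 20_000),
    'Social spend': (0.25 + 0.70, 25_000),
    'Organic clicks': (1.0, 2_000),
    'Social clicks': (1.0, 25_000),  # the mean of 100000 shifts the level, not the variance
    'Revenue': (1.0, 2_000),
}

# Variance each noise term passes on to Revenue
variance_contribution = {name: (effect * std) ** 2
                         for name, (effect, std) in noise_terms.items()}
total_variance = sum(variance_contribution.values())

# Expected percentage share per node under the stated assumptions
expected_share = {name: round(100 * value / total_variance, 1)
                  for name, value in variance_contribution.items()}

for name, share in expected_share.items():
    print(f"{name}: {share}%")
```

The estimated percentages from the fitted model should land close to these values, up to sampling and approximation error.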
The intrinsic causal influence module is really easy to use, but it doesn't help us understand the method behind it. Finally, let's explore the inner workings of intrinsic causal influence!
Intrinsic causal influence: how does it work?
We want to estimate how much each node's noise term contributes to the target node:
- It is worth remembering that root nodes are simply made up of a noise term.
- In non-root nodes, we separate the noise term from what was inherited from the parents.
- We also include the noise term of the target node. This could be interpreted as the contribution of unobserved factors (although it could also be due to model misspecification).
- Then, noise terms are used to explain the variance in the target node. This can be seen as a model with noise terms as features and the target node as output.
- The model is used to estimate the conditional distribution of the target node given subsets of noise variables.
- Shapley values are then used to estimate the contribution of each noise term: if changing the noise term has little impact on the target, then the intrinsic causal influence will be very small.
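The steps above can be sketched on a toy example. This is not DoWhy's implementation: it is a minimal illustration that assumes a hypothetical three-node chain, uses the variance explained by a subset of noise terms as the value function, and computes exact Shapley values by enumerating every subset.

```python
import itertools
import math
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical chain X -> Y -> T written in terms of noise only:
# X = N_X, Y = 2*X + N_Y, T = Y + N_T  =>  T = 2*N_X + N_Y + N_T
names = ['N_X', 'N_Y', 'N_T']
scales = np.array([1.0, 2.0, 0.5])  # standard deviation of each noise term

def target_from_noise(noise):
    return 2.0 * noise[:, 0] + noise[:, 1] + noise[:, 2]

n_samples, n_resamples = 2_000, 200
observed_noise = rng.normal(0.0, scales, size=(n_samples, len(names)))

def explained_variance(subset):
    """Monte Carlo estimate of Var(E[T | noise terms in subset])."""
    if not subset:
        return 0.0
    known = list(subset)
    conditional_mean = np.zeros(n_samples)
    for _ in range(n_resamples):
        # Resample every noise term, then restore the ones we condition on
        draw = rng.normal(0.0, scales, size=observed_noise.shape)
        draw[:, known] = observed_noise[:, known]
        conditional_mean += target_from_noise(draw)
    return (conditional_mean / n_resamples).var()

indices = range(len(names))

# Evaluate the value function once per subset of noise terms
value = {frozenset(s): explained_variance(s)
         for r in range(len(names) + 1)
         for s in itertools.combinations(indices, r)}

def shapley(i):
    """Weighted average of the marginal contribution of noise term i."""
    n = len(names)
    others = [j for j in indices if j != i]
    total = 0.0
    for r in range(n):
        weight = math.factorial(r) * math.factorial(n - r - 1) / math.factorial(n)
        for s in itertools.combinations(others, r):
            s = frozenset(s)
            total += weight * (value[s | {i}] - value[s])
    return total

contributions = {names[i]: shapley(i) for i in indices}
shares = {k: v / sum(contributions.values()) for k, v in contributions.items()}
print({k: round(v, 3) for k, v in shares.items()})
```

Because the toy target is linear in independent noise terms, each share should come out close to that term's own variance contribution; with non-linear mechanisms the subsets interact and the Shapley weighting starts to matter.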
Today we covered how you can estimate the intrinsic causal influence of your marketing campaigns. Here are some final thoughts:
- Intrinsic causal influence is a powerful concept that could be applied in different use cases, not just marketing.
- Understanding the inner workings will help you apply it more effectively.
- Identifying the DAG and estimating the graph accurately is key to obtaining reasonable estimates of intrinsic causal influence.
- In the marketing acquisition example, you might want to think about adding lagged effects for brand marketing.