For multi-product companies, a critical metric is often "cross-product adoption": understanding how users interact with multiple offerings in a given product portfolio.
One metric suggested for measuring cross-product or cross-feature usage in the popular book Hacking Growth (1) is the Jaccard index. Traditionally used to measure the similarity between two sets, the Jaccard index can also serve as a powerful tool for evaluating product adoption patterns. By quantifying the overlap of users between products, it helps identify synergies between products and opportunities for growth.
The dbt package dbt_set_similarity is designed to simplify the calculation of established similarity metrics directly within an analytical workflow. It provides a way to compute Jaccard indices within SQL transformation workloads.
To import this package into your dbt project, add the following to the packages.yml file. We will also need dbt_utils for the example in this article. Then run dbt deps inside your project to install the packages.
packages:
  - package: Matts52/dbt_set_similarity
    version: 0.1.1
  - package: dbt-labs/dbt_utils
    version: 1.3.0
The Jaccard index, also known as the Jaccard similarity coefficient, is a metric used to measure the similarity between two sets. It is defined as the size of the intersection of the sets divided by the size of their union.
Mathematically, it can be expressed as:
$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$
Where:
- A and B are two sets (for example, users of product A and product B)
- The numerator represents the number of elements that appear in both sets.
- The denominator represents the total number of distinct elements across the two sets.
The Jaccard index is particularly useful in the context of cross-product adoption because:
- It focuses on the overlap between two sets, making it ideal for understanding shared user bases.
- It normalizes by the size of the union, so the score stays proportional and comparable even when the two products have very different numbers of users.
For example:
- If 100 users adopt Product A, 50 adopt Product B, and 25 adopt both, the Jaccard index is 25 / (100 + 50 - 25) = 0.2, indicating a 20% overlap between the two user bases.
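The definition and the worked example above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the package's implementation: the function names `jaccard_index` and `jaccard_from_counts` and the user IDs are hypothetical, and returning 0.0 for two empty sets is an assumed convention.

```python
def jaccard_index(a: set, b: set) -> float:
    """Return |A ∩ B| / |A ∪ B| (assumed convention: 0.0 if both sets are empty)."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def jaccard_from_counts(n_a: int, n_b: int, n_both: int) -> float:
    """Jaccard index from counts, using |A ∪ B| = |A| + |B| - |A ∩ B|."""
    n_union = n_a + n_b - n_both
    if n_union == 0:
        return 0.0
    return n_both / n_union


# Worked example from the text: 100 users of A, 50 of B, 25 of both.
print(jaccard_from_counts(100, 50, 25))  # 0.2

# The same numbers built as explicit sets of hypothetical user IDs.
product_a = set(range(100))      # users 0..99
product_b = set(range(75, 125))  # users 75..124, so users 75..99 are in both
print(jaccard_index(product_a, product_b))  # 0.2
```

Both calls give the same value because the union size is just the two set sizes added together, minus the users counted twice.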
The example data set we will use comes from a fictitious SaaS company that offers storage space as a product to consumers. This company offers two different storage products: document storage (doc_storage) and photo storage (photo_storage). Each of these columns is either true, indicating that the user has adopted the product, or false, indicating that they have not.
In addition, the company serves two demographics (user_category): technology enthusiasts and homeowners.
For the sake of this example, we will load this CSV file as a "seed" model called seed_example within the dbt project.
Now, let's say we want to calculate the Jaccard index (cross-adoption) between our document and photo storage products. First, we need to create an array (list) of the users who have the document storage product, along with an array of the users who have the photo storage product. Then, in the final select statement, we apply the jaccard_coef function from the dbt_set_similarity package to calculate the Jaccard coefficient between the two arrays of user IDs.
with product_users as (
    select
        array_agg(user_id) filter (where doc_storage = true)
            as doc_storage_users,
        array_agg(user_id) filter (where photo_storage = true)
            as photo_storage_users
    from {{ ref('seed_example') }}
)

select
    doc_storage_users,
    photo_storage_users,
    {{
        dbt_set_similarity.jaccard_coef(
            'doc_storage_users',
            'photo_storage_users'
        )
    }} as cross_product_jaccard_coef
from product_users
The result shows that 60% of the users who have adopted at least one of the products have adopted both. We can verify this graphically by placing the sets of user IDs in a Venn diagram: three users have adopted both products, out of five users in total, so 3/5 = 0.6.
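The same check can be done by hand outside of dbt. The Python sketch below uses hypothetical rows: the user IDs and flags are assumptions chosen only to match the counts described above (five users with at least one product, three with both), not the actual contents of the seed file. It mirrors the logic of the query without calling the package.

```python
# Hypothetical rows: (user_id, doc_storage, photo_storage).
# These are NOT the article's seed data, only values consistent with its counts.
rows = [
    (1, True, True),
    (2, True, True),
    (3, True, True),
    (4, True, False),
    (5, False, True),
]

# Mirror the two filtered array_agg calls: one set of user IDs per product.
doc_storage_users = {user_id for user_id, doc, _ in rows if doc}
photo_storage_users = {user_id for user_id, _, photo in rows if photo}

both = doc_storage_users & photo_storage_users    # adopted both products
either = doc_storage_users | photo_storage_users  # adopted at least one

cross_product_jaccard_coef = len(both) / len(either)
print(cross_product_jaccard_coef)  # 0.6
```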
With the dbt_set_similarity package, computing segmented Jaccard indices for our different user categories is straightforward. We follow the same pattern as before, but this time we group our aggregations by the user category each user belongs to.
with product_users as (
    select
        user_category,
        array_agg(user_id) filter (where doc_storage = true)
            as doc_storage_users,
        array_agg(user_id) filter (where photo_storage = true)
            as photo_storage_users
    from {{ ref('seed_example') }}
    group by user_category
)

select
    user_category,
    doc_storage_users,
    photo_storage_users,
    {{
        dbt_set_similarity.jaccard_coef(
            'doc_storage_users',
            'photo_storage_users'
        )
    }} as cross_product_jaccard_coef
from product_users
The results show that cross-product adoption, as measured by the Jaccard index, is higher among homeowners. Every homeowner who adopted at least one of the products adopted both, whereas only a third of the tech enthusiasts who adopted at least one product adopted both. In our small data set, then, cross-product adoption is higher among homeowners than among technology enthusiasts.
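As a rough cross-check of the segmented numbers, the grouping can be reproduced in plain Python. The rows and category labels below are hypothetical, picked only to be consistent with the results described above (an index of 1 for homeowners and one third for tech enthusiasts); they are not the actual seed data, and the sketch does not use the package.

```python
from collections import defaultdict

# Hypothetical rows: (user_id, user_category, doc_storage, photo_storage).
# Assumed values, consistent with the described results but not the real seed.
rows = [
    (1, "homeowner", True, True),
    (2, "homeowner", True, True),
    (3, "tech_enthusiast", True, True),
    (4, "tech_enthusiast", True, False),
    (5, "tech_enthusiast", False, True),
]

# Equivalent of "group by user_category" with two filtered aggregations.
doc_users = defaultdict(set)
photo_users = defaultdict(set)
for user_id, category, doc, photo in rows:
    if doc:
        doc_users[category].add(user_id)
    if photo:
        photo_users[category].add(user_id)

# One Jaccard index per category that has at least one adopter.
jaccard_by_category = {}
for category in sorted(set(doc_users) | set(photo_users)):
    a, b = doc_users[category], photo_users[category]
    jaccard_by_category[category] = len(a & b) / len(a | b)

print(jaccard_by_category)  # homeowner: 1.0, tech_enthusiast: roughly 0.33
```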
We can graphically verify the result by creating the Venn diagram again:
dbt_set_similarity provides a simple and efficient way to calculate cross-product adoption metrics, such as the Jaccard index, directly within a dbt workflow. By applying this method, multi-product companies can gain valuable insights into user behavior and adoption patterns across their entire product portfolio. In our example, we calculated overall cross-product adoption as well as segmented adoption for different user categories.
Cross-product adoption is just one simple application of the package. There are many other potential applications of this technique, for example:
- Feature usage analysis
- Impact analysis of marketing campaigns
- Support analysis
This style of analysis is also not limited to SaaS; it can be applied to virtually any industry. Happy Jaccard-ing!
References
(1) Sean Ellis and Morgan Brown, Hacking Growth (2017). https://www.amazon.ca/Hacking-Growth-Fastest-Growing-Companies-Breakout/dp/045149721X