Let's learn how to use MultiIndex in Pandas for hierarchical data.
Preparation
We need the Pandas package, so make sure it is installed. If it is not, you can install it with pip by running pip install pandas.
Next, let's learn how to handle MultiIndex data in Pandas.
Using MultiIndex in Pandas
In Pandas, a MultiIndex is an index with multiple levels on a DataFrame or Series. It is useful when we need to represent higher-dimensional data in a 2D tabular structure: each row is identified by a combination of keys rather than a single label, which lets us group and select related rows together. Let us use an example dataset to see how it works.
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [('A', 1), ('A', 2), ('B', 1), ('B', 2)],
    names=['Category', 'Number']
)
df = pd.DataFrame({
    'Value': [10, 20, 30, 40]
}, index=index)
print(df)
Output:
                 Value
Category Number
A        1          10
         2          20
B        1          30
         2          40
As you can see, the DataFrame above has a two-level index, with Category as the outer level and Number as the inner level.
It is also possible to build the MultiIndex from existing columns in our DataFrame.
data = {
    'Category': ['A', 'A', 'B', 'B'],
    'Number': [1, 2, 1, 2],
    'Value': [10, 20, 30, 40]
}
df = pd.DataFrame(data)
# Pass a list of column names; a tuple would be treated as a single column label
df.set_index(['Category', 'Number'], inplace=True)
print(df)
Output:
                 Value
Category Number
A        1          10
         2          20
B        1          30
         2          40
Both methods produce the same result. This is how we can create a MultiIndex in our DataFrame.
If you already have a MultiIndex DataFrame, you can swap the order of its index levels, for example with the swaplevel method.
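A minimal sketch of this step, assuming DataFrame.swaplevel is the intended method (it reproduces the output shown for this step) and rebuilding the example DataFrame so the snippet runs on its own. The name swapped is only illustrative.

```python
import pandas as pd

# Rebuild the two-level example DataFrame so this snippet is self-contained.
index = pd.MultiIndex.from_tuples(
    [('A', 1), ('A', 2), ('B', 1), ('B', 2)],
    names=['Category', 'Number']
)
df = pd.DataFrame({'Value': [10, 20, 30, 40]}, index=index)

# With no arguments, swaplevel() swaps the two innermost index levels.
# It returns a new DataFrame and leaves df unchanged.
swapped = df.swaplevel()
print(swapped)
```

Note that swaplevel only reorders the levels; the rows stay in their original order.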
Output:
                 Value
Number Category
1      A            10
2      A            20
1      B            30
2      B            40
Of course, we can also turn the MultiIndex levels back into regular columns, for example with the reset_index method.
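A minimal sketch of this step, assuming DataFrame.reset_index is the intended method (it reproduces the output shown for this step). The example DataFrame is rebuilt so the snippet runs on its own, and the name flat is only illustrative.

```python
import pandas as pd

# Rebuild the two-level example DataFrame so this snippet is self-contained.
index = pd.MultiIndex.from_tuples(
    [('A', 1), ('A', 2), ('B', 1), ('B', 2)],
    names=['Category', 'Number']
)
df = pd.DataFrame({'Value': [10, 20, 30, 40]}, index=index)

# reset_index() moves every index level into a column and
# replaces the index with a default 0..n-1 integer index.
flat = df.reset_index()
print(flat)
```

Like swaplevel, reset_index returns a new DataFrame here, so df itself keeps its MultiIndex.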
Output:
  Category  Number  Value
0        A       1     10
1        A       2     20
2        B       1     30
3        B       2     40
So how do we access data in a MultiIndex DataFrame? We can use the .loc indexer. For example, passing a label from the first level selects every row stored under that label.
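A minimal sketch of first-level selection, assuming the label 'A' from the example data and rebuilding the DataFrame so the snippet runs on its own. The name subset is only illustrative.

```python
import pandas as pd

# Rebuild the two-level example DataFrame so this snippet is self-contained.
index = pd.MultiIndex.from_tuples(
    [('A', 1), ('A', 2), ('B', 1), ('B', 2)],
    names=['Category', 'Number']
)
df = pd.DataFrame({'Value': [10, 20, 30, 40]}, index=index)

# Selecting a single first-level label returns all rows under it;
# the selected level is dropped, so the result is indexed by Number only.
subset = df.loc['A']
print(subset)
```

The result holds the two rows for category A (values 10 and 20), indexed by Number alone.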
We can also select a single row by passing a tuple with one label per index level.
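A minimal sketch of tuple-based selection, assuming the key ('A', 1) that appears in the output for this step. The example DataFrame is rebuilt so the snippet runs on its own, and the names row and value are only illustrative.

```python
import pandas as pd

# Rebuild the two-level example DataFrame so this snippet is self-contained.
index = pd.MultiIndex.from_tuples(
    [('A', 1), ('A', 2), ('B', 1), ('B', 2)],
    names=['Category', 'Number']
)
df = pd.DataFrame({'Value': [10, 20, 30, 40]}, index=index)

# A tuple with one label per level identifies exactly one row,
# which is returned as a Series named after the tuple.
row = df.loc[('A', 1)]
print(row)

# Adding a column label returns the single cell value instead.
value = df.loc[('A', 1), 'Value']
```

Selecting with the full tuple returns a Series for the row, while adding the column label narrows the result to one scalar.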
Output:
Value    10
Name: (A, 1), dtype: int64
Finally, we can aggregate over an index level by passing the level argument to the .groupby method.
print(df.groupby(level=['Category']).sum())
Output:
          Value
Category
A            30
B            70
Mastering MultiIndex in Pandas will allow you to gain insights into hierarchical data.
Cornellius Yudha Wijaya is a Data Science Assistant Manager and Data Writer. While working full-time at Allianz Indonesia, he loves sharing Python and data tips through social media and writing. Cornellius writes on a variety of AI and machine learning topics.