Comparing DataFrames is a simple task when you use the built-in methods in pandas.
In pandas, a DataFrame is the primary two-dimensional data structure: it stores data in tabular (row and column) form, and you can work on it to get useful insights.
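To make the row and column idea concrete, here is a minimal sketch. The variable name `example` and its values are made up for illustration and are not part of the data used later in this article.

```python
import pandas as pd

# Hypothetical two-row table, for illustration only
example = pd.DataFrame({
    "device_id": ["X1", "X2"],
    "device_temperature": [30.0, 31.5],
})

print(example.shape)                # (number of rows, number of columns)
print(list(example.columns))        # column labels
print(example.loc[0, "device_id"])  # a single cell, addressed by row label and column name
```

Each dictionary key becomes a column, and each position in the lists becomes a row with a default integer index.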
In real-world scenarios, one of the common tasks of a data analyst is to see what has changed in the data, and you can do that by comparing two sets of data.
Recently, I developed an automated computer vision system that collects data from 10 devices at two different times and stores it in two pandas DataFrames. To understand what had changed in the system, I compared the two DataFrames, and that’s where the inspiration for this story comes from.
You can find such DataFrame comparison applications most commonly in data validation, data change detection, testing, and debugging. So, it is important to know how you can compare two datasets quickly and easily.
In this article, I’m going to explain three easy, reliable, and quick ways to compare two DataFrames in pandas. The following index gives a quick overview of the story.
· Compare Pandas DataFrames using equals()
· Compare Pandas DataFrames using concat()
· Compare Pandas DataFrames using compare()
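As a quick preview of the three methods listed above, here is a minimal sketch on toy data. The frames `old` and `new` are hypothetical and are not the device DataFrames used in this article. The `concat()` variant shown, combined with `drop_duplicates(keep=False)`, is one common pattern and is an assumption about how that method can be applied, not necessarily the exact approach taken later.

```python
import pandas as pd

# Hypothetical toy data, for illustration only
old = pd.DataFrame({"id": ["a", "b", "c"], "temp": [1.0, 2.0, 3.0]})
new = pd.DataFrame({"id": ["a", "b", "c"], "temp": [1.0, 2.5, 3.0]})

# 1. equals(): a single True/False answer for the whole DataFrame
same = old.equals(new)

# 2. concat() + drop_duplicates(keep=False): keep only rows that
#    appear in one DataFrame but not the other
changed_rows = pd.concat([old, new]).drop_duplicates(keep=False)

# 3. compare(): cell-level differences (requires pandas >= 1.1 and
#    identically labeled DataFrames)
diff = old.compare(new)

print(same)
print(changed_rows)
print(diff)
```

Roughly speaking, `equals()` tells you whether anything changed, the `concat()` pattern shows which rows changed, and `compare()` shows which cells changed and their old and new values.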
Let’s get started!
Before starting with the three ways to compare two DataFrames, let’s create two DataFrames with minor differences in them.
import pandas as pd

df = pd.DataFrame({"device_id": ['D475', 'D175', 'D200', 'D375', 'M475', 'M400', 'M250', 'A150'],
                   "device_temperature": [35.4, 45.2, 59.3, 49.3, 32.2, 35.7, 36.8, 34.9],
                   "device_status": ["Inactive", "Active", "Active", "Active", "Active", "Inactive", "Active", "Active"]})
df1 = pd.DataFrame({"device_id": ['D475', 'D175', 'D200', 'D375', 'M475', 'M400', 'M250', 'A150'],
"device_temperature": [39.4, 45.2, 29.3, 49.3, 32.2, 35.7, 36.8, 24.9]…