Time-based data needs extra care when it spans different time zones, because the same clock reading can refer to different moments depending on where it was recorded. This guide will help you manage time zones and timestamps using the Pandas library in Python.
Preparation
In this tutorial, we will be using the Pandas package. We can install it from the command line with pip install pandas.
Now, we will explore how to work with time-based data in Pandas with practical examples.
Handling Timezones and Timestamps with Pandas
Time data provides a specific time reference for events. Timestamps are its most precise form, since they record everything from the year down to fractions of a second.
Let's start by creating a sample dataset.
import pandas as pd

data = {
    'transaction_id': [1, 2, 3],
    'timestamp': ['2023-06-15 12:00:05', '2024-04-15 15:20:02', '2024-06-15 21:17:43'],
    'amount': [100, 200, 150]
}
df = pd.DataFrame(data)
# Parse the timestamp strings into datetime values
df['timestamp'] = pd.to_datetime(df['timestamp'])
The 'timestamp' column in the above example contains time data with second-level precision. It starts out as plain strings, so we use the pd.to_datetime function to convert it to a datetime format. The result is time zone-naive, meaning it carries no time zone information yet.
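To see what the conversion changes, it can help to inspect the column before and after parsing. The sketch below is a minimal check, not part of the original walkthrough; the frame name `raw` is hypothetical and used only for illustration.

```python
import pandas as pd

# `raw` is a hypothetical frame used only for this sketch.
raw = pd.DataFrame({'timestamp': ['2023-06-15 12:00:05', '2024-04-15 15:20:02']})

# Before parsing, each value is a plain Python string.
is_string_before = isinstance(raw['timestamp'].iloc[0], str)

raw['timestamp'] = pd.to_datetime(raw['timestamp'])

# After parsing, the column has a datetime64 dtype and no time zone attached.
is_datetime_after = pd.api.types.is_datetime64_any_dtype(raw['timestamp'])
tz_after = raw['timestamp'].dt.tz

print(is_string_before, is_datetime_after, tz_after)
```

A `tz` of `None` is how Pandas reports a time zone-naive column.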
We can then make the date and time data time zone-aware. For example, we can label the values as Coordinated Universal Time (UTC) with tz_localize, which attaches the time zone without shifting the clock time.
df['timestamp_utc'] = df['timestamp'].dt.tz_localize('UTC')
print(df)
Output>>>
   transaction_id           timestamp  amount             timestamp_utc
0               1 2023-06-15 12:00:05     100 2023-06-15 12:00:05+00:00
1               2 2024-04-15 15:20:02     200 2024-04-15 15:20:02+00:00
2               3 2024-06-15 21:17:43     150 2024-06-15 21:17:43+00:00
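As a small sketch of what localizing does and does not do, the example below checks that the clock reading is unchanged after tz_localize, and that calling it again on a column that is already time zone-aware is rejected. The variable names here are illustrative and not from the original example.

```python
import pandas as pd

naive = pd.Series(pd.to_datetime(['2023-06-15 12:00:05']))

# tz_localize keeps the clock reading and only attaches the zone.
as_utc = naive.dt.tz_localize('UTC')
same_wall_clock = as_utc.dt.tz_localize(None).equals(naive)

# Localizing a column that is already time zone-aware raises a TypeError;
# tz_convert is the method intended for that case.
try:
    as_utc.dt.tz_localize('Asia/Tokyo')
    raised = False
except TypeError:
    raised = True

print(same_wall_clock, raised)
```

Passing `None` to tz_localize strips the zone again, which is how the sketch recovers the original naive values for comparison.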
The 'timestamp_utc' values now include the UTC offset (+00:00) alongside the date and time. From here, we can convert the existing time zone to another one. For example, we can take the UTC column and convert it to the Japan time zone.
df['timestamp_japan'] = df['timestamp_utc'].dt.tz_convert('Asia/Tokyo')
print(df)
Output>>>
   transaction_id           timestamp  amount             timestamp_utc  \
0               1 2023-06-15 12:00:05     100 2023-06-15 12:00:05+00:00
1               2 2024-04-15 15:20:02     200 2024-04-15 15:20:02+00:00
2               3 2024-06-15 21:17:43     150 2024-06-15 21:17:43+00:00

            timestamp_japan
0 2023-06-15 21:00:05+09:00
1 2024-04-16 00:20:02+09:00
2 2024-06-16 06:17:43+09:00
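Unlike localizing, converting changes the clock reading while keeping the underlying instant the same. The sketch below illustrates this with a single value taken from the sample data; the variable names are illustrative.

```python
import pandas as pd

utc_ts = pd.Timestamp('2024-04-15 15:20:02', tz='UTC')
tokyo_ts = utc_ts.tz_convert('Asia/Tokyo')

# Both objects point at the same instant, so they compare equal.
same_instant = utc_ts == tokyo_ts

# Only the clock reading differs, by Tokyo's offset from UTC.
offset_hours = tokyo_ts.utcoffset().total_seconds() / 3600

print(tokyo_ts, same_instant, offset_hours)
```

This is why the second row in the output above moves to the next calendar day in Japan time: 15:20:02 UTC plus nine hours is 00:20:02 on April 16.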
With the converted column, we can filter the data by time in a particular time zone. For example, we can keep only the transactions that fall within a window expressed in Japan time.
start_time_japan = pd.Timestamp('2024-06-15 06:00:00', tz='Asia/Tokyo')
end_time_japan = pd.Timestamp('2024-06-16 07:59:59', tz='Asia/Tokyo')
filtered_df = df[(df['timestamp_japan'] >= start_time_japan) & (df['timestamp_japan'] <= end_time_japan)]
print(filtered_df)
Output>>>
   transaction_id           timestamp  amount             timestamp_utc  \
2               3 2024-06-15 21:17:43     150 2024-06-15 21:17:43+00:00

            timestamp_japan
2 2024-06-16 06:17:43+09:00
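Because time zone-aware values are compared as instants, the same window can be written in another zone and still select the same rows. The following is a sketch under that assumption, using Series.between (inclusive on both ends by default) in place of the two explicit comparisons; the mask names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'transaction_id': [1, 2, 3],
    'timestamp': pd.to_datetime(
        ['2023-06-15 12:00:05', '2024-04-15 15:20:02', '2024-06-15 21:17:43']
    ),
})
df['timestamp_japan'] = df['timestamp'].dt.tz_localize('UTC').dt.tz_convert('Asia/Tokyo')

start = pd.Timestamp('2024-06-15 06:00:00', tz='Asia/Tokyo')
end = pd.Timestamp('2024-06-16 07:59:59', tz='Asia/Tokyo')

# Window expressed in Japan time.
mask_japan = df['timestamp_japan'].between(start, end)

# The same window expressed in UTC selects the same rows.
mask_utc = df['timestamp_japan'].between(start.tz_convert('UTC'), end.tz_convert('UTC'))

selected_ids = df.loc[mask_japan, 'transaction_id'].tolist()
print(selected_ids, mask_japan.equals(mask_utc))
```

Note that both bounds must be time zone-aware here; mixing a naive bound with an aware column is not a valid comparison in Pandas.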
Time zone-aware timestamps also let us resample the time series. For example, we can set the Japan time column as the index and count, for each column in our dataset, the rows that fall in each hour.
# 'h' is the hourly alias; the uppercase 'H' is deprecated in recent Pandas releases
resampled_df = df.set_index('timestamp_japan').resample('h').count()
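The sample data spans about a year, so an hourly resample of it yields mostly empty bins. To make the behavior easier to see, here is a minimal sketch on three hypothetical transactions that fall within a few hours of each other; the frame `tx` and its values are invented for illustration and are not part of the dataset above.

```python
import pandas as pd

# Hypothetical transactions, recorded in UTC and close together in time.
tx = pd.DataFrame({
    'timestamp_utc': pd.to_datetime(
        ['2024-06-15 21:17:43', '2024-06-15 21:45:10', '2024-06-15 23:05:00']
    ).tz_localize('UTC'),
    'amount': [150, 50, 75],
})
tx['timestamp_japan'] = tx['timestamp_utc'].dt.tz_convert('Asia/Tokyo')

# Hourly bins follow Japan clock time because the index is in Asia/Tokyo.
hourly = (
    tx.set_index('timestamp_japan')['amount']
      .resample('h')
      .agg(['count', 'sum'])
)

print(hourly)
```

The bins start at 06:00, 07:00, and 08:00 Japan time; the middle bin has no transactions, so its count and sum are both zero.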
With localization, conversion, time zone-aware filtering, and resampling, Pandas covers most of what is needed to work with timestamps across time zones.
Cornellius Yudha Wijaya is a Data Science Assistant Manager and Data Writer. While working full-time at Allianz Indonesia, he loves sharing Python and data tips through social media and writing. Cornellius writes on a variety of AI and machine learning topics.