In this article, I explore the public transportation systems of four selected cities using the General Transit Feed Specification and various spatial data science tools.
In this notebook, I review the public transportation systems of four cities, Budapest, Berlin, Stockholm, and Toronto, using publicly available GTFS (General Transit Feed Specification) data. The notebook is intended as an introductory tutorial on how to access, manipulate, aggregate, and visualize public transportation data using Pandas, GeoPandas, and other standard data science tools. This understanding can later be useful in various use cases, such as transportation, urban planning, and location intelligence.
Additionally, while the GTFS format is supposed to be general and universal, I will also point out situations in the following analytical steps that still require city-specific information and manual validation.
For this article, I downloaded public transportation data from Transitfeeds.com, a website that aggregates public transportation data online. In particular, I downloaded the most recent data available for each of the following cities, with these update times:
In the following blocks of code, I will explore each of these cities several times, create comparative graphs, and emphasize the universality of the GTFS format. Additionally, to ensure that my analyses are easy to update with more recent data dumps, I store each city's GTFS data in a folder named after the update date:
import os

root = 'data'
cities = ('Budapest', 'Toronto', 'Berlin', 'Stockholm')
# For each city, pick the dated subfolder (its name contains '20') that holds the GTFS dump
updated = {city: [f for f in os.listdir(root + '/' + city) if '20' in f][0] for city in cities}
updated
The output of this cell:
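The lookup above takes the first matching folder that `os.listdir` returns, and that order is not guaranteed. If a city folder ever holds several data dumps, a slightly more defensive variant could pick the newest one explicitly. The sketch below is only an illustration, not part of the original notebook: the helper name `latest_update` is hypothetical, and it assumes that all folder names use one consistent, sortable date format (for example `20220601`), so that the largest name is the most recent. The demo dates are made up.

```python
import os
import tempfile


def latest_update(root, city):
    """Return the name of the newest dated subfolder for a city.

    Assumption: folder names are date strings in one sortable format
    (e.g. '20220601'), so the lexicographic maximum is the newest dump.
    """
    city_dir = os.path.join(root, city)
    dated = [
        name for name in os.listdir(city_dir)
        if name.startswith('20') and os.path.isdir(os.path.join(city_dir, name))
    ]
    if not dated:
        raise FileNotFoundError(f'No dated GTFS folder found in {city_dir}')
    return max(dated)


# Demo on a throwaway directory tree; the folder dates are invented for illustration.
with tempfile.TemporaryDirectory() as demo_root:
    for folder in ('20210101', '20220601'):
        os.makedirs(os.path.join(demo_root, 'Budapest', folder))
    newest = latest_update(demo_root, 'Budapest')
    print(newest)  # prints 20220601
```

With the real folder layout, the dictionary could then be built as `{city: latest_update(root, city) for city in cities}`.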
Now, let's take a closer look at the different files stored in these folders:
for city in cities…