In this article, I combine previous work on urban accessibility or walkability with open source data on the location of public defibrillator devices. Additionally, I incorporate global population data and Uber’s H3 network system to estimate the proportion of the population within reasonable reach of any device within Budapest and Vienna.
The root of urban accessibility, or passability, lies in a graph-based calculation that measures Euclidean distance (transforming it into minutes of walking, assuming a constant speed and without traffic jams or obstacles). The results of such analyzes can tell us how easy it is to reach specific types of services from each location in the city. To be more precise, from each node within the city’s road network, but due to the large number of road junctions, this approximation is practically negligible.
In this current case study, I focus on a particular type of point of interest (POI): the location of defibrillator devices. While the Austrian government’s Open Data Portal shares official records on this, in Hungary I was only able to obtain a crowd-sourced dataset of less than half coverage, which will hopefully grow later in both absolute size and coverage of data.
In the first section of my article I will create the accessibility map of each city, visualizing the time needed to reach the closest defibrillator units within a 2.5 km radius at a running speed of 15 km/h. I will then divide the cities into hexagonal grids using Uber’s H3 library to calculate the average defibrillator accessibility time for each grid cell. I also estimate the population level in each hexagonal cell after my previous article. Finally, I combine them and calculate the fraction of the accessible population as a function of the accessibility time (running).
As a disclaimer, I want to emphasize that I am by no means a trained medical expert and do not intend to take a stance on the importance of defibrillator devices compared to other means of defibrillator. life support. However, based on common sense and urban planning principles, I assume that the easier it is to get to such devices, the better.
As always, I like to start by exploring the types of data I use. First, I will compile the administrative boundaries of the cities I study in: Budapest, Hungary and Vienna, Austria.
Then, building on a previous article of mine on how to process raster population data, I add city-level population information from the WorldPop hub. Finally, I incorporate official government data on defibrillator devices in Vienna and my own web-sourced version of the same, albeit crowded and inherently incomplete sources, for Budapest.
1.1. Administrative limits
First, I query the administrative boundaries of Budapest and Vienna from Open street map using the OSMNx library:
import osmnx as ox # version: 1.0.1
import matplotlib.pyplot as plt # version: 3.7.1admin = {}
cities = ('Budapest', 'Vienna')
f, ax = plt.subplots(1,2, figsize = (15,5))
# visualize the admin boundaries
for idx, city in enumerate(cities):
admin(city) = ox.geocode_to_gdf(city)
admin(city).plot(ax=ax(idx),color='none',edgecolor= 'k', linewidth = 2) ax(idx).set_title(city, fontsize = 16)
The result of this code block:
1.2. Population data
Second, following the steps in this article, I created the population grid in vector data format for both cities, based on the WorldPop online demographic database. Without repeating the steps, I simply read the output files of that process that contain population information for these cities.
Also, to make things look nice, I created a color map of the color of 2022, Very Peri, using Matplotlib and a quick ChatGPT script.
import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormapvery_peri = '#8C6BF3'
second_color = '#6BAB55'
colors = (second_color, very_peri )
n_bins = 100
cmap_name = "VeryPeri"
colormap = LinearSegmentedColormap.from_list(cmap_name, colors, N=n_bins)
import geopandas as gpd # version: 0.9.0demographics = {}
f, ax = plt.subplots(1,2, figsize = (15,5))
for idx, city in enumerate(cities):
demographics(city) = gpd.read_file(city.lower() + \
'_population_grid.geojson')(('population', 'geometry'))
admin(city).plot(ax=ax(idx), color = 'none', edgecolor = 'k', \
linewidth = 3)
demographics(city).plot(column = 'population', cmap = colormap, \
ax=ax(idx), alpha = 0.9, markersize = 0.25)
ax(idx).set_title(city)
ax(idx).set_title('Population density\n in ' + city, fontsize = 16)
ax(idx).axis('off')
The result of this code block:
1.3. Defibrillator locations
Third, I collected location data on available defibrillators in both cities.
For Vienna, I downloaded this dataset from Official open data portal of the Austrian government. containing the location of the 1044 unit point:
Although there is no official open data portal in Budapest/Hungary, the Hungarian National Heart Foundation manages a collaborative website where operators can update the location of their defibrillator units. Its nationwide database consists of 677 units; However, their disclaimer says they know of at least a thousand units operating in the country and are waiting for their owners to charge them. Using a simple web crawler, I downloaded the location of each of the 677 registered units and filtered the set data down to those in Budapest, resulting in a set of 148 units.
# parse the data for each city
gdf_units= {}gdf_units('Vienna') = gpd.read_file('DEFIBRILLATOROGD')
gdf_units('Budapest') = gpd.read_file('budapest_defibrillator.geojson')
for city in cities:
gdf_units(city) = gpd.overlay(gdf_units(city), admin(city))
# visualize the units
f, ax = plt.subplots(1,2, figsize = (15,5))
for idx, city in enumerate(cities):
admin(city).plot(ax=ax(idx),color='none',edgecolor= 'k', linewidth = 3)
gdf_units(city).plot( ax=ax(idx), alpha = 0.9, color = very_peri, \
markersize = 6.0)
ax(idx).set_title('Locations of defibrillator\ndevices in ' + city, \
fontsize = 16)
ax(idx).axis('off')
The result of this code block:
Below I concluded this fantastic article written by Nick Jones in 2018 on how to calculate walkability:
import os
import pandana # version: 0.6
import pandas as pd # version: 1.4.2
import numpy as np # version: 1.22.4
from shapely.geometry import Point # version: 1.7.1
from pandana.loaders import osmdef get_city_accessibility(admin, POIs):
# walkability parameters
walkingspeed_kmh = 15
walkingspeed_mm = walkingspeed_kmh * 1000 / 60
distance = 2500
# bounding box as a list of llcrnrlat, llcrnrlng, urcrnrlat, urcrnrlng
minx, miny, maxx, maxy = admin.bounds.T(0).to_list()
bbox = (miny, minx, maxy, maxx)
# setting the input params, going for the nearest POI
num_pois = 1
num_categories = 1
bbox_string = '_'.join((str(x) for x in bbox))
net_filename = 'data/network_{}.h5'.format(bbox_string)
if not os.path.exists('data'): os.makedirs('data')
# precomputing nework distances
if os.path.isfile(net_filename):
# if a street network file already exists, just load the dataset from that
network = pandana.network.Network.from_hdf5(net_filename)
method = 'loaded from HDF5'
else:
# otherwise, query the OSM API for the street network within the specified bounding box
network = osm.pdna_network_from_bbox(bbox(0), bbox(1), bbox(2), bbox(3))
method = 'downloaded from OSM'
# identify nodes that are connected to fewer than some threshold of other nodes within a given distance
lcn = network.low_connectivity_nodes(impedance=1000, count=10, imp_name='distance')
network.save_hdf5(net_filename, rm_nodes=lcn) #remove low-connectivity nodes and save to h5
# precomputes the range queries (the reachable nodes within this maximum distance)
# so, as long as you use a smaller distance, cached results will be used
network.precompute(distance + 1)
# compute accessibilities on POIs
pois = POIs.copy()
pois('lon') = pois.geometry.apply(lambda g: g.x)
pois('lat') = pois.geometry.apply(lambda g: g.y)
pois = pois.drop(columns = ('geometry'))
network.init_pois(num_categories=num_categories, max_dist=distance, max_pois=num_pois)
network.set_pois(category='all', x_col=pois('lon'), y_col=pois('lat'))
# searches for the n nearest amenities (of all types) to each node in the network
all_access = network.nearest_pois(distance=distance, category='all', num_pois=num_pois)
# transform the results into a geodataframe
nodes = network.nodes_df
nodes_acc = nodes.merge(all_access((1)), left_index = True, right_index = True).rename(columns = {1 : 'distance'})
nodes_acc('time') = nodes_acc.distance / walkingspeed_mm
xs = list(nodes_acc.x)
ys = list(nodes_acc.y)
nodes_acc('geometry') = (Point(xs(i), ys(i)) for i in range(len(xs)))
nodes_acc = gpd.GeoDataFrame(nodes_acc)
nodes_acc = gpd.overlay(nodes_acc, admin)
nodes_acc(('time', 'geometry')).to_file(city + '_accessibility.geojson', driver = 'GeoJSON')
return nodes_acc(('time', 'geometry'))
accessibilities = {}
for city in cities:
accessibilities(city) = get_city_accessibility(admin(city), gdf_units(city))
for city in cities:
print('Number of road network nodes in ' + \
city + ': ' + str(len(accessibilities(city))))
This code block generates the number of road network nodes in Budapest (116,056) and in Vienna (148,212).
Now view the accessibility maps:
for city in cities:
f, ax = plt.subplots(1,1,figsize=(15,8))
admin(city).plot(ax=ax, color = 'k', edgecolor = 'k', linewidth = 3)
accessibilities(city).plot(column = 'time', cmap = 'RdYlGn_r', \
legend = True, ax = ax, markersize = 2, alpha = 0.5)
ax.set_title('Defibrillator accessibility in minutes\n' + city, \
pad = 40, fontsize = 24)
ax.axis('off')
This code block generates the following figures:
At this point, I have both population and accessibility data; I just have to put them together. The only trick is that their spatial units differ:
- Accessibility is measured and attached to each node within each city’s road network.
- The population data is derived from a raster grid, now described by the centroid point of interest of each raster grid.
While reinstating the original raster grid may be an option, in hopes of achieving more pronounced universality (and adding a bit of my personal taste), I now map these two types of point data sets into the Uber H3 Network System For those who haven’t used it before, for now it’s enough to know that this is an elegant and efficient spatial indexing system that uses hexagonal tiles. And to read more, click this link!
3.1. Creating H3 cells
First, create a function that divides a city into hexagons at any given resolution:
import geopandas as gpd
import h3 # version: 3.7.3
from shapely.geometry import Polygon # version: 1.7.1
import numpy as npdef split_admin_boundary_to_hexagons(admin_gdf, resolution):
coords = list(admin_gdf.geometry.to_list()(0).exterior.coords)
admin_geojson = {"type": "Polygon", "coordinates": (coords)}
hexagons = h3.polyfill(admin_geojson, resolution, \
geo_json_conformant=True)
hexagon_geometries = {hex_id : Polygon(h3.h3_to_geo_boundary(hex_id, \
geo_json=True)) for hex_id in hexagons}
return gpd.GeoDataFrame(hexagon_geometries.items(), columns = ('hex_id', 'geometry'))
resolution = 8
hexagons_gdf = split_admin_boundary_to_hexagons(admin(city), resolution)
hexagons_gdf.plot()
The result of this code block:
Now, look at some different resolutions:
for resolution in (7,8,9):admin_h3 = {}
for city in cities:
admin_h3(city) = split_admin_boundary_to_hexagons(admin(city), resolution)
f, ax = plt.subplots(1,2, figsize = (15,5))
for idx, city in enumerate(cities):
admin(city).plot(ax=ax(idx), color = 'none', edgecolor = 'k', \
linewidth = 3)
admin_h3(city).plot( ax=ax(idx), alpha = 0.8, edgecolor = 'k', \
color = 'none')
ax(idx).set_title(city + ' (resolution = '+str(resolution)+')', \
fontsize = 14)
ax(idx).axis('off')
The result of this code block:
Let’s keep resolution 9!
3.2. Assign values to h3 cells
I now have our two cities in a hexagonal grid format. Next, I’ll assign the population and accessibility data to the hexagonal cells based on which grid cells each point geometry is located in. For this, GeoPandasa’s join function, which performs good spatial join, is a good choice.
Additionally, since we have over 100,000 road network nodes in each city and thousands of population grid centroids, there will most likely be multiple POIs assigned in each hexagonal grid cell. Therefore, aggregation will be necessary. Since population is an additive quantity, I will add population levels within the same hexagon by summarizing them. However, the accessibility is not extensive, so I would calculate the average defibrillator accessibility time for each tile.
demographics_h3 = {}
accessibility_h3 = {}for city in cities:
# do the spatial join, aggregate on the population level of each \
# hexagon, and then map these population values to the grid ids
demographics_dict = gpd.sjoin(admin_h3(city), demographics(city)).groupby(by = 'hex_id').sum('population').to_dict()('population')
demographics_h3(city) = admin_h3(city).copy()
demographics_h3(city)('population') = demographics_h3(city).hex_id.map(demographics_dict)
# do the spatial join, aggregate on the population level by averaging
# accessiblity times within each hexagon, and then map these time score # to the grid ids
accessibility_dict = gpd.sjoin(admin_h3(city), accessibilities(city)).groupby(by = 'hex_id').mean('time').to_dict()('time')
accessibility_h3(city) = admin_h3(city).copy()
accessibility_h3(city)('time') = \
accessibility_h3(city).hex_id.map(accessibility_dict)
# now show the results
f, ax = plt.subplots(2,1,figsize = (15,15))
demographics_h3(city).plot(column = 'population', legend = True, \
cmap = colormap, ax=ax(0), alpha = 0.9, markersize = 0.25)
accessibility_h3(city).plot(column = 'time', cmap = 'RdYlGn_r', \
legend = True, ax = ax(1))
ax(0).set_title('Population level\n in ' + city, fontsize = 16)
ax(1).set_title('Defibrillator reachability time\n in ' + city, \
fontsize = 16)
for ax_i in ax: ax_i.axis('off')
The results of this code block are the following figures:
In this last step, I will estimate the fraction of the population accessible from the nearest defibrillator unit within a certain period of time. Here I continue to take advantage of the relatively fast race pace of 15 km/h and the distance limit of 2.5 km.
From the technical perspective, I merge the H3 level population and accessibility time data frames and then do a simple threshold on the time dimension and a sum on the population dimension.
f, ax = plt.subplots(1,2, figsize = (15,5))for idx, city in enumerate(cities):
total_pop = demographics_h3(city).population.sum()
merged = demographics_h3(city).merge(accessibility_h3(city).drop(columns =\
('geometry')), left_on = 'hex_id', right_on = 'hex_id')
time_thresholds = range(10)
population_reached = (100*merged(merged.time<limit).population.sum()/total_pop for limit in time_thresholds)
ax(idx).plot(time_thresholds, population_reached, linewidth = 3, \
color = very_peri)
ax(idx).set_xlabel('Reachability time (min)', fontsize = 14, \
labelpad = 12)
ax(idx).set_ylabel('Fraction of population reached (%)', fontsize = 14, labelpad = 12)
ax(idx).set_xlim((0,10))
ax(idx).set_ylim((0,100))
ax(idx).set_title('Fraction of population vs defibrillator\naccessibility in ' + city, pad = 20, fontsize = 16)
The result of this code block is the following figures:
In interpreting these results, I would like to emphasize that, on the one hand, defibrillator accessibility may not be directly related to heart attack survival rate; Judging that effect is beyond my experience and the scope of this project. Furthermore, the data used for Budapest are consciously incomplete and cluttered sources, unlike the official Austrian data source.
After the disclaimers, what do we see? On the one hand, we see that in Budapest around 75-80% of the population can access a device in 10 minutes, while in Vienna we reach almost complete coverage in about 6-7 minutes. Furthermore, we must read these time values carefully: if we encounter an unfortunate incident, we must reach the device, pick it up, return (making the travel time double the accessibility time), install it, etc. in a situation where every minute could be a matter of life or death.
So from a development perspective, the takeaways are to ensure that we have complete data and then use the accessibility and population maps, combine them, analyze them and leverage them when implementing new devices and new locations to maximize the effective population reached. .