Python offers a wide range of libraries that let us address problems in many research areas quickly and easily. Geospatial data analysis and graph theory are two areas where Python provides a powerful set of libraries. In this article, we will perform a simple analysis of world borders, specifically exploring which countries share borders with one another. We'll start with a GeoJSON file that contains polygons for all the countries in the world. The ultimate goal is to create a graph representing these borders using NetworkX and use this graph to perform several analyses.
GeoJSON files can represent many kinds of geographic areas and are widely used in geographic analysis and visualization. The first stage of our analysis is to read the countries.geojson file and convert it into a GeoDataFrame using GeoPandas. This file was obtained from the following GitHub repository and contains polygons representing the different countries of the world.
As shown above, the GeoDataFrame contains the following columns:
ADMIN: Represents the administrative name of the geographic area, such as the name of the country or region.
ISO_A3: Represents the ISO 3166-1 alpha-3 country code, a three-letter code that uniquely identifies countries.
ISO_A2: Indicates the ISO 3166-1 alpha-2 country code, a two-letter code also used for country identification.
geometry: Contains the geometric information that defines the shape of the geographic area, represented as MULTIPOLYGON data.
You can view all the polygons that make up the GeoDataFrame using the plot method, as demonstrated below.
The polygons in the geometry column belong to the class shapely.geometry.multipolygon.MultiPolygon. These objects have several attributes, one of which is centroid. The centroid attribute provides the geometric center of the MULTIPOLYGON and returns a POINT representing that center.
We can then use this POINT to extract the latitude and longitude of each MULTIPOLYGON and store them in two columns of the GeoDataFrame. We perform this calculation because we will later use these latitude and longitude values to place the nodes of the graph at their actual geographic positions.
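The centroid step can be sketched as follows. The helper add_centroid_columns is hypothetical, and it assumes coordinates are stored in (longitude, latitude) order, as in standard GeoJSON, so a centroid's x value is the longitude and its y value the latitude. Calling shapely's centroid on each geometry avoids the warning GeoPandas emits when .centroid is used on a geographic CRS, but the result is still a planar approximation, and for multipart or irregular shapes it may fall outside the country itself.

```python
def add_centroid_columns(gdf):
    """Return a copy of gdf with longitude/latitude columns from each centroid.

    Hypothetical helper. Assumes (longitude, latitude) coordinate order, so the
    centroid's x is the longitude and its y is the latitude.
    """
    # shapely's centroid returns a Point for every (Multi)Polygon.
    centroids = gdf["geometry"].apply(lambda geom: geom.centroid)
    gdf = gdf.copy()  # leave the caller's frame untouched
    gdf["longitude"] = centroids.apply(lambda point: point.x)
    gdf["latitude"] = centroids.apply(lambda point: point.y)
    return gdf
```

The function works on a GeoDataFrame as well as on a plain pandas DataFrame whose geometry column holds shapely objects.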
Now it is time to build the graph that will represent the borders between the countries of the world. In this graph, the nodes represent countries, while the edges indicate the existence of a border between them. If two countries share a border, the graph has an edge connecting their nodes; otherwise, there is no edge.
The function create_country_network processes the information in the GeoDataFrame and builds a Graph representing the borders between countries.
First, the function iterates through each row of the GeoDataFrame, where each row corresponds to a different country. For each row, it creates a node for the country and stores its latitude and longitude as node attributes.
If a geometry is not valid, the function repairs it with the buffer(0) method. This method rebuilds the geometry by applying a buffer operation with a distance of zero, which resolves issues such as self-intersections and other irregularities in the multipolygon representation.
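To see why the buffer(0) repair is needed, consider a "bow-tie" polygon whose edges cross each other. The sketch below (fix_geometry is a hypothetical helper) repairs a geometry only when shapely reports it as invalid. Be aware that a zero-distance buffer can silently drop part of a self-intersecting ring; shapely.validation.make_valid (Shapely 1.8 and later) is an alternative that tries to preserve all of the original area.

```python
from shapely.geometry import Polygon


def fix_geometry(geom):
    """Return geom unchanged if valid, otherwise a buffer(0) repair of it."""
    if geom.is_valid:
        return geom
    # A zero-distance buffer rebuilds the outline and removes self-intersections,
    # but it may discard one "lobe" of a bow-tie shape.
    return geom.buffer(0)


# The edges of this ring cross at (1, 1), so shapely reports it as invalid.
bowtie = Polygon([(0, 0), (2, 2), (2, 0), (0, 2)])
fixed = fix_geometry(bowtie)
```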
After creating the nodes, the next step is to populate the network with the relevant edges. To do this, we iterate over pairs of countries: if the polygons representing two countries intersect, they share a common border, and an edge is created between their nodes.
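A rough sketch of how such a graph could be assembled with NetworkX, assuming the GeoDataFrame already carries the latitude and longitude columns computed earlier. This is not the author's create_country_network; build_border_graph is a hypothetical name. Shapely's intersects returns True both for polygons that merely touch along a border and for polygons that overlap slightly, which makes it tolerant of small digitization errors. The pairwise loop runs O(n²) intersection tests; on larger datasets a spatial index (for example GeoDataFrame.sindex) can narrow down the candidate pairs.

```python
from itertools import combinations

import networkx as nx


def build_border_graph(gdf):
    """Sketch: build an undirected border graph from a (Geo)DataFrame.

    Hypothetical helper. Assumes columns ADMIN, geometry, latitude, longitude.
    """
    G = nx.Graph()
    geometries = {}
    for _, row in gdf.iterrows():
        geom = row["geometry"]
        if not geom.is_valid:
            geom = geom.buffer(0)  # repair self-intersections before testing
        geometries[row["ADMIN"]] = geom
        G.add_node(row["ADMIN"], latitude=row["latitude"], longitude=row["longitude"])

    # Every unordered pair of countries is tested once.
    for (name_a, geom_a), (name_b, geom_b) in combinations(geometries.items(), 2):
        if geom_a.intersects(geom_b):  # touching or overlapping polygons
            G.add_edge(name_a, name_b)
    return G
```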
The next step is to visualize the network, where the nodes represent countries around the world and the edges indicate the presence of borders between them.
The function plot_country_network_on_map processes the nodes and edges of the graph G and displays them on a map.
The positions of the nodes are determined by the latitude and longitude coordinates of the countries. Additionally, a map is drawn in the background to give clearer context to the network. This map was generated using the boundary attribute of the GeoDataFrame, which returns the geometric boundaries of the countries and is used to draw the background map.
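A minimal plotting sketch, assuming each node stores longitude and latitude attributes. The key idea is that NetworkX accepts a pos dictionary mapping nodes to (x, y) positions, so passing (longitude, latitude) places each country at its centroid. draw_network_on_map is a hypothetical helper; its optional boundary argument is meant to receive something like gdf.boundary, a GeoSeries whose plot method draws the country outlines underneath the network.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; remove in a notebook
import matplotlib.pyplot as plt
import networkx as nx


def draw_network_on_map(G, boundary=None, ax=None):
    """Draw G with each node at its (longitude, latitude); return the Axes.

    Hypothetical helper. boundary: optional GeoSeries (e.g. gdf.boundary)
    drawn first as a light background map.
    """
    if ax is None:
        _, ax = plt.subplots(figsize=(15, 10))
    if boundary is not None:
        boundary.plot(ax=ax, color="lightgray", linewidth=0.5)
    # x = longitude, y = latitude, so nodes land at their geographic position.
    pos = {node: (data["longitude"], data["latitude"]) for node, data in G.nodes(data=True)}
    nx.draw_networkx(G, pos=pos, ax=ax, node_size=20, with_labels=False, edge_color="steelblue")
    return ax
```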
It is important to note one detail: the GeoJSON file used here lists some islands as independent countries, although administratively they belong to a specific country. That's why you may see numerous dots in maritime areas. Keep in mind that the graph is based on the information available in the GeoJSON file from which it was generated; a different file would produce a different graph.
The border network we have created can help us answer many questions quickly. Next, we will describe three insights that can be easily obtained by processing the information in the network, although there are many other questions it can help answer.
Insight 1: Examining the borders of a chosen nation
In this section, we will visually evaluate the neighbors of a specific country.
The plot_country_borders function provides a quick display of the borders of a specific country. It generates a subgraph containing the input country and its neighboring countries and then visualizes them, making it easy to see which countries border a given nation. In this case the chosen country is Mexico, but the input can easily be changed to display any other country.
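The core of this step is the subgraph induced by a country and its neighbors. A sketch using NetworkX (border_subgraph is a hypothetical name), exercised on a small hand-made graph rather than the article's data. Note that an induced subgraph also keeps the edges between the neighbors themselves, such as the one between Belize and Guatemala.

```python
import networkx as nx


def border_subgraph(G, country):
    """Return the subgraph induced by `country` and its direct neighbors."""
    nodes = [country, *G.neighbors(country)]
    # .copy() turns the read-only view into an independent graph.
    return G.subgraph(nodes).copy()


# Hand-made toy graph for illustration only.
toy = nx.Graph([
    ("Mexico", "United States"),
    ("Mexico", "Belize"),
    ("Mexico", "Guatemala"),
    ("Belize", "Guatemala"),
    ("United States", "Canada"),
])
mexico_borders = border_subgraph(toy, "Mexico")
```

The resulting subgraph can then be drawn with nx.draw, reusing the longitude/latitude positions stored on the nodes of the full graph.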
As you can see in the generated image, Mexico shares its border with three countries: the United States, Belize and Guatemala.
Insight 2: The 10 countries with the most borders
In this section, we will analyze which countries have the greatest number of neighbors and display the results on screen. To do this, we have implemented the calculate_top_border_countries function, which counts the neighbors of each node in the network and displays only the 10 countries with the most neighbors.
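Counting a country's neighbors is the same as reading the degree of its node. A sketch of the ranking (top_border_countries is a hypothetical name, shown on a hand-made toy graph):

```python
import networkx as nx


def top_border_countries(G, n=10):
    """Return up to n (country, number_of_neighbors) pairs, highest first."""
    # G.degree yields (node, degree) pairs; sorted() is stable, so countries
    # with equal degree keep the order in which they were added to G.
    return sorted(G.degree, key=lambda item: item[1], reverse=True)[:n]


toy = nx.Graph([
    ("Mexico", "United States"),
    ("Mexico", "Belize"),
    ("Mexico", "Guatemala"),
    ("Belize", "Guatemala"),
    ("United States", "Canada"),
])
ranking = top_border_countries(toy, n=4)
```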
We must reiterate that the results depend on the initial GeoJSON file. In this case, the Siachen Glacier is coded as a separate country, so it appears to share a border with China.
Insight 3: Exploring the shortest routes from one country to another
We conclude our analysis with an evaluation of routes. Specifically, we will compute the minimum number of borders that must be crossed when traveling from a country of origin to a country of destination.
The find_shortest_path_between_countries function calculates the shortest path between a source country and a destination country. However, it returns only one of the possible shortest paths. This is because it relies on the shortest_path function of NetworkX, which returns a single shortest path even when several paths of the same minimum length exist.
If you need more than one route between two countries, alternatives are available. Within the find_shortest_path_between_countries function, you could use all_shortest_paths, which generates every path of minimum length, or all_simple_paths, which generates every path without repeated nodes regardless of its length. Which one to choose depends on the specific requirements of the analysis.
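The difference between these NetworkX functions can be sketched on a hand-made toy graph (a small subset of European borders, not the article's data). border_crossings and all_minimum_routes are hypothetical helpers; the number of borders crossed equals the number of edges in the path, which is one less than the number of countries visited.

```python
import networkx as nx


def border_crossings(G, source, target):
    """Return one shortest path and the number of borders crossed along it.

    Raises nx.NetworkXNoPath if the countries are not connected (e.g. islands).
    """
    path = nx.shortest_path(G, source=source, target=target)
    return path, len(path) - 1


def all_minimum_routes(G, source, target):
    """Return every path of minimum length between source and target."""
    return list(nx.all_shortest_paths(G, source=source, target=target))


europe = nx.Graph([
    ("Spain", "France"), ("Spain", "Portugal"), ("Spain", "Andorra"),
    ("Andorra", "France"), ("France", "Germany"), ("France", "Belgium"),
    ("Belgium", "Germany"), ("France", "Switzerland"), ("Switzerland", "Germany"),
    ("Germany", "Poland"), ("Germany", "Czechia"), ("Czechia", "Poland"),
])
path, crossings = border_crossings(europe, "Spain", "Poland")
```

all_simple_paths also accepts a cutoff argument that limits path length, which keeps the enumeration manageable on a graph as densely connected as a world border network.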
We used the function to find the shortest path between Spain and Poland, and the analysis revealed that the minimum number of border crossings required to travel from Spain to Poland is 3.
Python offers a large number of libraries covering various knowledge domains, which can be seamlessly integrated into any data science project. In this case, we have used libraries dedicated to both geospatial data analysis and graph analysis to create a graph that represents the borders of the world. We then demonstrated how this graph can answer several geographic questions quickly and with little effort.
Thank you for reading.
Amanda Iglesias