Understanding spatial trends in Tokyo convenience store locations
When you walk around Tokyo, you will often pass convenience stores, known locally as “konbini”, which makes sense given that there are around 56,000 convenience stores in Japan. Different chains are often located in close proximity to one another; it is not uncommon to see stores around the corner or on opposite sides of the street from each other. Given the population density of Tokyo, it is understandable that competing companies are forced close together. But could there be a relationship between which convenience store chains are located near each other?
The goal will be to collect location data for numerous convenience store chains in a Tokyo neighborhood, in order to understand whether there is any relationship between the chains that are co-located. Doing this will require:
- Querying the locations of different convenience stores in Tokyo, retrieving the name and coordinates of each store
- Finding which convenience stores are located within a predefined radius of each other
- Using the data about co-located stores to derive association rules
- Plotting and visualizing the results for inspection
Let’s start!
For our use case, we want to find convenience stores in Tokyo, so first we’ll need to do some homework on the common chains. A quick Google search tells me that the major chains are FamilyMart, Lawson, 7-Eleven, Ministop, Daily Yamazaki, and NewDays.
Now that we know what we’re looking for, let’s use OSMnx, a great Python package for looking up data in OpenStreetMap (OSM). According to the OSM schema, we should be able to find the store name in either the ‘name’ or ‘brand’ field.
We can start by importing some useful libraries to get our data and defining a function to return a table of locations for a given convenience store chain within a specified area:
import geopandas as gpd
from shapely.geometry import Point, Polygon
import osmnx
import shapely
import pandas as pd
import numpy as np
import networkx as nx
from collections import Counter
from sklearn.neighbors import BallTree

def point_finder(place, tags):
    '''
    Returns a DataFrame of coordinates of an entity from OSM.
    Parameters:
        place (str): a location (e.g., 'Tokyo, Japan')
        tags (dict): key-value pair of an entity attribute in OSM (e.g., 'name') and its value (e.g., the amenity name)
    Returns:
        results (DataFrame): table of latitude and longitude with the entity value
    '''
    gdf = osmnx.geocode_to_gdf(place)
    # Getting the bounding box of the gdf
    bounding = gdf.bounds
    north, south, east, west = bounding.iloc[0, 3], bounding.iloc[0, 1], bounding.iloc[0, 2], bounding.iloc[0, 0]
    location = gdf.geometry.unary_union
    # Finding the points within the area polygon
    point = osmnx.geometries_from_bbox(north,
                                       south,
                                       east,
                                       west,
                                       tags=tags)
    point = point.set_crs(crs=4326, allow_override=True)
    point = point[point.geometry.within(location)]
    # Making sure we are dealing with points
    point['geometry'] = point['geometry'].apply(lambda x: x.centroid if type(x) == Polygon else x)
    point = point[point.geom_type != 'MultiPolygon']
    point = point[point.geom_type != 'Polygon']
    results = pd.DataFrame({'name': list(point['name']),
                            'longitude': list(point['geometry'].x),
                            'latitude': list(point['geometry'].y)})
    results['name'] = list(tags.values())[0]
    return results
convenience_stores = point_finder(place='Shinjuku, Tokyo',
                                  tags={"brand:en": " "})
We can pass the name of each convenience store chain and combine the results into a single table of store name, longitude, and latitude. For our use case we will focus on the Shinjuku neighborhood of Tokyo, and see what the abundance of each convenience store chain looks like:
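The combining step might look like the sketch below; the per-chain tables are hypothetical stand-ins for the output of repeated `point_finder` calls:

```python
import pandas as pd

# Hypothetical per-chain results, standing in for point_finder() output
chains = {
    "FamilyMart": 3,
    "7-Eleven": 2,
    "Lawson": 1,
}
frames = [
    pd.DataFrame({"name": name,
                  "longitude": [139.70] * n,
                  "latitude": [35.69] * n})
    for name, n in chains.items()
]
# One row per store, one table for all chains
convenience_stores = pd.concat(frames, ignore_index=True)
print(convenience_stores["name"].value_counts())
```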
Clearly FamilyMart and 7-Eleven dominate in store frequency, but what does this look like spatially? Plotting geospatial data is fairly straightforward when using kepler.gl, which includes a nice interface for creating visualizations that can be saved as HTML objects or displayed directly in Jupyter notebooks:
Now that we have our data, the next step will be to find the nearest neighbors for each convenience store. To do this, we will use scikit-learn’s BallTree class to find the names of the nearest convenience stores within a two-minute-walk radius. We are not interested in how many stores are considered nearest neighbors, only in which convenience store chains fall within the defined radius.
# Convert locations to radians
locations = convenience_stores[["latitude", "longitude"]].values
locations_radians = np.radians(locations)

# Create a BallTree to search locations
tree = BallTree(locations_radians, leaf_size=15, metric='haversine')
# Find nearest neighbours within a 2-minute walking radius (~168 m at ~5 km/h)
is_within, distances = tree.query_radius(locations_radians, r=168/6371000, count_only=False, return_distance=True)
# Replace the neighbour indices with store names
df = pd.DataFrame(is_within)
df.columns = ['indices']
df['indices'] = [[val for val in row if val != idx] for idx, row in enumerate(df['indices'])]
# create temporary index column
convenience_stores = convenience_stores.reset_index()
# set temporary index column as index
convenience_stores = convenience_stores.set_index('index')
# create index-name mapping
index_name_mapping = convenience_stores['name'].to_dict()
# replace index values with names and remove duplicates
df['indices'] = df['indices'].apply(lambda lst: list(set(map(index_name_mapping.get, set(lst)))))
# Append back to original df
convenience_stores['neighbours'] = df['indices']
# Identify when a store has no neighbours
convenience_stores['neighbours'] = [lst if lst else ['no-neighbours'] for lst in convenience_stores['neighbours']]
# Unique store names
unique_elements = set([item for sublist in convenience_stores['neighbours'] for item in sublist])
# Count each store's frequency in the set of neighbours per location
counts = [dict(Counter(row)) for row in convenience_stores['neighbours']]
# Create a new dataframe with the counts
output_df = pd.DataFrame(counts).fillna(0)[sorted(unique_elements)]
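To make the BallTree neighbour query concrete, here is a self-contained toy run with hypothetical coordinates — two stores roughly 100 m apart and one several kilometres away:

```python
import numpy as np
from sklearn.neighbors import BallTree

# Toy coordinates (lat, lon in degrees)
coords = np.array([
    [35.6900, 139.7000],   # store A
    [35.6909, 139.7000],   # store B, ~100 m north of A
    [35.7100, 139.7500],   # store C, a few km away
])
names = ["FamilyMart", "7-Eleven", "Lawson"]

tree = BallTree(np.radians(coords), metric="haversine")
# 168 m expressed as an angle on a sphere of radius 6,371,000 m
neighbours = tree.query_radius(np.radians(coords), r=168 / 6371000)

# Drop each point's self-match and map indices back to names
result = [[names[j] for j in row if j != i] for i, row in enumerate(neighbours)]
print(result)
```

Stores A and B see each other as neighbours, while C ends up with an empty list, which is exactly the case the ‘no-neighbours’ placeholder above handles.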
If we wanted to improve the accuracy of our work, we could replace the haversine distance measure with something more precise (e.g., walk times calculated over the actual street network), but we’ll keep things simple.
This will give us a data frame where each row corresponds to a location and a binary count of the convenience store chains that are nearby:
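For reference, the haversine great-circle distance that the BallTree uses can be sketched directly in a few lines, along with a check of the radius conversion used above:

```python
import math

def haversine(lat1, lon1, lat2, lon2, radius=6371000):
    """Great-circle distance in metres between two (lat, lon) points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))

# A 2-minute walk at ~5 km/h covers roughly 168 m, hence r = 168 / 6371000 radians
d = haversine(35.6900, 139.7000, 35.6909, 139.7000)
print(d)  # ~100 m, i.e. well within the 168 m radius
```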
We now have a data set ready for association rule mining. Using the mlxtend library, we can derive association rules with the Apriori algorithm. We set a minimum support of 5% so that we only examine rules relating to frequent events in our dataset (i.e., convenience store chains located at the same sites). We use the ‘lift’ metric when deriving rules; lift is the ratio of the proportion of locations containing both the antecedent and the consequent to the expected support under the assumption of independence.
from mlxtend.frequent_patterns import association_rules, apriori

# Calculate frequent itemsets with apriori
frequent_set = apriori(output_df, min_support=0.05, use_colnames=True)
# Create rules
rules = association_rules(frequent_set, metric = 'lift')
# Sort rules by the support value
rules.sort_values(['support'], ascending=False)
This gives us the following table of results:
We will now interpret these association rules to do some high-level learning. To interpret this table, it is best to read more about how the support, confidence, and lift metrics of association rules are defined.
With that, let’s go back to the table.
The support tells us how often the different convenience store chains occur together. Therefore, we can say that 7-Eleven and FamilyMart are together in ~31% of the data. A lift above 1 indicates that the presence of the antecedent increases the probability of the consequent, suggesting that the locations of the two chains are partially dependent. On the other hand, the association between 7-Eleven and Lawson shows a higher lift but a lower confidence.
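As a quick sanity check of the lift definition, here is the arithmetic on toy numbers (hypothetical counts, not the actual Shinjuku results):

```python
# 100 locations: chain A nearby at 50 of them, chain B at 60, both at 31
n, n_a, n_b, n_ab = 100, 50, 60, 31

support_ab = n_ab / n                            # joint support: 0.31
confidence = n_ab / n_a                          # P(B | A): 0.62
lift = support_ab / ((n_a / n) * (n_b / n))      # observed vs. expected under independence
print(round(lift, 3))  # ~1.033, just above 1: a weak positive association
```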
Daily Yamazaki has a low support near our cutoff and shows a weak relationship with FamilyMart’s locations, given by a lift of slightly more than 1.
Other rules refer to combinations of convenience stores. For example, when a 7-Eleven and a FamilyMart are already co-located, there is a high lift value of 1.42, which suggests a strong association with Lawson.
If we had stopped at finding the nearest neighbors for each store location, we would not have been able to determine anything about the relationships between these stores.
An example of why geospatial association rules can be useful for businesses is determining the locations of new stores. If a convenience store chain is opening a new location, association rules can help identify which stores are likely to co-exist.
The value of this becomes clear when tailoring marketing campaigns and pricing strategies, as it provides quantitative relationships on which stores are likely to compete. Since we know that FamilyMart and 7-Eleven often co-exist, which we demonstrated with the association rules, it would make sense for both chains to pay more attention to how their products compete relative to other chains like Lawson and Daily Yamazaki.
In this article, we have created geospatial association rules for convenience store chains in a Tokyo neighborhood. This was done using OpenStreetMap data mining, finding nearest neighboring convenience store chains, visualizing data on maps, and creating association rules using an Apriori algorithm.
Thank you for reading!