The second season of Arcane, the hit Netflix series set in the universe of League of Legends, one of the most popular online video games of all time, takes place in a fantasy world with heavy steampunk design, packed with stunning visuals and a record-breaking budget. As a network and data scientist with a particular interest in turning elements of pop culture into data visualizations, this was all I needed after finishing the final season: to map the hidden connections and turn Arcane's story into a network visualization, using Python. By the end of this tutorial, you will have practical skills for creating and visualizing the network behind Arcane.
However, these skills and methods are not at all specific to this story. In fact, they illustrate the general approach network science offers for mapping, designing, visualizing, and interpreting networks of any complex system. Such systems range from the transportation and spreading patterns of COVID-19 to brain networks and various social networks, such as the one behind the Arcane series.
All images created by the author.
Since we are going to map the connections between all the characters, we first need a list of them. For this, the Arcane fan wiki is an excellent, free-to-use (CC BY-SA 3.0) source of information, which we can easily access using simple web scraping techniques. Specifically, we will use urllib to download, and BeautifulSoup to extract, the names and fan wiki profile URLs of every character listed on the main characters page.
First, let's download the HTML of the character listing page:
import bs4 as bs
from urllib.request import urlopen

url_char = 'https://arcane.fandom.com/wiki/Category:Characters'
sauce = urlopen(url_char).read()
soup = bs.BeautifulSoup(sauce, 'lxml')
Next, I extracted all the potentially relevant names. You can easily figure out which tags to look for in the parsed HTML (stored in the 'soup' variable) by right-clicking on a desired element (in this case, a character profile) and selecting the inspect element option in any browser.
From this, I learned that a character's name and URL are stored in a line that has 'title=', but does not contain ':' (which corresponds to categories). Additionally, I created a still_character flag, which helped me decide which subpages on the character listing page still belong to legitimate characters from the story.
import re

chars = soup.find_all('li')
still_character = True
names_urls = {}

for char in chars:
    if '" title="' in str(char) and ':' not in char.text and still_character:
        char_name = char.text.strip().rstrip()
        if char_name == 'Arcane':
            still_character = False
        char_url = 'https://arcane.fandom.com' + re.search(r'href="([^"]+)"', str(char)).group(1)
        if still_character:
            names_urls[char_name] = char_url
The code block above will create a dictionary ('names_urls') that stores the name and URL of each character as key-value pairs. Now let's take a quick look at what we have and print the name-url dictionary and its total length:
for name, url in names_urls.items():
    print(name, url)
A sample of the output from this code block, where we can test each link pointing to each character's bio profile:
print(len(names_urls))
This cell returns 67, the total number of named characters we have to deal with. This means we're done with the first task: we have a complete list of characters, as well as easy access to their full textual profiles on their fan wiki pages.
To draw connections between two characters, we need a way to quantify the relationship between each pair of characters. To capture this, I rely on how frequently the two characters' biographies reference each other. From a technical standpoint, to accomplish this we'll need to download the full biographies we just collected links to. We'll get these again with simple web scraping techniques, saving the source of each page to a separate local file, as follows.
# output folder for the profile htmls
import os

folderout = 'fandom_profiles'
if not os.path.exists(folderout):
    os.makedirs(folderout)

# crawl and save the profile htmls
for ind, (name, url) in enumerate(names_urls.items()):
    if not os.path.exists(folderout + '/' + name + '.html'):
        fout = open(folderout + '/' + name + '.html', "w")
        fout.write(str(urlopen(url).read()))
        fout.close()
At the end of this section, our 'fandom_profiles' folder should contain the fanwiki profiles for each Arcane character, ready to be processed as we move towards building the Arcane network.
To construct the network between characters, we assume that the intensity of interactions between two characters is indicated by the number of times each character's profile mentions the other. Therefore, the nodes of this network are the characters, who are linked with connections of different strengths depending on the number of times each character's wiki site source references any other character's wiki.
Building the network
In the following block of code, we construct the edge list: the list of connections containing the source and target node (character) of each connection, along with a weight measuring how frequently the two characters' profiles reference each other. Additionally, to make the profile search efficient, I create a names_ids dictionary that stores, for each character, only the identifier part of their URL, without the rest of the web address.
# extract the name mentions from the html sources
# and build the list of edges in a dictionary
edges = {}
names_ids = {n : u.split('/')[-1] for n, u in names_urls.items()}

for fn in [fn for fn in os.listdir(folderout) if '.html' in fn]:
    name = fn.split('.html')[0]
    with open(folderout + '/' + fn) as myfile:
        text = myfile.read()
        soup = bs.BeautifulSoup(text, 'lxml')
        # keep only the paragraph text of the profile body
        text = ' '.join([str(a) for a in soup.find_all('p')[2:]])
        for n, i in names_ids.items():
            # count mentions in the text before the 'Image Gallery' section
            w = text.split('Image Gallery')[0].count('/' + i)
            if w > 0:
                edge = '\t'.join(sorted([name, n]))
                if edge not in edges:
                    edges[edge] = w
                else:
                    edges[edge] += w

len(edges)
As this block of code is executed, it should return around 180 edges.
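Before turning the edges into a graph, it is worth sanity-checking the strongest connections. Below is a minimal, self-contained sketch using a hypothetical miniature of the edge dictionary; the names and counts are made up for illustration, not taken from the scraped data:

```python
# Hypothetical miniature of the edge dictionary built above: keys are
# tab-separated, alphabetically sorted character-name pairs, values are
# co-mention counts (all numbers here are illustrative).
edges = {
    'Jinx\tVi': 42,
    'Caitlyn\tVi': 28,
    'Jayce\tViktor': 25,
    'Jinx\tSilco': 19,
}

# rank the pairs by weight to surface the strongest relationships first
ranked = sorted(edges.items(), key=lambda kv: kv[1], reverse=True)
for pair, w in ranked[:3]:
    print(pair.replace('\t', ' -- '), w)
```

On the real dictionary, the same two lines give a quick plausibility check: the top pairs should be characters who share a lot of screen time.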
Next, we use the NetworkX graph analysis library to convert the list of edges to a graph object and output the number of nodes and edges the graph has:
# create the networkx graph from the dict of edges
import networkx as nx

G = nx.Graph()
for e, w in edges.items():
    if w > 0:
        e1, e2 = e.split('\t')
        G.add_edge(e1, e2, weight=w)

G.remove_edges_from(nx.selfloop_edges(G))

print('Number of nodes: ', G.number_of_nodes())
print('Number of edges: ', G.number_of_edges())
The output of this code block:
This result tells us that although we started with 67 characters, 16 of them ended up not being connected to anyone in the network, hence the smaller number of nodes in the constructed graph.
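If you want to see exactly who dropped out, you can compare the full character list with the graph's node set. Here is a self-contained sketch on toy data (the character set and edges are hypothetical, not the scraped ones):

```python
import networkx as nx

# toy stand-in: four characters, one of whom ('Heimerdinger' here, purely
# as an example) is never mentioned in anyone's profile
all_characters = {'Jinx', 'Vi', 'Vander', 'Heimerdinger'}

G = nx.Graph()
G.add_edge('Jinx', 'Vi', weight=3)
G.add_edge('Vi', 'Vander', weight=2)

# characters absent from the graph are exactly those with no co-mentions
missing = all_characters - set(G.nodes())
print(missing)
```

On the real data, replacing `all_characters` with `set(names_urls)` would list the 16 unconnected characters.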
Viewing the network
Once we have the network, we can visualize it! First, let's create a simple draft network visualization using Matplotlib and the built-in NetworkX tools.
# take a very brief look at the network
import matplotlib.pyplot as plt

f, ax = plt.subplots(1, 1, figsize=(15, 15))
nx.draw(G, ax=ax, with_labels=True)
plt.savefig('test.png')
The output image of this cell:
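Even before moving to Gephi, the draft can be made more informative by scaling node sizes with degree and edge widths with weight. A small sketch on a toy graph (the names and weights are illustrative, not the scraped values):

```python
import matplotlib
matplotlib.use('Agg')  # render off-screen
import matplotlib.pyplot as plt
import networkx as nx

# toy graph standing in for the Arcane network (illustrative values)
G = nx.Graph()
G.add_edge('Jinx', 'Vi', weight=5)
G.add_edge('Vi', 'Caitlyn', weight=3)
G.add_edge('Jayce', 'Viktor', weight=4)
G.add_edge('Vi', 'Vander', weight=2)

pos = nx.spring_layout(G, seed=42)  # fixed seed for a reproducible layout
node_sizes = [300 * G.degree(n) for n in G.nodes()]
edge_widths = [2 * G[u][v]['weight'] for u, v in G.edges()]

f, ax = plt.subplots(1, 1, figsize=(8, 8))
nx.draw(G, pos=pos, ax=ax, with_labels=True,
        node_size=node_sizes, width=edge_widths)
plt.savefig('test_weighted.png')
```

Applied to the real graph, hubs like the main protagonists immediately stand out, which is exactly the effect we will refine further in Gephi.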
While this network already offers some clues about the main structure of the show, we can design a much more detailed visualization using the open-source network visualization software Gephi. For this, we first need to export the network to a .gexf graph data file, as follows.
nx.write_gexf(G, 'arcane_network.gexf')
Now, the tutorial on how to visualize this network using Gephi:
Extras
Here comes an extension, which I refer to in the video. After exporting the node table, including the network community indices, I read that table using Pandas and assigned individual colors to each community. I got the colors (and their hex codes) from ChatGPT, asking it to align them with the show's main color themes. This block of code then exports the colors, which I used again in Gephi to color the final graph.
import pandas as pd

nodes = pd.read_csv('nodes.csv')

pink   = '#FF4081'
blue   = '#00FFFF'
gold   = '#FFD700'
silver = '#C0C0C0'
green  = '#39FF14'

cmap = {0 : green,
        1 : pink,
        2 : gold,
        3 : blue,
       }

nodes['color'] = nodes.modularity_class.map(cmap)
nodes.set_index('Id')[['color']].to_csv('arcane_colors.csv')
As we colored the network based on the communities we found (communities meaning highly interconnected subgraphs of the original network), we discovered four main groups, each corresponding to a specific set of characters within the story. It's not so surprising that the algorithm grouped the main protagonist family together: Jinx, Vi, and Vander (pink). We also see Zaun's underground figures (blue), such as Silco, while Piltover's elite (gold) and its militaristic enforcers (green) are likewise well grouped together.
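For readers who prefer to stay entirely in Python, NetworkX ships a comparable modularity-based community detection method (Clauset-Newman-Moore greedy modularity, which is not the exact algorithm Gephi uses). A sketch on a toy graph of two tight triangles, with placeholder names:

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# toy graph: two densely connected triangles (placeholder characters)
G = nx.Graph()
G.add_edges_from([('Jinx', 'Vi'), ('Vi', 'Vander'), ('Jinx', 'Vander'),
                  ('Jayce', 'Viktor'), ('Jayce', 'Mel'), ('Viktor', 'Mel')])

# greedily merge groups of nodes to maximize modularity
communities = greedy_modularity_communities(G)
for i, members in enumerate(communities):
    print(i, sorted(members))
```

Running this on the full Arcane graph should recover groupings broadly similar to Gephi's, though the exact partition may differ between the two algorithms.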
The beauty and use of such community structures is that, while these explanations put them into context very easily, it would usually be very difficult to come up with a similar map from intuition alone. The methodology presented here shows how network science can extract the hidden connections of virtual (or real) social systems, whether they are partners in a law firm, co-workers in an accounting firm, or the human resources department of a major oil company.