In this short article, I use public data from Wikipedia, Python programming, and network analysis to extract and construct a network of Oscar-winning actors and actresses.
All images were created by the author.
Wikipedia, as the largest free, crowdsourced online encyclopedia, serves as a tremendously rich source of data on various public domains. Many of these domains, from film to politics, involve several layers of underlying networks, expressing different types of social phenomena such as collaboration. Since the Academy Awards ceremony is approaching, here I show the example of Oscar-winning actors and actresses on how we can use simple Pythonic methods to convert Wiki sites into networks.
First, let's take a look at how, for example, the Wiki list of all Oscar winning actors. It is structured:
This subpage nicely shows everyone who has ever received an Oscar and been granted a Wiki profile (chances are, no actor or actress has been overlooked by fans). In this article, I focus on the acting, which can be found on the following four subpages, including the main and supporting actors and actresses:
urls = { 'actor' :'https://en.wikipedia.org/wiki/Category:Best_Actor_Academy_Award_winners',
'actress' : 'https://en.wikipedia.org/wiki/Category:Best_Actress_Academy_Award_winners',
'supporting_actor' : 'https://en.wikipedia.org/wiki/Category:Best_Supporting_Actor_Academy_Award_winners',
'supporting_actress' : 'https://en.wikipedia.org/wiki/Category:Best_Supporting_Actress_Academy_Award_winners'}
Now let's write a simple block of code that checks each of these four listings and use the packages Screaming and beautiful soupextracts the name of all the artists:
from urllib.request import urlopen
import bs4 as bs
import re# Iterate across the four categories
people_data = ()
for category, url in urls.items():
# Query the name listing page and…