We've just witnessed three wonderful weeks of sport during the 2024 Olympic Games in Paris, with millions watching the broadcasts and cheering on their favourites. At the same time, we've seen household names take the stage again and the next generation of star athletes emerge.
As a curious data scientist, I started to wonder how collective opinion on these topics evolves: which sports are the most popular, and which athletes are on the rise? To back this up with data, I ran an exhaustive data collection on Wikipedia, gathering both Wikipedia profiles and page view counts, and then compared the popularity of different sports as well as of top athletes. Below you will find my results and all the Python code needed to reproduce them.
All images were created by the author.
First, let’s visit the Wikipedia page about the 2024 Summer Olympics and download it using the requests library. Then, I use the BeautifulSoup package to extract information from the HTML code, i.e. the list of all summer sports and the links to their Wikipedia pages. To extract the sports, I also follow my manual…
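The download-and-parse step could look roughly like the sketch below. It is an illustration rather than the exact code behind the results: the URL, the `User-Agent` string, and the helper names `download_html` and `extract_article_links` are my assumptions, and the link filter is a simple heuristic that keeps every internal article link on the page. Narrowing that down to the sports themselves depends on the page structure and is not shown here.

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

# Assumed URL of the English Wikipedia article (not stated in the text).
OLYMPICS_URL = "https://en.wikipedia.org/wiki/2024_Summer_Olympics"


def download_html(url: str) -> str:
    """Download a page and return its HTML, raising on HTTP errors."""
    # Placeholder User-Agent; replace it with something that identifies you.
    headers = {"User-Agent": "olympics-popularity-sketch/0.1 (example)"}
    response = requests.get(url, headers=headers, timeout=30)
    response.raise_for_status()
    return response.text


def extract_article_links(html: str, base_url: str) -> dict[str, str]:
    """Map link text to absolute URL for every internal article link."""
    soup = BeautifulSoup(html, "html.parser")
    links: dict[str, str] = {}
    for a in soup.find_all("a", href=True):
        href = a["href"]
        title = a.get_text(strip=True)
        # Heuristic: keep "/wiki/..." links only and skip namespaced pages
        # such as "File:" or "Category:", external links and bare anchors.
        if not href.startswith("/wiki/") or ":" in href or not title:
            continue
        # Keep the first URL seen for each link text.
        links.setdefault(title, urljoin(base_url, href))
    return links
```

Calling `extract_article_links(download_html(OLYMPICS_URL), OLYMPICS_URL)` would return a dictionary of link texts and URLs, which can then be filtered down to the sports, for example by matching on the URL pattern of the sport pages.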