Examples of how to create different types of pie charts using Matplotlib to visualize database analysis results in a Jupyter Notebook with Pandas
While working on my master's thesis titled “Factors Associated with Impactful Scientific Publications in NIH-Funded Heart Disease Research,” I used different types of pie charts to illustrate some of the key findings from the database analysis.
A pie chart can be an effective choice for data visualization when a data set contains a limited number of categories that represent parts of a whole. This makes it well suited for displaying categorical data with an emphasis on comparing the relative proportions of each category.
In this article, I will demonstrate how to create four different types of pie charts using the same data set to provide a more complete visual representation and deeper insight into the data. To achieve this, I will use Matplotlib, Python's plotting library, to display pie chart visualizations of the statistical data stored in a Pandas data frame. If you're not familiar with the Matplotlib library, a good place to start is the Python Data Science Handbook by Jake VanderPlas, specifically the chapter “Visualization with Matplotlib,” along with the documentation at matplotlib.org.
First, let's import all the necessary libraries and extensions:
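The original import cell is not reproduced here, so the following is a minimal sketch of what this kind of notebook typically needs. It assumes only Pandas and Matplotlib are required; the commented-out `%matplotlib inline` magic is a guess at the "extensions" mentioned above.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Recent Jupyter setups render Matplotlib figures inline by default.
# Older setups may need the notebook magic below (assumption: uncomment it
# inside a notebook cell, it is not valid in a plain Python script).
# %matplotlib inline
```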
Next, we will prepare the CSV file for processing:
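As a rough sketch of this step, the snippet below reads CSV text into a data frame and sorts the journals by their total. The `Journal` column name, the journal names, and all counts are placeholders invented for illustration; they are not the thesis data, and the real file would be read from its own path instead of from an in-memory string.

```python
import io
import pandas as pd

# Placeholder rows standing in for the real CSV file (invented values)
csv_text = """Journal,Female,Male,Unknown
Journal B,90,200,10
Journal A,120,260,20
Journal C,60,150,15
"""

# With the real file, io.StringIO(csv_text) would be replaced by the file path
df = pd.read_csv(io.StringIO(csv_text))

# Assumption: derive "Total" from the gender columns so the numbers stay consistent
df["Total"] = df[["Female", "Male", "Unknown"]].sum(axis=1)

# Put the journal with the most articles first
df = df.sort_values("Total", ascending=False).reset_index(drop=True)
print(df)
```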
The mini-dataset used in this article highlights the top 10 journals with research publications on heart disease between 2002 and 2020 and is part of a larger database collected for my master's thesis research. The “Female,” “Male,” and “Unknown” columns represent the gender of the first author of the published articles, while the “Total” column reflects the total number of heart disease research articles published in each journal.
For smaller data sets with fewer categories, a pie chart with exploding slices can effectively highlight a key category by separating it slightly from the rest of the chart. Each slice represents a part of the total, and its size is proportional to the data it represents. Labels can be added to each slice to indicate the category, along with percentages to show its proportion of the total. This technique makes the exploded slice stand out without losing the context of the entire data set.
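Here is a minimal sketch of an exploding-slice pie chart for a single journal, using the `explode` and `autopct` parameters of `Axes.pie`. The counts and the journal name are placeholders, not values from the thesis data set.

```python
import matplotlib.pyplot as plt

# Placeholder first-author counts for one journal (invented values)
labels = ["Female", "Male", "Unknown"]
counts = [120, 260, 20]

# Offset only the first slice ("Female") by 10% of the pie radius
explode = [0.1, 0.0, 0.0]

fig, ax = plt.subplots(figsize=(5, 5))
wedges, texts, autotexts = ax.pie(
    counts,
    labels=labels,
    explode=explode,
    autopct="%1.1f%%",  # annotate each slice with its share of the total
    startangle=90,
)
ax.set_title("First-author gender, Journal A (placeholder data)")
```

Each entry in `explode` is the fraction of the radius by which the matching slice is pushed outward, so changing which entry is non-zero changes which category is emphasized.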
The same exploding-slice technique can be applied to all other entries in the sample data set, and the resulting charts can be displayed in a single figure. This type of visualization helps highlight the over- or under-representation of a particular category within the data set. In the example provided, presenting all 10 charts in one figure reveals that none of the top 10 journals in heart disease research published more articles written by women than by men, emphasizing the gender disparity.
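One way to place several such charts in a single figure is a grid of subplots, with one pie per row of the data frame. The sketch below uses four placeholder journals in a 2×2 grid; for the 10 journals in the article, a 2×5 grid would be a natural choice. All names and counts are invented for illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Placeholder data frame (invented values, not the thesis data)
df = pd.DataFrame({
    "Journal": ["Journal A", "Journal B", "Journal C", "Journal D"],
    "Female": [120, 90, 60, 40],
    "Male": [260, 200, 150, 100],
    "Unknown": [20, 10, 15, 10],
})
genders = ["Female", "Male", "Unknown"]

fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(8, 8))

# Draw one exploding-slice pie per journal, one subplot each
for ax, (_, row) in zip(axes.flat, df.iterrows()):
    ax.pie(
        row[genders].astype(float),
        labels=genders,
        explode=[0.1, 0.0, 0.0],  # pull out the "Female" slice in every chart
        autopct="%1.1f%%",
        startangle=90,
    )
    ax.set_title(row["Journal"])

fig.tight_layout()
```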
A variation of the pie chart, known as a donut chart, can also be used to visualize data. Donut charts, like pie charts, show the proportions of categories that make up a whole, but the center of the donut chart can also be used to present additional data. This format is less visually cluttered and can make it easier to compare the relative sizes of the slices than a standard pie chart does. In the example used in this article, the donut chart highlights that among the top 10 journals with research publications on heart disease, the American Journal of Physiology, Heart and Circulatory Physiology published the most articles, at 21.8%.
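A donut chart can be sketched by giving the wedges a `width` smaller than the pie radius through `wedgeprops`, which leaves a hole in the middle for extra text. The journal names and totals below are placeholders, and showing the overall article count in the center is just one possible use of that space.

```python
import matplotlib.pyplot as plt

# Placeholder article totals per journal (invented values)
journals = ["Journal A", "Journal B", "Journal C", "Journal D"]
totals = [400, 300, 225, 150]

fig, ax = plt.subplots(figsize=(6, 6))
wedges, texts, autotexts = ax.pie(
    totals,
    labels=journals,
    autopct="%1.1f%%",
    pctdistance=0.8,  # place the percentages inside the ring
    startangle=90,
    wedgeprops={"width": 0.4, "edgecolor": "white"},  # ring instead of full disc
)

# Use the empty center for additional information
center_text = ax.text(
    0, 0, f"{sum(totals)}\narticles",
    ha="center", va="center", fontsize=14,
)
ax.set_title("Share of articles per journal (placeholder data)")
```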
We can improve the display of additional information from the sample data set by building on the donut chart above and creating a nested version. The add_artist() method from Matplotlib's figure module is used to incorporate any additional artists (such as shapes or objects) into the base figure. As in the previous donut chart, this variation shows the distribution of publications in the top 10 heart disease research journals. However, it also includes an additional layer showing the gender distribution of the first authors of each journal. This visualization highlights that a higher percentage of first authors are men.
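The nested chart can be approximated as follows. This is only one possible construction and may differ from the original notebook: it draws two pies of different radii on the same axes and then uses `Axes.add_artist` to place a white circle over the center. All names, counts, and colors are placeholders.

```python
import matplotlib.pyplot as plt
from matplotlib.patches import Circle

# Placeholder counts per journal in the order Female, Male, Unknown (invented)
journals = ["Journal A", "Journal B", "Journal C"]
gender_counts = [[120, 260, 20], [90, 200, 10], [60, 150, 15]]

outer_sizes = [sum(row) for row in gender_counts]          # one slice per journal
inner_sizes = [n for row in gender_counts for n in row]    # three slices per journal
gender_colors = ["tab:orange", "tab:blue", "lightgray"] * len(journals)

fig, ax = plt.subplots(figsize=(7, 7))

# Outer layer: distribution of articles across journals
outer_wedges, _ = ax.pie(
    outer_sizes, radius=1.0, labels=journals, startangle=90,
    wedgeprops={"edgecolor": "white"},
)

# Inner layer: gender split, drawn on top with a smaller radius; the same
# ordering and start angle keep each group of three under its journal
inner_wedges, _ = ax.pie(
    inner_sizes, radius=0.7, colors=gender_colors, startangle=90,
    wedgeprops={"edgecolor": "white"},
)

# Add a white circle as an extra artist to open up the donut hole
hole = Circle((0, 0), 0.4, color="white")
ax.add_artist(hole)

ax.legend(inner_wedges[:3], ["Female", "Male", "Unknown"],
          title="First author", loc="upper right")
```

Because both layers start at the same angle and the inner values are listed journal by journal, each outer slice spans exactly the three inner slices that belong to it.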
In conclusion, pie charts are effective for visualizing data with a limited number of categories, as they allow viewers to quickly understand the most important categories or dominant proportions at a glance. In this specific example, using four different types of pie charts provides a clear visualization of the gender distribution among first authors in the top 10 journals publishing heart disease research, based on the 2002 to 2020 mini-dataset used in this study. It is evident that a greater percentage of the journals' first authors are men, and none of the top 10 heart disease research journals published more articles written by women than by men during the period examined.
The Jupyter Notebook and the dataset used for this article can be found on GitHub.
Thanks for reading,
Diana
Note: I used GitHub embeds to publish this article.