In 2015, the Wall Street Journal (WSJ) published a highly effective series of heat maps illustrating the impact of vaccines on infectious diseases in the United States. These visualizations showed the power of blanket policies to drive widespread change. You can see the heat maps here.
Heat maps are a versatile tool for data analysis. Their ability to facilitate comparative analysis, highlight temporal trends, and enable pattern recognition makes them invaluable for communicating complex information.
In this Quick Success Data Science project, we will use Python's Matplotlib graphing library to recreate the WSJ's measles chart, demonstrating how to leverage heat maps and carefully designed colorbars to influence data storytelling.
The data
The disease data comes from the University of Pittsburgh's Project Tycho. This organization works with national and global health institutes and researchers to make data easier to use to improve global health. The measles data are available under a Creative Commons Attribution 4.0 International Public License.
For convenience, I have downloaded the data from Project Tycho's data portal to a CSV file and stored it in this Gist. Later, we will access it programmatically through the code.
The measles heat map
We will use Matplotlib's pcolormesh() function to build a close facsimile of the WSJ measles heat map. While other libraries, such as Seaborn, Plotly Express, and hvPlot, include dedicated heat map functions, these are built for ease of use, with most of the design decisions abstracted away. This makes it difficult to force their results to match the WSJ heat map.
Besides pcolormesh(), Matplotlib's imshow() function (for "image show") can also produce heat maps. The pcolormesh() function, however, better aligns grid lines with cell edges.
Here is an example of a heat map made with imshow() that you can compare with the pcolormesh() results later. The main difference is the lack of grid lines.
Heat map made with the imshow() function (by the author)
In 1963, the measles vaccine was licensed and released throughout the United States with widespread uptake. Within five years, the incidence of the disease was greatly reduced. By 2000, measles was considered eradicated in the United States, with any new cases arriving from outside the country. Note how well the visualization conveys this "big picture" while preserving the state-level details. This is due in large part to the choice of colorbar.
The colors used in the visualization are biased. More than 80% of the colorbar is composed of warm colors, and (light) blue is reserved for the smallest values. This makes it easy to demarcate the pre- and post-vaccination periods. White cells denote missing data, represented by NaN (Not a Number) values.
Compare the previous heat map with one built with a more balanced colorbar:

The darker blue color not only dominates the plot, it is also hard on the eyes. And although it is still possible to see the effect of the vaccine, the visual impact is much more subtle than in the plot with the biased colorbar. On the other hand, it is easier to parse the higher values, but at the expense of the overall theme.
The code
The following code was written in JupyterLab and is presented by cell.
Importing libraries
The first cell imports the libraries that we will need to complete the project. An online search for the library names will lead you to the installation instructions.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap, Normalize
from matplotlib.cm import ScalarMappable
import pandas as pd
Creating the custom colormap
The following code closely reproduces the colormap used by the WSJ. I used the online Image Color Picker tool to identify the key colors from a screenshot of their measles heat map and adjusted them based on the colors chosen for a similar tutorial built for R.
# Define the key colors (hex codes):
colors = ['#e7f0fa',  # lightest blue
          '#c9e2f6',  # light blue
          '#95cbee',  # blue
          '#0099dc',  # dark blue
          '#4ab04a',  # green
          '#ffd73e',  # yellow
          '#eec73a',  # yellow brown
          '#e29421',  # dark tan
          '#f05336',  # orange
          '#ce472e']  # red
# Create a list of positions for each color in the colormap:
positions = [0, 0.02, 0.03, 0.09, 0.1, 0.15, 0.25, 0.4, 0.5, 1]
# Create a LinearSegmentedColormap (continuous colors):
custom_cmap = LinearSegmentedColormap.from_list('custom_colormap',
                                                list(zip(positions,
                                                         colors)))
# Display a colorbar with the custom colormap
# (imshow needs a 2D array, hence the nested list):
fig, ax = plt.subplots(figsize=(6, 1))
plt.imshow([list(range(256))],
           cmap=custom_cmap,
           aspect='auto',
           vmin=0, vmax=255)
plt.xticks([]), plt.yticks([])
plt.show()
Here is the generic colorbar produced by the code:

This code makes a continuous colormap using Matplotlib's LinearSegmentedColormap() class. This class specifies colormaps using anchor points between which the RGB values are interpolated. That is, it generates colormap objects based on lookup tables using linear segments. It creates the lookup table using linear interpolation for each primary color, with the 0–1 domain divided into any number of segments. For more details, see this short tutorial on making custom colormaps with Matplotlib.
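To make the effect of the anchor positions concrete, here is a minimal sketch that builds two small colormaps from the same three colors (a subset of the list above). The names biased_demo and balanced_demo and the three-color subset are assumptions made for illustration; they are not part of the project code.

```python
import numpy as np
from matplotlib.colors import LinearSegmentedColormap, to_rgba

# Three illustrative anchor colors taken from the project's color list.
colors = ['#e7f0fa', '#ffd73e', '#ce472e']  # lightest blue, yellow, red

# Biased: yellow is reached after only 15% of the 0-1 range.
biased = LinearSegmentedColormap.from_list(
    'biased_demo', list(zip([0, 0.15, 1], colors)))

# Balanced: when no positions are given, the colors are spaced evenly.
balanced = LinearSegmentedColormap.from_list('balanced_demo', colors)

# Sample both colormaps at a few points in the 0-1 domain.
for value in (0.0, 0.15, 0.5, 1.0):
    print(value, np.round(biased(value), 2), np.round(balanced(value), 2))
```

Both colormaps start and end on the same colors, but the biased one has already moved past yellow toward red at the midpoint, while the balanced one is only just reaching yellow there.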
Loading and preparing the disease data
Next, we load the CSV file into pandas and prepare it for plotting. This file contains the measles incidence (as the number of cases per 100,000 people) for each state (and the District of Columbia) by week from 1928 to 2003. We will need to convert the values to a numeric data type, aggregate the data by year, and reshape the DataFrame for plotting.
# Read the csv file into a DataFrame:
url = 'https://bit.ly/3F47ejX'
df_raw = pd.read_csv(url, na_values='-')
# Convert to numeric and aggregate by year:
df_raw.iloc[:, 2:] = (df_raw.iloc[:, 2:]
                      .apply(pd.to_numeric,
                             errors='coerce'))
df = (df_raw.groupby('YEAR', as_index=False)
      .sum(min_count=1, numeric_only=True)
      .drop(columns=['WEEK']))
# Reshape the data for plotting:
df_melted = df.melt(id_vars='YEAR',
                    var_name='State',
                    value_name='Incidence')
df_pivot = df_melted.pivot_table(index='State',
                                 columns='YEAR',
                                 values='Incidence')
# Reverse the state order for plotting:
df_pivot = df_pivot[::-1]
Here is what the initial (raw) DataFrame looks like, showing the first five rows and ten columns:

The df_raw DataFrame (by author)
In the source file, NaN values are represented by a dash (-).
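The loading code relies on two details to keep missing data missing: na_values='-' in read_csv() and min_count=1 in sum(). The sketch below illustrates both on a hypothetical four-row miniature of the file (the values are made up for illustration and are not real Project Tycho data).

```python
import io
import pandas as pd

# Hypothetical miniature of the raw file: weekly rows, '-' marks missing data.
csv_text = """YEAR,WEEK,ALABAMA,ALASKA
1928,1,3.5,-
1928,2,1.5,-
1929,1,0.0,2.0
1929,2,-,1.0
"""

# na_values='-' converts the dashes to NaN while reading.
toy = pd.read_csv(io.StringIO(csv_text), na_values='-')

by_year = toy.groupby('YEAR')[['ALABAMA', 'ALASKA']]

# Default sum: a year with only NaN values collapses to 0.0.
default_sum = by_year.sum()

# min_count=1: a year needs at least one valid value, else it stays NaN.
nan_aware_sum = by_year.sum(min_count=1)

print(default_sum)
print(nan_aware_sum)
```

Without min_count=1, Alaska's 1928 total would be reported as zero cases rather than as missing, and the heat map could no longer tell the two apart.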
The final df_pivot DataFrame is in wide format, where each column represents a variable and the rows represent unique entities:

The df_pivot DataFrame (by author)
While plotting is usually done using long format data, as in the df_raw DataFrame, pcolormesh() prefers wide format when making heat maps. This is because heat maps are inherently designed to display a 2D matrix-like structure, where rows and columns represent distinct categories. In this case, the final plot will look a lot like the DataFrame, with states along the y-axis and years along the x-axis. Each cell of the heat map will be colored based on its numerical value.
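As a small illustration of the long-to-wide reshaping, the sketch below pivots a hypothetical four-row long-format table (the incidence values are invented for the example) into the states-by-years matrix that pcolormesh() expects.

```python
import pandas as pd

# Hypothetical long-format data: one row per (state, year) observation.
long_df = pd.DataFrame({
    'State': ['ALABAMA', 'ALABAMA', 'ALASKA', 'ALASKA'],
    'YEAR': [1928, 1929, 1928, 1929],
    'Incidence': [334.99, 111.93, 0.0, 25.5],
})

# Wide format: states become rows and years become columns.
wide_df = long_df.pivot_table(index='State', columns='YEAR',
                              values='Incidence')

print(wide_df)
print(wide_df.to_numpy().shape)  # (number of states, number of years)
```

The resulting 2D array maps directly onto the heat map grid: one row of cells per state and one column of cells per year.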
Handling missing values
The dataset contains many missing values. We want to distinguish these from 0 values on the heat map by making a mask to identify and store these NaN values. Before applying this mask with NumPy, we will use Matplotlib's Normalize() class to normalize the data. This way, we can directly compare the heat map colors across states.
# Create a mask for NaN values:
nan_mask = df_pivot.isna()
# Normalize the data for a shared colormap:
norm = Normalize(df_pivot.min().min(), df_pivot.max().max())
# Apply normalization before masking:
normalized_data = norm(df_pivot)
# Create masked array from normalized data:
masked_data = np.ma.masked_array(normalized_data, mask=nan_mask)
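To see what normalization and masking do to individual cells, here is a minimal sketch on a hypothetical 2 x 3 array (the values are invented, and nanmin/nanmax are used here simply to ignore the NaN cells when finding the limits).

```python
import numpy as np
from matplotlib.colors import Normalize

# Hypothetical incidence values; np.nan marks missing data.
data = np.array([[0.0, 50.0, np.nan],
                 [100.0, np.nan, 25.0]])

# Use NaN-aware min/max so missing values do not affect the limits.
norm = Normalize(vmin=np.nanmin(data), vmax=np.nanmax(data))

# Rescale the values linearly to the 0-1 range.
scaled = norm(data)

# Mask only the NaN cells; genuine zeros remain visible data.
masked = np.ma.masked_array(scaled, mask=np.isnan(data))

print(masked)
print('Masked cells:', masked.mask.sum())
```

The zero in the first cell stays unmasked and maps to the bottom of the colormap, while the two NaN cells are masked and left uncolored when plotted.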
Plotting the heat map
The following code creates the heat map. The heart of it is the single line that calls the pcolormesh() function. Most of the rest embellishes the plot so that it looks like the WSJ heat map (with the exception of the x, y, and colorbar labels, which are improved in our version).
# Plot the data using pcolormesh with a masked array:
multiplier = 0.22 # Changes figure aspect ratio
fig, ax = plt.subplots(figsize=(11, len(df_pivot.index) * multiplier))
states = df_pivot.index
years = df_pivot.columns
im = plt.pcolormesh(masked_data, cmap=custom_cmap,
edgecolors='w', linewidth=0.5)
ax.set_title('Measles Incidence by State (1928-2002)', fontsize=16)
# Adjust x-axis ticks and labels to be centered:
every_other_year_indices = np.arange(0, len(years), 2) + 0.5
ax.set_xticks(every_other_year_indices)
ax.set_xticklabels(years[::2], rotation='vertical', fontsize=10)
# Adjust labels on y-axis:
ax.set_yticks(np.arange(len(states)) + 0.5) # Center ticks in cells
ax.set_yticklabels(states, fontsize=9)
# Add vertical line and label for vaccine date:
vaccine_year_index = list(years).index(1963)
ax.axvline(x=vaccine_year_index, linestyle='--',
linewidth=1, color='k')
alaska_index = states.get_loc('ALASKA')
ax.text(vaccine_year_index, alaska_index, ' Vaccine',
ha='left', va='center', fontweight='bold')
# Add a colorbar:
cbar = fig.colorbar(ScalarMappable(norm=norm, cmap=custom_cmap),
ax=ax, orientation='horizontal', pad=0.1,
label='Cases per 100,000')
cbar.ax.xaxis.set_ticks_position('bottom')
plt.savefig('measles_pcolormesh_nan.png', dpi=600, bbox_inches='tight')
plt.show()
Here is the result:

Heat map made with the pcolormesh() function (by the author)
This is a close approximation of the WSJ heat map, with what I consider more legible labels and better separation of 0 and NaN (missing data) values.
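The grid-line difference between imshow() and pcolormesh(), and the reason the tick positions in the plotting code are offset by 0.5, can be seen on a toy example. The sketch below uses a hypothetical 3 x 4 array and placeholder tick labels; it is an illustration, not part of the project code.

```python
import matplotlib.pyplot as plt
import numpy as np

data = np.arange(12).reshape(3, 4)  # toy 3 x 4 grid

fig, (ax_im, ax_pc) = plt.subplots(1, 2, figsize=(8, 3))

# imshow: cells are centered on integer coordinates, with no cell borders.
ax_im.imshow(data, aspect='auto', origin='lower')
ax_im.set_title('imshow()')

# pcolormesh: cell edges sit on integer coordinates, so white edge lines
# form a grid and cell centers fall at 0.5, 1.5, 2.5, ...
ax_pc.pcolormesh(data, edgecolors='w', linewidth=0.5)
ax_pc.set_xticks(np.arange(data.shape[1]) + 0.5)  # center ticks in cells
ax_pc.set_xticklabels(['a', 'b', 'c', 'd'])       # placeholder labels
ax_pc.set_title('pcolormesh()')

# Compare the x-axis extents of the two plots.
print(ax_im.get_xlim(), ax_pc.get_xlim())
```

The imshow() axis runs from -0.5 to 3.5, while the pcolormesh() axis runs from 0 to 4, which is why the heat map code adds 0.5 to place each label at the center of its cell. Add plt.show() to display the figure outside a notebook.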
Uses for heat maps
Heat maps are highly effective at demonstrating how a blanket policy or action affects multiple geographic regions over time. Thanks to their versatility, they can be adapted for other purposes, such as tracking:
- Air quality index levels in different cities before and after the Clean Air Act
- Changes in test scores for schools or districts after policies such as No Child Left Behind
- Unemployment rates for different regions after economic stimulus packages
- Product sales per region after local or national advertising campaigns
Among the advantages of heat maps is that they promote multiple analysis techniques. These include:
Comparative analysis: Easily compare trends in different categories (states, schools, regions, etc.).
Temporal trends: Elegantly show how values change over time.
Pattern recognition: Identify patterns and anomalies in the data at a glance.
Communication: Provide a clear and concise way to communicate complex data.
Heat maps are an excellent way to present a big-picture overview while preserving the fine-scale granularity of the data.