Picture this: You have a bunch of line charts, and you’re sure there’s at least one trend hiding somewhere among all that data. Whether you’re tracking sales of your company’s thousands of products or analyzing stock market data, your goal is to uncover those subtrends and make them stand out in your visualization. Let’s explore a couple of techniques that will help you do just that.
Density line plots are a clever plotting technique introduced by Dominik Moritz and Danyel Fisher in their article, "Visualizing a million time series with the density line chart." This method transforms numerous line graphs into heat maps, revealing areas where the lines overlap the most.
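To make the idea concrete, here is a minimal sketch of how overlapping lines can be turned into a density grid with plain NumPy. It is a simplification, not the algorithm from the article or from any library: it only counts the sampled points of each series and ignores the segments between them. The function name `line_density_grid` and its parameters are made up for illustration.

```python
import numpy as np

def line_density_grid(ys, ny=50, y_range=None):
    """Count how many series fall into each (y-bin, x-position) cell.

    ys: array of shape (n_series, n_points), all sampled at the same x positions.
    Returns an array of shape (ny, n_points); higher values mean more overlap.
    """
    ys = np.asarray(ys, dtype=float)
    n_series, n_points = ys.shape
    y_min, y_max = y_range if y_range is not None else (ys.min(), ys.max())
    # Map each y value to a bin index in [0, ny - 1].
    scaled = (ys - y_min) / (y_max - y_min)
    bins = np.clip((scaled * ny).astype(int), 0, ny - 1)
    grid = np.zeros((ny, n_points), dtype=int)
    cols = np.broadcast_to(np.arange(n_points), bins.shape)
    # np.add.at accumulates correctly even when several series hit the same cell.
    np.add.at(grid, (bins, cols), 1)
    return grid

# Three toy series: two overlap exactly, one sits apart.
demo = np.array([[0.0, 1.0, 2.0],
                 [0.0, 1.0, 2.0],
                 [4.0, 4.0, 4.0]])
grid = line_density_grid(demo, ny=5, y_range=(0.0, 5.0))
```

Passing a grid like this to `plt.imshow(grid, origin='lower', aspect='auto')` would render it as a heat map, with the cells shared by the two overlapping series showing up as the hottest.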
When we apply density line plots to the synthetic data shown above, the results look like this:
This implementation allows us to see where our trends appear and identify the subtrends that make this data interesting.
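The code in this section expects a DataFrame called `synth_df` with one column per series, but its construction is not listed in this article. The sketch below builds a stand-in under stated assumptions: two groups of noisy sine waves on a daily datetime index. The group sizes, amplitudes, noise level, and dates are all invented for illustration, so the result will not reproduce the figures exactly.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)

n_points = 200       # time steps per series (assumed)
n_per_group = 50     # series per hidden trend (assumed)
t = np.linspace(0, 4 * np.pi, n_points)
index = pd.date_range('2024-01-01', periods=n_points, freq='D')

series = {}
for i in range(n_per_group):
    # Trend A: a sine wave with random noise.
    series[f'a_{i}'] = 10 * np.sin(t) + rng.normal(0, 3, n_points)
    # Trend B: a smaller, phase-shifted sine wave with the same noise level.
    series[f'b_{i}'] = 5 * np.sin(t + np.pi / 2) + rng.normal(0, 3, n_points)

# One row per time step, one column per series.
synth_df = pd.DataFrame(series, index=index)
```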
For this example, we use the Python library PyDLC by Charles L. Berubé. Implementation is quite straightforward, thanks to the library's user-friendly design.
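import matplotlib.pyplot as plt
from pydlc import dense_lines  # PyDLC's plotting function

plt.figure(figsize=(14, 14))
# Pass one series per row, hence the transpose of the wide DataFrame
im = dense_lines(synth_df.to_numpy().T,
                 x=synth_df.index.astype('int64'),
                 cmap='viridis',
                 ny=100,
                 y_pad=0.01
                 )
plt.ylim(-25, 25)
plt.axhline(y=0, color='white', linestyle=':')
plt.show()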
When using density line plots, keep in mind that parameters such as ny and y_pad may need some adjustment to obtain the best results.
The second technique, which relies on transparency, hasn't been discussed as much and doesn't have a universally recognized name. However, it is essentially a variation on “line density charts” or “line density visualizations,” where we use thicker lines with low opacity to reveal areas of overlap and density.
We can clearly identify what appear to be two distinct trends and observe the high degree of overlap during the downward movements of the sine waves. However, it is a little more complicated to determine where the effect is strongest.
The code for this approach is also pretty straightforward:
plt.figure(figsize=(14, 14))
for column in synth_df.columns:
    # Low alpha lets overlapping lines build up into darker regions
    plt.plot(synth_df.index,
             synth_df[column],
             alpha=0.1,
             linewidth=2,
             label=column,
             color='black'
             )
plt.show()
Here, the two parameters that might require some adjustment are alpha and linewidth.
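A rough way to reason about the opacity setting: with standard alpha blending, each extra line drawn over the same pixel covers a fraction alpha of whatever is still showing through, so n overlapping lines give an opacity of 1 - (1 - alpha)^n. The helper below only illustrates that formula. Its name, `stacked_opacity`, is made up, and it ignores anti-aliasing and partially overlapping strokes.

```python
def stacked_opacity(alpha: float, n_lines: int) -> float:
    """Opacity after drawing n_lines identical strokes of opacity alpha on top of each other."""
    return 1 - (1 - alpha) ** n_lines

# How quickly does a pixel saturate as more lines pass through it?
for n in (1, 5, 10, 25, 50):
    print(f"{n:>2} overlapping lines at alpha=0.1 -> {stacked_opacity(0.1, n):.2f}")
```

At an alpha of 0.1, ten overlapping lines already reach about 65% opacity, so with hundreds of series a lower value may be needed to keep the densest regions distinguishable from the merely busy ones.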
Let's imagine we are looking for subtrends in the daily returns of 50 stocks. The first step is to extract the data and calculate the daily returns.
import yfinance as yf
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

stock_tickers = [
    'AAPL', 'MSFT', 'GOOGL', 'AMZN', 'TSLA', 'META', 'NVDA', 'BRK-B', 'UNH', 'V',
    'HD', 'MA', 'KO', 'DIS', 'PFE', 'NKE', 'ADBE', 'CMCSA', 'NFLX', 'CSCO',
    'INTC', 'AMGN', 'COST', 'PEP', 'TMO', 'AVGO', 'QCOM', 'TXN', 'ABT', 'ORCL',
    'MCD', 'MDT', 'CRM', 'UPS', 'WMT', 'BMY', 'GILD', 'BA', 'SBUX', 'IBM',
    'MRK', 'WBA', 'CAT', 'CVX', 'T', 'MS', 'LMT', 'GS', 'WFC', 'HON'
]
start_date = '2024-03-01'
end_date = '2024-09-01'

frames = []
for ticker in stock_tickers:
    stock_data = yf.download(ticker, start=start_date, end=end_date)
    # Fill gaps forward, then backward (fillna(method=...) is deprecated in pandas 2.x)
    stock_data = stock_data.ffill().bfill()
    if len(stock_data) >= 2:
        # squeeze() yields a Series even when yfinance returns multi-level columns
        close = stock_data['Close'].squeeze()
        frames.append(pd.DataFrame({
            'Ticker': ticker,
            'Percent Daily Return': close.pct_change() * 100,
        }))

# Stack the per-ticker frames into one long table and turn the date index into a column
percent_returns_df = pd.concat(frames, axis=0)
percent_returns_df.reset_index(inplace=True)
display(percent_returns_df)  # display() is available in Jupyter notebooks
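To see what the return calculation does, here is `pct_change` applied to a few made-up closing prices (the numbers are invented, not market data). Each value is the change relative to the previous row, so the first row has no return.

```python
import pandas as pd

# Made-up closing prices for three consecutive days.
close = pd.Series([100.0, 102.0, 99.96])

# pct_change() computes (current / previous) - 1; multiply by 100 for percent.
daily_return = close.pct_change() * 100
print(daily_return.round(2).tolist())
```

The second day comes out as a 2% gain and the third as a 2% loss, while the first entry is NaN because there is no earlier price to compare against.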
Then we can graph the data.
# Reshape to one row per date and one column per ticker
pivot_df = percent_returns_df.pivot(index='Date', columns='Ticker', values='Percent Daily Return')
pivot_df = pivot_df.ffill().bfill()
plt.figure(figsize=(14, 14))
sns.lineplot(data=pivot_df, dashes=False)
plt.title('Percent Daily Returns of Top 50 stocks')
plt.xlabel('Date')
plt.ylabel('Percent Daily Return')
plt.legend(title='Stock Ticker', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.grid(True)
plt.tight_layout()
plt.show()
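The pivot step is what turns the long table (one row per date and ticker) into the wide layout that both plotting techniques need, with one column per ticker. Here is a tiny example with invented values and placeholder tickers:

```python
import pandas as pd

long_df = pd.DataFrame({
    'Date': pd.to_datetime(['2024-03-01', '2024-03-01', '2024-03-04', '2024-03-04']),
    'Ticker': ['AAA', 'BBB', 'AAA', 'BBB'],  # placeholder tickers, not real symbols
    'Percent Daily Return': [0.5, -0.2, 1.1, 0.3],  # invented values
})

# Dates become the index, tickers become the columns.
wide_df = long_df.pivot(index='Date', columns='Ticker', values='Percent Daily Return')
print(wide_df)
```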
The density line chart faces some challenges with this data due to its sporadic nature. However, it still provides valuable insights into overall market trends. For example, it reveals periods where the densest areas correspond to significant declines, highlighting tough days in the market.
plt.figure(figsize=(14, 14))
# Select the ticker columns and pass one series per row
im = dense_lines(pivot_df[list(stock_tickers)].to_numpy().T,
                 x=pivot_df.index.astype('int64'),
                 cmap='viridis',
                 ny=200,
                 y_pad=0.1
                 )
plt.axhline(y=0, color='white', linestyle=':')
plt.ylim(-10, 10)
plt.show()
However, we have found that the transparency technique works much better for this particular problem. The market declines we mentioned earlier become much clearer and more noticeable.
plt.figure(figsize=(14, 14))
for ticker in pivot_df.columns:
    # Thicker, mostly transparent lines make shared moves stand out
    plt.plot(pivot_df.index,
             pivot_df[ticker],
             alpha=0.1,
             linewidth=4,
             label=ticker,
             color='black'
             )
plt.show()
Both strategies have their own merits and strengths, and the best approach for your work may not be obvious until you've tried both. I hope you find one of these techniques useful for your future projects. If you know of other techniques or use cases for handling massive line charts, I'd love to hear about them!
Thanks for reading and take care.