In this blog, you'll learn how to visualize live data streams in real time, all from the convenience of your favorite tool, Jupyter Notebook.
In most projects, dynamic charts within Jupyter Notebooks require manual updates; For example, you may need to press reload to get new data and refresh the graphs. This doesn't work well for any fast-paced industry, including finance. Consider missing crucial purchasing prompts or fraud alerts because your user didn't press reload at that time.
Here, we'll show you how to move from manual updates to a streaming or real-time method in Jupyter Notebook, making your projects more efficient and responsive.
What is covered:
- Real-time display: You'll learn how to bring data to life and watch it evolve second by second, right before your eyes.
- Jupyter Notebook Mastery: Leverage the full power of Jupyter Notebook, not only for static data analysis but also for dynamic data streaming.
- Python in Quantitative Finance Use Case: Dive into a practical financial application, implementing a tool widely used in finance with real-world data.
- Streaming data processing: Understand the fundamentals and benefits of real-time data processing, a skill that is becoming increasingly crucial in today's fast-paced data world.
By the end of this blog, you will know how to create real-time visualizations similar to the one below in your Jupyter Notebook.
At the heart of our project is the concept of stream processing.
Simply put, stream processing is about handling and analyzing data in real time as it is generated. Think of it like Google Maps during a rush hour commute, where you see live traffic updates, allowing for immediate and efficient route changes.
Curiously, according to the CIO of Goldman Sachs in this Forbes podcastMoving towards real-time or real-time data processing is one of the important trends we are heading towards.
It's about combining the power of real-time data processing with a familiar, interactive Jupyter Notebooks environment.
On top of that, Jupyter Notebooks work well with containerized environments. Therefore, our projects are not just stuck on local machines; we can take them anywhere, running them seamlessly anywhere, from a colleague's laptop to a cloud server.
In finance, every second counts, whether for fraud detection or trading, and that is why data processing has become essential. The spotlight here is on. Bollinger Bands, a useful tool for financial trading. This tool includes:
- The middle band: This is a 20-period moving average, which calculates the average stock price over the last 20 periods (e.g. 20 minutes for high-frequency analysis), providing a snapshot of recent price trends.
- Outer bands: Located 2 standard deviations above and below the middle band, they indicate market volatility: wider bands suggest more volatility and narrower bands less.
In Bollinger Bands, potentially overbought conditions are signaled when the moving average price touches or exceeds the upper band (a signal to sell, often marked in red), and oversold conditions are signaled when the price falls below of the lower band (a signal to buy). , usually marked in green).
Algorithm traders often combine Bollinger Bands with other technical indicators.
Here, we made an essential adjustment when generating our Bollinger Bands by integrating trading volumes. Traditionally, Bollinger Bands do not consider trading volume and are calculated solely based on price data.
Thus, we have indicated Bollinger Bands at a distance of VWAP ± 2 × VWSTD where:
- VWAP: A 1-minute volume-weighted average price for a more volume-sensitive perspective.
- VWSTD: Represents an approach, 20 minute standard deviationthat is, a measure of market volatility.
Technical implementation:
- We use temporary sliding windows ('pw.temporal.sliding') to analyze data in 20-minute segments, similar to moving a magnifying glass over data in real time.
- We employ reducers ('pw.reducers'), which process data within each window to produce a particular result for each window, i.e. the standard deviations in this case.
- Polygon.io: Provider of real-time and historical market data. While you can certainly use their API for live data, we've pre-saved some data to a CSV file for this demo, making it easy to track without needing an API key.
- Route: An open source Pythonic framework for fast data processing. Handles batch (static) and streaming (real-time) data.
- Bokeh: Ideal for creating dynamic visualizations, Bokeh brings our streaming data to life with beautiful, interactive graphics.
- Panel: Enhances our project with real-time dashboard capabilities, working alongside Bokeh to update our visualizations as new data streams come in.
This involves six steps:
- Perform pip installation for relevant frameworks and import relevant libraries.
- Getting sample data
- Set the data source for the calculation
- Calculate essential statistics for Bollinger Bands
- Creating panels using Bokeh and Panel
- Pressing the run command
1. Imports and configuration
First, let's quickly install the necessary components.
%%capture --no-display
!pip install pathway
Start by importing the necessary libraries. These libraries will help in data processing, visualization and creating interactive dashboards.
# Importing libraries for data processing, visualization, and dashboard creation
import datetime
import bokeh.models
import bokeh.plotting
import panel
import pathway as pw
2. Obtaining sample data
Next, download the sample data from GitHub. This step is crucial to access our data for visualization. Here, we have obtained the stock prices of Apple Inc (AAPL).
# Command to download the sample APPLE INC stock prices extracted via Polygon API and stored in a CSV for ease of review of this notebook.
%%capture --no-display
!wget -nc https://gist.githubusercontent.com/janchorowski/e351af72ecd8d206a34763a428826ab7/raw/ticker.csv
Note: This tutorial takes advantage of a published showcase here
3. Data source configuration
Create a streaming data source using the CSV file. This simulates a live data stream, offering a convenient way to work with real-time data without needing an API key while building the project for the first time.
# Creating a streaming data source from a CSV file
fname = "ticker.csv"
schema = pw.schema_from_csv(fname)
data = pw.demo.replay_csv(fname, schema=schema, input_rate=1000)
# Uncommenting the line below will override the data table defined above and switch the data source to static mode, which is helpful for initial testing
# data = pw.io.csv.read(fname, schema=schema, mode="static")
# Parsing the timestamps in the data
data = data.with_columns(t=data.t.dt.utc_from_timestamp(unit="ms"))
Note: No data processing occurs immediately, but at the end, when we press the run command.
4. Calculate essential statistics for Bollinger Bands
Here, we will briefly build the trading algorithm we discussed above. We have a fictitious flow of Apple Inc. stock prices. Now, to make Bollinger Bands,
- We will calculate the 20-minute weighted standard deviation (VWSTD)
- The 1-minute weighted moving average of prices (VWAP)
- Join the two above.
# Calculating the 20-minute rolling statistics for Bollinger Bands
minute_20_stats = (
data.windowby(
pw.this.t,
window=pw.temporal.sliding(
hop=datetime.timedelta(minutes=1),
duration=datetime.timedelta(minutes=20),
),
behavior=pw.temporal.exactly_once_behavior(),
instance=pw.this.ticker,
)
.reduce(
ticker=pw.this._pw_instance,
t=pw.this._pw_window_end,
volume=pw.reducers.sum(pw.this.volume),
transact_total=pw.reducers.sum(pw.this.volume * pw.this.vwap),
transact_total2=pw.reducers.sum(pw.this.volume * pw.this.vwap**2),
)
.with_columns(vwap=pw.this.transact_total / pw.this.volume)
.with_columns(
vwstd=(pw.this.transact_total2 / pw.this.volume - pw.this.vwap**2)
** 0.5
)
.with_columns(
bollinger_upper=pw.this.vwap + 2 * pw.this.vwstd,
bollinger_lower=pw.this.vwap - 2 * pw.this.vwstd,
)
)
# Computing the 1-minute rolling statistics
minute_1_stats = (
data.windowby(
pw.this.t,
window=pw.temporal.tumbling(datetime.timedelta(minutes=1)),
behavior=pw.temporal.exactly_once_behavior(),
instance=pw.this.ticker,
)
.reduce(
ticker=pw.this._pw_instance,
t=pw.this._pw_window_end,
volume=pw.reducers.sum(pw.this.volume),
transact_total=pw.reducers.sum(pw.this.volume * pw.this.vwap),
)
.with_columns(vwap=pw.this.transact_total / pw.this.volume)
)
# Joining the 1-minute and 20-minute statistics for comprehensive analysis
joint_stats = (
minute_1_stats.join(
minute_20_stats,
pw.left.t == pw.right.t,
pw.left.ticker == pw.right.ticker,
)
.select(
*pw.left,
bollinger_lower=pw.right.bollinger_lower,
bollinger_upper=pw.right.bollinger_upper
)
.with_columns(
is_alert=(pw.this.volume > 10000)
& (
(pw.this.vwap > pw.this.bollinger_upper)
| (pw.this.vwap < pw.this.bollinger_lower)
)
)
.with_columns(
action=pw.if_else(
pw.this.is_alert,
pw.if_else(
pw.this.vwap > pw.this.bollinger_upper, "sell", "buy"
),
"hold",
)
)
)
alerts = joint_stats.filter(pw.this.is_alert)
You can consult the notebook. here for a deeper understanding of the calculations.
5. Creation of dashboards
It's time to spice up our analysis with a Bokeh chart and dashboard table visualization.
# Function to create the statistics plot
def stats_plotter(src):
actions = ("buy", "sell", "hold")
color_map = bokeh.models.CategoricalColorMapper(
factors=actions, palette=("#00ff00", "#ff0000", "#00000000")
)
fig = bokeh.plotting.figure(
height=400,
width=600,
title="20 minutes Bollinger bands with last 1 minute average",
x_axis_type="datetime",
y_range=(188.5, 191),
)
fig.line("t", "vwap", source=src)
band = bokeh.models.Band(
base="t",
lower="bollinger_lower",
upper="bollinger_upper",
source=src,
fill_alpha=0.3,
fill_color="gray",
line_color="black",
)
fig.scatter(
"t",
"vwap",
color={"field": "action", "transform": color_map},
size=10,
marker="circle",
source=src,
)
fig.add_layout(band)
return fig
# Combining the plot and table in a Panel Row
viz = panel.Row(
joint_stats.plot(stats_plotter, sorting_col="t"),
alerts.select(
pw.this.ticker, pw.this.t, pw.this.vwap, pw.this.action
).show(include_id=False, sorters=({"field": "t", "dir": "desc"})),
)
viz
When you run this cell, placeholder containers are created in your notebook for the chart and table. They will be filled with live data once the calculation begins.
6. Run the calculation
All preparations are complete, and it is time to run the data processing engine.
# Command to start the Pathway data processing engine
%%capture --no-display
pw.run()
As the dashboard updates in real time, you will see Bollinger Bands triggering actions: green markers for buying and red markers for selling, often at a slightly higher price.
Note: You must run pw.run() manually after the widget is initialized and visible. You can find more details in this GitHub issue. here.
In this blog, we understand Bollinger Bands and take you through a journey to visualize real-time financial data in Jupyter Notebook. We show how to transform live data streams into actionable insights using Bollinger Bands and a combination of open source Pythonic tools.
The tutorial provides a practical example of real-time financial data analysis, leveraging open source for an end-to-end solution, from data collection to interactive dashboard. You can create similar projects by:
- Do this for a stock of your choice by getting live stock prices from APIs like Yahoo Finance, Polygon, Kraken, etc.
- Doing this for a group of your stocks, ETFs, etc. favorites.
- Take advantage of some other trading tool besides Bollinger Bands.
By integrating these instruments with real-time data within a Jupyter Notebook, you not only analyze the market but experience it as it develops.
Happy streaming!
Mudit Srivastava Works on Camino. Prior to this, he was a founding member of ai Planet and is an active community builder in the space of LLMs and real-time machine learning.