
You’ve probably noticed that creating visually impressive charts and graphs isn’t just about choosing the right colors or shapes. The real magic happens behind the scenes, in the data that feeds those images.
But how do you make sure that data is in the right shape? This is where SQL comes in. SQL helps you slice, dice, and prepare your data so that it shines in whatever visualization tool you’re using.
So, what awaits you in this article? We’ll start by showing how SQL can be used to prepare data for visualization. We’ll then walk through different types of visualizations and how to prepare data for each, building the finished chart for some of them. All of this aims to give you the keys to creating compelling visual stories. So grab your coffee, this one is going to be good!
Before we delve into the types of visualizations, let’s look at how SQL prepares the data it will display. SQL is like a script writer for your visual “movie”, fine-tuning the story you want to tell.
Filter
The WHERE clause filters out unwanted data. For example, if you are only interested in users between 18 and 25 years old for your analysis, you can filter them using SQL.
Imagine you are analyzing customer feedback. With SQL, you can filter only records where the feedback rating is less than 3, highlighting areas for improvement.
SELECT * FROM feedbacks WHERE rating < 3;
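To try the filter without a real database, here is a minimal sketch using Python's built-in sqlite3 module. The table layout (id, rating, comment) and the sample rows are assumptions made up for illustration; only the feedbacks table name and the rating column come from the query above.

```python
import sqlite3

# In-memory database with a hypothetical feedbacks table (columns are assumed).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE feedbacks (id INTEGER PRIMARY KEY, rating INTEGER, comment TEXT)"
)
conn.executemany(
    "INSERT INTO feedbacks (rating, comment) VALUES (?, ?)",
    [(5, "Great"), (2, "Slow delivery"), (1, "Broken item"), (4, "Good"), (3, "Okay")],
)

# Keep only ratings below the threshold; the threshold is passed as a parameter.
low_ratings = conn.execute(
    "SELECT id, rating, comment FROM feedbacks WHERE rating < ? ORDER BY id",
    (3,),
).fetchall()
conn.close()

for row in low_ratings:
    print(row)
```

Passing the threshold as a parameter instead of hard-coding it makes it easy to reuse the same query when the cutoff for "needs improvement" changes.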
Sort
The ORDER BY clause orders your data. Sorting can be crucial for time series charts where data needs to be displayed chronologically.
When plotting a line chart for a product’s monthly sales, SQL can sort the data by month.
SELECT month, sales FROM products ORDER BY month;
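A small sketch of the sorting step, again with sqlite3 and invented sample data. It assumes the month column is stored as 'YYYY-MM' text, so that alphabetical order matches chronological order; month names such as "January" would not sort correctly this way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical layout: month stored as 'YYYY-MM' text, sales as a number.
conn.execute("CREATE TABLE products (month TEXT, sales REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("2023-03", 150.0), ("2023-01", 120.0), ("2023-02", 90.0)],
)

# Rows come back in chronological order thanks to ORDER BY.
rows = conn.execute("SELECT month, sales FROM products ORDER BY month").fetchall()
conn.close()

# Split into x and y sequences, ready to hand to a line chart.
months = [month for month, _ in rows]
sales = [amount for _, amount in rows]
print(months, sales)
```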
Join
The JOIN clause combines data from two or more tables. This allows for richer data sets and therefore more complete visualizations.
You might have user data in one table and purchase data in another. SQL can join them together to show the total spend per user.
SELECT users.id, SUM(purchases.amount) FROM users
JOIN purchases ON users.id = purchases.user_id
GROUP BY users.id;
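One thing worth checking before charting a join is which rows it keeps. The sketch below runs the query on invented sample data and compares it with a LEFT JOIN variant. The name column and the sample rows are assumptions, and the LEFT JOIN version is an alternative shown for comparison, not part of the original query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE purchases (user_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "Ana"), (2, "Ben"), (3, "Cy")])
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?)", [(1, 20.0), (1, 5.5), (2, 10.0)]
)

# Inner join: users without purchases (user 3) drop out of the result.
inner_totals = conn.execute(
    """
    SELECT users.id, SUM(purchases.amount)
    FROM users
    JOIN purchases ON users.id = purchases.user_id
    GROUP BY users.id
    ORDER BY users.id
    """
).fetchall()

# Left join: every user is kept; COALESCE turns a missing total into 0.
left_totals = conn.execute(
    """
    SELECT users.id, COALESCE(SUM(purchases.amount), 0)
    FROM users
    LEFT JOIN purchases ON users.id = purchases.user_id
    GROUP BY users.id
    ORDER BY users.id
    """
).fetchall()
conn.close()

print(inner_totals)
print(left_totals)
```

Which variant fits depends on the chart: a "top spenders" bar chart can ignore users with no purchases, while a chart of all users probably should show them with a zero.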
Group
The GROUP BY clause groups rows that share the same value. It is often used with aggregate functions such as COUNT(), SUM(), and AVG() to perform calculations on each group.
If you want to know the average time spent on different sections of a website, SQL can group data by section and then calculate the average.
SELECT section, AVG(time_spent) FROM website_data
GROUP BY section;
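Here is a minimal sketch of the grouping step on invented data. The sample rows are assumptions, and the extra COUNT(*) column is an addition to the original query, included because an average is easier to trust when you can see how many rows it is based on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE website_data (section TEXT, time_spent REAL)")
conn.executemany(
    "INSERT INTO website_data VALUES (?, ?)",
    [("home", 30), ("home", 50), ("blog", 120), ("blog", 80), ("blog", 100)],
)

# One output row per section: the average time and the number of visits behind it.
section_stats = conn.execute(
    """
    SELECT section, AVG(time_spent), COUNT(*)
    FROM website_data
    GROUP BY section
    ORDER BY section
    """
).fetchall()
conn.close()

for section, avg_time, visits in section_stats:
    print(f"{section}: {avg_time:.1f} on average over {visits} visits")
```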
Before delving into the different types of visual aids, it is important to understand why they are essential. Think of each chart or graph as a different “lens” through which to view your data. The type you choose can help you capture trends, identify outliers, or even tell a story.
Charts
In data science, charts are used in the first steps of understanding a data set. For example, you could use a histogram to understand the age distribution of users in a mobile app. Tools like Matplotlib or Seaborn in Python are commonly used to plot these charts.
You can run SQL queries to get counts, averages, or any metric you’re interested in, and feed this data directly into your charting tool to create visualizations like bar charts, pie charts, or histograms.
The following SQL query calculates the average age of users in each city. Preparing the data this way lets us visualize how age varies from one city to another.
-- SQL code to find the average age of users in each city
SELECT city, AVG(age) AS age
FROM users
GROUP BY city;
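To bridge the gap between the query and a plotting call, the sketch below runs it with sqlite3 and reshapes the rows into one list per column. The sample rows and the name grouped_columns are made up for illustration. If pandas is available, pandas.DataFrame(grouped_columns) would turn this dictionary into a DataFrame with city and age columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (city TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("Lisbon", 30), ("Lisbon", 40), ("Oslo", 20), ("Oslo", 30), ("Oslo", 25)],
)

# The alias gives the aggregated column a predictable name.
rows = conn.execute(
    "SELECT city, AVG(age) AS age FROM users GROUP BY city ORDER BY city"
).fetchall()
conn.close()

# Column-oriented layout: one list per column, keyed by column name.
grouped_columns = {
    "city": [city for city, _ in rows],
    "age": [age for _, age in rows],
}
print(grouped_columns)
```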
Let’s use Matplotlib to create a bar chart. The following code snippet assumes that grouped_df is a DataFrame holding the result of the previous SQL query, with columns named city and age, and creates a bar chart showing the average age of users by city.
import matplotlib.pyplot as plt
# Assuming grouped_df contains the average age data
plt.figure(figsize=(10, 6))
plt.bar(grouped_df['city'], grouped_df['age'], color="blue")
plt.xlabel('City')
plt.ylabel('Average Age')
plt.title('Average Age of Users by City')
plt.show()
Here is the bar chart.
Graphs
Let’s say you’re tracking the speed of a website over time. A line chart can show you trends, peaks and valleys in the data, highlighting when the website is performing best and worst.
Tools like Plotly or Bokeh can help you create these more complex visualizations. You would use SQL to prepare the time series data, possibly running queries that calculate the average load time per day, before sending it to your charting tool.
The following SQL query calculates the average website speed for each day. This query makes it easy to build a time series line graph that shows performance over time.
-- SQL code to find the daily average loading time
SELECT DATE(loading_time), AVG(speed)
FROM website_speed
GROUP BY DATE(loading_time)
ORDER BY DATE(loading_time);
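The daily average can be tried out with sqlite3 as a rough sketch. The timestamps and speeds are invented, the aliases day and avg_speed are assumptions, and the explicit ORDER BY is there because GROUP BY alone does not guarantee that the days come back in chronological order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical layout: one row per measurement, timestamp stored as ISO text.
conn.execute("CREATE TABLE website_speed (loading_time TEXT, speed REAL)")
conn.executemany(
    "INSERT INTO website_speed VALUES (?, ?)",
    [
        ("2024-01-02 09:30:00", 3.0),
        ("2024-01-01 08:00:00", 1.0),
        ("2024-01-01 20:00:00", 2.0),
    ],
)

# Collapse measurements to one averaged point per calendar day, oldest first.
daily_speed = conn.execute(
    """
    SELECT DATE(loading_time) AS day, AVG(speed) AS avg_speed
    FROM website_speed
    GROUP BY DATE(loading_time)
    ORDER BY day
    """
).fetchall()
conn.close()

for day, avg_speed in daily_speed:
    print(day, avg_speed)
```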
Here, let’s say we choose Plotly to create a line graph that shows website speed over time. The code assumes that time_series_df holds the time series prepared by the SQL query, with the date column named loading_time and the average column named speed.
import plotly.express as px
fig = px.line(time_series_df, x='loading_time', y='speed', title="Website Speed Over Time")
fig.show()
Here is the line chart.
Dashboards
Dashboards are essential for projects that require real-time monitoring. Imagine a dashboard that tracks real-time user engagement metrics for an online platform.
Tools like PowerBI, Google Data Studio, or Tableau can pull data from SQL databases to populate these dashboards. SQL can aggregate and refresh your data, so you always have the latest information right in your dashboard.
-- SQL code to find the current number of active users and average session time
SELECT COUNT(DISTINCT user_id) AS active_users, AVG(session_time) AS avg_session_time
FROM user_sessions
WHERE session_end IS NULL;
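A dashboard tile is essentially this query run again on every refresh. The sketch below wraps it in a function that a refresh loop could call. The function name fetch_engagement_kpis, the table layout, and the sample rows are all hypothetical.

```python
import sqlite3


def fetch_engagement_kpis(conn):
    """Return (active_users, avg_session_time) for sessions that are still open."""
    return conn.execute(
        """
        SELECT COUNT(DISTINCT user_id), AVG(session_time)
        FROM user_sessions
        WHERE session_end IS NULL
        """
    ).fetchone()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_sessions (user_id INTEGER, session_time REAL, session_end TEXT)"
)
conn.executemany(
    "INSERT INTO user_sessions VALUES (?, ?, ?)",
    [
        (1, 10.0, None),                   # open session
        (1, 20.0, None),                   # same user, second open session
        (2, 30.0, None),                   # open session
        (3, 40.0, "2024-01-01 10:00:00"),  # already ended, so excluded
    ],
)

kpis = fetch_engagement_kpis(conn)
conn.close()
print(kpis)
```

COUNT(DISTINCT user_id) counts user 1 once even though that user has two open sessions, which is usually what an "active users" tile is meant to show.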
In PowerBI, you would typically connect to your SQL database and run similar queries to create visuals for a dashboard. The benefit of using a tool like PowerBI is the ability to build real-time dashboards. You can set up multiple tiles to show average age and other KPIs, all updated in real time.
Data visualization is not just about pretty charts and graphs; it’s about telling a compelling story with your data. SQL plays a critical role in scripting that story, helping you prepare, filter, and organize data behind the scenes. Like the gears of a well-oiled machine, SQL queries serve as invisible mechanisms that make your visualizations not only possible but insightful.
If you’re hungry for more hands-on experience, visit the StrataScratch platform, which offers a wealth of resources to help you grow. From data science interview questions to practical data projects, StrataScratch is designed to improve your skills and help you land your dream job.
Nate Rosidi is a data scientist and works in product strategy. He is also an adjunct professor teaching analytics and is the founder of StrataScratch, a platform that helps data scientists prepare for their interviews with real questions from top companies. Connect with him on Twitter: StrataScratch or LinkedIn.