Image by author
In the world of data, SQL remains the lingua franca for interacting with databases.
Even today it remains one of the most used languages for managing data and is still considered essential for any good data professional.
However, anyone who has worked with complex SQL queries knows that they can quickly become unwieldy beasts that are difficult to read, maintain, or reuse.
That is why today it is not enough to know SQL, we must be good at creating queries. And this is actually a type of art.
This is where common table expressions (CTEs) come into play, transforming the art of writing queries into a more structured and accessible art.
So, let's figure out together how to code readable and reusable queries.
If you are wondering what a CTE is, you are in the right article.
A common table expression (CTE) is a temporary result set that is defined within the execution scope of a single SQL statement.
They are temporary tables that can be referenced many times within a single query and are typically used to simplify complex joins and subqueries, with the ultimate goal of increasing the readability and organization of SQL code.
Therefore, they are a powerful tool for breaking down complex queries into simpler parts.
Here's why you should consider using CTE:
- Modularity: You can break complex logic into readable parts.
- Readability: Makes it easier to understand the flow of SQL queries.
- Reusability: CTEs can be referenced multiple times in a single query, avoiding repetition.
The magic starts with the WITH clause, which precedes the main query and defines different aliased temporary tables (CTEs).
Therefore, we should always start our query with the “WITH” command to begin defining our own CTEs. Using CTE, we can split any complex SQL query into:
– Small temporary tables that calculate related variables.
– A final table that only takes those variables we want as output.
And this is precisely the MODULAR approach we want in any code!
Image by author
So using CTE in our queries allows us to:
– Run a temporary table ONCE and reference it MULTIPLE times.
– Improve readability and simplify complex logic.
– Promote code reuse and modular design.
To understand this better, we can give a practical example of Airbnb ads in Barcelona.
Let's imagine that we want to analyze the performance of listings by neighborhood and compare it to the overall performance of the city. You would need to gather information about neighborhoods, individual apartments, hosts, and prices.
To exemplify this we will use the InsideAirbnb Barcelona table, which looks like this:
A naive approach could lead you to create nested subqueries that quickly become a maintenance nightmare like the following:
Code by author
Instead, we can use CTE to compartmentalize our query into logical sections, each defining a piece of the puzzle.
- Neighborhood data: Create a CTE to summarize data by neighborhood.
- Apartment and host information: Set CTE to get details about apartments and hosts.
- City-wide metrics: Another CTE to collect city level statistics for comparison.
- Final assembly: Combine the CTEs in a final SELECT statement to present the data in a consistent manner.
Image by author
And we would end up with the following query:
Code by author
Using CTE, we turn a potentially monstrous single query into an organized set of data modules. This modular approach makes SQL code more intuitive and adaptable to changes.
If new requirements arise, you can adjust or add CTE without needing to review the entire query.
Once you have established your CTEs, you can reuse them for benchmarking. For example, if you want to compare neighborhood data with citywide metrics, you can reference your CTEs in a series of JOIN operations.
This not only saves time but also keeps your code efficient since you won't have to repeat the same query twice.
CTEs are a testament to the principle that a little structure goes a long way in programming. By adopting CTE, you can write clearer, more maintainable, and reusable SQL queries.
Streamlines the query development process and makes it easier to communicate complex data retrieval logic to others.
Remember, the next time you're about to embark on writing a multi-join, nested subquery monster, consider breaking it down with CTE.
Your future self (and anyone else who can read your code) will thank you.
Joseph Ferrer He is an analytical engineer from Barcelona. He graduated in physical engineering and currently works in the field of Data Science applied to human mobility. He is a part-time content creator focused on data science and technology. You can contact him at LinkedIn, Twitter either Half.