Image by author
At the beginner level, we only focus on writing and executing SQL queries. We don’t care how long they take to run or whether they can handle millions of records. At the intermediate level, however, people expect your queries to be optimized and to execute in minimal time.
Writing optimized queries is imperative in large applications with millions of records, such as e-commerce platforms or banking systems. Suppose you own an e-commerce company with over a million products, and a customer wants to search for a product. What if the query you wrote in the backend takes more than a minute to fetch that product from the database? Do you think customers would keep buying products from your website?
This shows why SQL query optimization matters. In this tutorial, I’ll show you some tips and tricks to optimize your SQL queries and make them run faster. The only prerequisite is a basic understanding of SQL.
To check whether a specific value is present in a table, use the EXISTS keyword instead of COUNT(). This lets the database execute the query more efficiently.
With COUNT(), the query has to count every occurrence of that value, which can be inefficient when the table is large. EXISTS, on the other hand, stops as soon as it finds the first matching row. This can save a lot of time.
Also, in this case you only care whether the value is present, not how many times it occurs, so EXISTS expresses your intent better as well.
SELECT
EXISTS(
SELECT
*
FROM
myTable
WHERE
myColumn = 'val'
);
The query above returns 1 if at least one row of the table has a myColumn value equal to 'val'; otherwise, it returns 0. (This is MySQL’s behavior; PostgreSQL returns true or false instead.)
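To see both approaches side by side in runnable form, here is a minimal sketch using Python’s built-in sqlite3 module and an in-memory database. The table and column names (myTable, myColumn) follow the example above, and the sample rows are made up for illustration. SQLite, like MySQL, returns 1 or 0 from SELECT EXISTS(...).

```python
import sqlite3

# In-memory database with illustrative sample data (not from a real application).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (id INTEGER PRIMARY KEY, myColumn TEXT)")
conn.executemany(
    "INSERT INTO myTable (myColumn) VALUES (?)",
    [("val",), ("other",), ("val",)],
)


def value_exists(value):
    # EXISTS only needs one matching row, so the engine can stop at the first hit.
    (found,) = conn.execute(
        "SELECT EXISTS(SELECT 1 FROM myTable WHERE myColumn = ?)", (value,)
    ).fetchone()
    return found == 1


def value_count(value):
    # COUNT(*) has to visit every matching row to produce the total.
    (total,) = conn.execute(
        "SELECT COUNT(*) FROM myTable WHERE myColumn = ?", (value,)
    ).fetchone()
    return total


print(value_exists("val"))      # True
print(value_exists("missing"))  # False
print(value_count("val"))       # 2
```

Writing SELECT 1 inside EXISTS is equivalent to SELECT * here. The database only needs to know whether a row exists, not what its columns contain.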
Both CHAR and VARCHAR data types are used to store strings in a table, but VARCHAR is usually more storage-efficient than CHAR, especially when the stored strings vary in length.
The CHAR data type stores fixed-length strings. If a string is shorter than the declared length, it is padded with spaces up to that length, which wastes storage on padding. For example, a CHAR(100) column takes 100 bytes even if it stores a single character (assuming a single-byte character set).
On the other hand, the VARCHAR data type stores variable-length strings up to the declared maximum length. It doesn’t pad them with spaces. It uses only as much storage as the string actually needs, plus a small length prefix (1 or 2 bytes in MySQL). For example, a VARCHAR(100) value holding a single character takes about 2 bytes: 1 byte for the character and 1 byte for the length.
CREATE TABLE myTable (
id INT PRIMARY KEY,
charCol CHAR(10),
varcharCol VARCHAR(10)
);
In the example above, a table myTable is created with two columns, charCol and varcharCol, of type CHAR and VARCHAR respectively. charCol always occupies 10 bytes. In contrast, varcharCol occupies only the actual length of the string stored in it, plus a small length prefix.
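The storage arithmetic can be sketched as a small Python helper. This is a simplified model that assumes MySQL-style storage with a single-byte character set (one byte per character). Under that assumption, a VARCHAR column uses a 1-byte length prefix when its maximum length is at most 255 bytes and 2 bytes otherwise. Real engines add row and page overhead that this sketch ignores, and the function names are hypothetical.

```python
def char_storage_bytes(declared_length: int, value: str) -> int:
    """Bytes used by a CHAR(declared_length) value (single-byte charset assumed)."""
    if len(value) > declared_length:
        raise ValueError("value is longer than the declared length")
    # CHAR pads the value with spaces up to the declared length.
    padded = value.ljust(declared_length)
    return len(padded)


def varchar_storage_bytes(declared_length: int, value: str) -> int:
    """Bytes used by a VARCHAR(declared_length) value (single-byte charset assumed)."""
    if len(value) > declared_length:
        raise ValueError("value is longer than the declared length")
    # VARCHAR stores only the actual characters plus a length prefix.
    prefix = 1 if declared_length <= 255 else 2
    return len(value) + prefix


print(char_storage_bytes(100, "a"))        # 100
print(varchar_storage_bytes(100, "a"))     # 2
print(char_storage_bytes(10, "hello"))     # 10
print(varchar_storage_bytes(10, "hello"))  # 6
```

The model also shows when VARCHAR stops winning. If every value fills the full declared length, the length prefix is pure overhead, and VARCHAR(10) needs 11 bytes where CHAR(10) needs 10.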
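To optimize an SQL query, avoid subqueries inside the WHERE clause where possible. Subqueries can be expensive to execute when they return a large number of rows.
Instead of using the subquery, you can often get the same result with a join operation or with a correlated subquery. A correlated subquery is a subquery whose inner query refers to columns of the outer query, typically written with EXISTS. Whether either one is faster than an uncorrelated subquery depends on the database. Many modern optimizers rewrite IN subqueries into joins automatically, and a correlated subquery that the optimizer cannot rewrite may be evaluated once per outer row. Check the execution plan rather than assuming.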
Below is an example to understand the difference between the two.
# Using a subquery
SELECT
*
FROM
orders
WHERE
customer_id IN (
SELECT
id
FROM
customers
WHERE
country = 'INDIA'
);
# Using a join operation
SELECT
orders.*
FROM
orders
JOIN customers ON orders.customer_id = customers.id
WHERE
customers.country = 'INDIA';
In the first example, the subquery collects the IDs of all customers from INDIA, and the outer query then fetches all orders for those customer IDs. In the second example, we achieve the same result by joining the customers and orders tables and selecting only the orders whose customers are from INDIA. This works as long as customers.id is unique (for example, the primary key). In that case each order matches at most one customer, so the join does not produce duplicate rows.
In this way, we can optimize the query by avoiding subqueries inside the WHERE clause, which also makes it easier to read and understand.
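The following minimal sketch uses Python’s sqlite3 module to check that three versions of the query return the same orders: the IN subquery, the join, and a correlated EXISTS subquery. The sample rows are invented for illustration. customers.id is the primary key here, which is what makes the join equivalent to the IN version. The printed query plans are SQLite-specific and will look different in other databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id)
    );
    INSERT INTO customers (id, country) VALUES (1, 'INDIA'), (2, 'USA'), (3, 'INDIA');
    INSERT INTO orders (id, customer_id) VALUES (10, 1), (11, 2), (12, 3), (13, 1);
    """
)

queries = {
    # Uncorrelated subquery: the inner query does not reference the outer one.
    "subquery": """
        SELECT orders.* FROM orders
        WHERE customer_id IN (SELECT id FROM customers WHERE country = 'INDIA')
        ORDER BY orders.id""",
    # Join: match each order to its customer, then filter on country.
    "join": """
        SELECT orders.* FROM orders
        JOIN customers ON orders.customer_id = customers.id
        WHERE customers.country = 'INDIA'
        ORDER BY orders.id""",
    # Correlated subquery: the inner query refers to orders.customer_id.
    "correlated": """
        SELECT orders.* FROM orders
        WHERE EXISTS (
            SELECT 1 FROM customers
            WHERE customers.id = orders.customer_id
              AND customers.country = 'INDIA')
        ORDER BY orders.id""",
}

results = {name: conn.execute(sql).fetchall() for name, sql in queries.items()}
print(results["join"])  # [(10, 1), (12, 3), (13, 1)]

# Inspect how SQLite plans each query; compare plans rather than assuming.
for name, sql in queries.items():
    print(name, conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall())
```

If customers.id were not unique, the join version could return the same order more than once. The IN and EXISTS versions would not.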
Applying the JOIN operation from a larger table to a smaller table is a common SQL optimization technique that can make your query run faster. If we apply a JOIN operation from a smaller table to a larger table, the SQL engine has to look for matching rows in the larger table. This requires more resources and takes more time. If the JOIN is instead applied from a larger table to a smaller table, the SQL engine only has to search the smaller table for matching rows.
Keep in mind that the cost-based optimizers in most modern databases, such as PostgreSQL, MySQL, and SQL Server, usually reorder inner joins on their own. The order in which you write the tables therefore often makes no difference, so check the execution plan to see what your database actually does.
Here is an example for your better understanding.
# Orders table is larger than the Customer table
# (ORDER is a reserved word, so the table is named Orders)
# Join from a larger table to a smaller table
SELECT
*
FROM
Orders
JOIN Customer ON Customer.id = Orders.customer_id;
# Join from a smaller table to a larger table
SELECT
*
FROM
Customer
JOIN Orders ON Customer.id = Orders.customer_id;
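A quick way to check whether the written join order matters in your database is to run both versions and compare their plans. This minimal sketch does that with SQLite through Python’s sqlite3 module. The table contents are made up, and the queries select explicit columns so the two result sets can be compared directly. For inner joins, SQLite’s query planner chooses the loop order itself, so the two plans may well be identical. Other databases have their own EXPLAIN output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO Customer (id, name) VALUES (1, 'Asha'), (2, 'Ravi');
    INSERT INTO Orders (id, customer_id) VALUES (10, 1), (11, 1), (12, 2), (13, 2);
    """
)

# Larger table (Orders) written first.
large_first = """
    SELECT Orders.id, Customer.name FROM Orders
    JOIN Customer ON Customer.id = Orders.customer_id
    ORDER BY Orders.id"""

# Smaller table (Customer) written first.
small_first = """
    SELECT Orders.id, Customer.name FROM Customer
    JOIN Orders ON Customer.id = Orders.customer_id
    ORDER BY Orders.id"""

rows_large_first = conn.execute(large_first).fetchall()
rows_small_first = conn.execute(small_first).fetchall()
print(rows_large_first)  # [(10, 'Asha'), (11, 'Asha'), (12, 'Ravi'), (13, 'Ravi')]

# Compare the plans SQLite actually chooses for each written order.
for label, sql in [("large first", large_first), ("small first", small_first)]:
    print(label, conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall())
```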
Besides the LIKE clause, regexp_like can also be used for pattern matching. LIKE is a basic pattern-matching operator that supports only two wildcards: _, which matches a single character, and %, which matches any number of characters. Unless the pattern starts with a fixed prefix that an index on the column can use (as in 'A%'), LIKE has to scan the entire table to find the pattern, which is slow for large tables.
On the other hand, regexp_like is a more powerful and flexible pattern-matching tool. It uses regular expressions, which can describe much more specific patterns than simple wildcards. As a result, a single regexp_like condition can often replace several LIKE conditions combined with OR. Keep in mind, though, that a regular expression usually has to be evaluated row by row and generally cannot use an ordinary (B-tree) index. regexp_like is therefore not automatically faster than LIKE, so check the execution plan on your own data before switching.
Note that regexp_like is not available in every database management system (MySQL 8.0 and Oracle provide REGEXP_LIKE, for example), and its syntax and functionality vary between systems.
Here is an example for your better understanding.
# Query using the LIKE clause
SELECT
*
FROM
mytable
WHERE
(
name LIKE 'A%'
OR name LIKE 'B%'
);
# Query using regexp_like clause
SELECT
*
FROM
mytable
WHERE
regexp_like(name, '^[AB].*');
The queries above find the rows whose name starts with A or B. In the first query, LIKE is used twice. 'A%' means the first character is A, followed by any number of characters, and 'B%' does the same for B. In the second query, regexp_like is used with the pattern ^[AB].*, where:
^ anchors the match at the beginning of the string,
[AB] means the first character can be either A or B, and
.* matches any sequence of characters after that.
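With regexp_like, both prefixes are handled by a single condition, which keeps the query compact. Whether it also filters rows faster than the equivalent LIKE conditions depends on the database and the available indexes, so measure before relying on it for performance.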
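SQLite has no built-in regexp_like, so this minimal sketch registers a hypothetical regexp_like function through sqlite3’s create_function, backed by Python’s re module. It then checks that the function selects the same rows as the two LIKE conditions. The names are invented sample data. SQLite’s LIKE is case-insensitive for ASCII letters by default, while this regular expression is case-sensitive, so the sample names are all capitalized to keep the two queries comparable.

```python
import re
import sqlite3


def regexp_like(value, pattern):
    # Hypothetical stand-in for REGEXP_LIKE: 1 if the pattern matches, else 0.
    if value is None:
        return 0
    return 1 if re.search(pattern, value) else 0


conn = sqlite3.connect(":memory:")
conn.create_function("regexp_like", 2, regexp_like)
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO mytable (name) VALUES (?)",
    [("Alice",), ("Bob",), ("Carol",), ("Brian",), ("Dave",)],
)

like_rows = conn.execute(
    "SELECT name FROM mytable WHERE (name LIKE 'A%' OR name LIKE 'B%') ORDER BY name"
).fetchall()
regexp_rows = conn.execute(
    "SELECT name FROM mytable WHERE regexp_like(name, '^[AB].*') ORDER BY name"
).fetchall()

print(like_rows)  # [('Alice',), ('Bob',), ('Brian',)]
```

A user-defined function like this is called once per row and cannot use an index. This illustrates why a regex filter is not automatically cheaper than an indexed prefix LIKE.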
In this article, we have discussed various methods and tips for optimizing SQL queries. It should give you a clear understanding of how to write efficient SQL queries and why optimizing them matters. There are many more ways to optimize queries. For example, you can prefer integer values to character values where possible. You can also use UNION ALL instead of UNION when the combined results cannot contain duplicates (or duplicates are acceptable), since UNION has to do extra work to remove them.
Aryan Garg is a B.Tech. Electrical Engineering student, currently in the final year of his degree. His interests lie in the fields of Web Development and Machine Learning. He has pursued these interests and is looking forward to further work in these directions.