3 SQL Optimization Techniques That Can Instantly Boost Query Speed

Author: Murphy  |  Views: 28373  |  Time: 2025-03-23 19:04:52

Several databases have emerged to solve specific software development and Data Science needs. Yet, relational databases are still the most used and preferred by many.

Query speed is the most common reason people consider a different data model. Document stores, column-family stores, and graph databases solve for speed above all else.

But a relational approach is more than sufficient for most use cases. Further, we can tune it for speed and still enjoy all the benefits SQL offers.

I'm not reluctant to change. Yet keeping the technology stack as lean as possible is often my primary goal in any project. And SQL is something most engineers can understand – even the new ones!


Thus, switching to a different data model is the last thing I would consider. Instead, I'd do these tweaks first to get the most out of a relational database using SQL.

I haven't listed every SQL optimization technique here. But these are straightforward and don't require much effort or time. Still, it's better to decide on these techniques at the planning stage than to figure out things on the go.

Create indexes where needed and drop where they are redundant.

One of the easiest ways to improve query speed in SQL is by using indexes.

In analytics and software development projects, you'll probably know during the design phase the most frequent ways you'll query your database. Understanding these usage patterns is valuable information.

Think of a non-fiction book. It isn't easy to know on which page a particular concept is discussed. Therefore, at the end of the book, you'll often see a section called the index. It lists the keywords in the text alphabetically, along with the pages where they appear. This makes it easy to get to where we want quickly.


Similarly, in a database, an index is like a list of keywords or topics frequently used to search for data in a table. The database can quickly find the rows that match the search criteria without scanning the entire table by creating an index on a column or set of columns.

Here's an example in Postgres, but most relational databases have similar syntax.

CREATE INDEX idx_customers_email ON customers (email);

This creates an index called "idx_customers_email" on the "email" column of the "customers" table. Now, when we run a query that filters or sorts by the "email" column, the database can use the index to quickly find the relevant rows instead of scanning the entire table.
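To confirm the planner actually picks the index, Postgres's EXPLAIN shows the chosen plan. A quick sketch – the exact output depends on table size and statistics, and on small tables the planner may prefer a sequential scan anyway:

```sql
EXPLAIN SELECT * FROM customers WHERE email = 'john@example.com';
-- On a sufficiently large table, the plan should mention something like
-- "Index Scan using idx_customers_email on customers".
```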

Expression indexes

You can also create indexes on expressions rather than plain columns. Say we frequently filter the customers table by the email domain instead of the whole email address. We can create an index on the extracted domain.

Here's how to create such an expression index:

CREATE INDEX idx_email_domain ON
customers (substring(email FROM '@(.*)$'));

An index like the above helps with queries that use the same expression, like the one below:

SELECT * FROM customers WHERE substring(email FROM '@(.*)$') = 'gmail.com';
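Postgres also supports partial indexes: indexes restricted to the rows matching a WHERE clause, which keeps the index small when most queries target a subset of the table. A sketch, assuming a hypothetical `active` flag on the customers table:

```sql
-- Index only the rows we actually query often
CREATE INDEX idx_active_customers_email
ON customers (email)
WHERE active = true;
```

Queries that include the same condition (`WHERE active = true`) can use this smaller, cheaper-to-maintain index.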

Multi-column indexing

Sometimes we need to query the table using more than one column. In such situations, a multi-column index helps with performance.

CREATE INDEX idx_email_city ON customers (email, city);

Now queries like the one below can be faster than before indexing.

SELECT * FROM customers WHERE email = 'john@example.com' AND city = 'New York';
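One detail worth knowing about multi-column B-tree indexes is that column order matters. The example values below are hypothetical:

```sql
-- The composite index (email, city) has email as its leading column,
-- so it can serve a filter on email alone:
SELECT * FROM customers WHERE email = 'john@example.com';

-- But a filter on city alone typically cannot use it efficiently,
-- since city is not the leading column:
SELECT * FROM customers WHERE city = 'New York';
```

So put the column you filter on most often first.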

Sometimes, you should let go of indexes.

Just as a book's index needs to be maintained and updated as new information is added, database indexes must also be updated to reflect changes to the data in the table.

Adding indexes can improve query performance. Yet it also slows down data modification operations (like inserting, updating, or deleting rows) since the indexes must be updated. It's essential to carefully consider which columns to index based on the queries you'll be running most frequently.

Thus, although indexes give us better query performance, we must use them sparingly. Whenever an index is no longer required, you can drop it with a single command.

DROP INDEX idx_email_city;
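Before dropping anything, it helps to check whether an index is actually used. In Postgres, the `pg_stat_user_indexes` statistics view tracks how often each index has been scanned (note this is Postgres-specific, and the counters only cover the period since statistics were last reset):

```sql
-- Indexes that have never been scanned since statistics were last reset
SELECT indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0;
```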

Use joins and subqueries wisely.

The main reason we use relational databases is to avoid redundancy. Normalized data lives in separate tables, so joins are essential: without them, there's often no way to assemble the information we need.

However, joins are also a reason query performance deteriorates: the more joins, the slower the query.

In graph databases, query performance isn't proportional to the number of joins. Instead, it's proportional to the size of the output. This is precisely where graph databases score well.


Ensure you use the appropriate join type for your query – inner, left, right, or full outer. When merging two large tables, cross joins are evil. Avoid them as much as possible.

We sometimes use joins even when we don't want columns from the second table – we only want to check for the presence of a corresponding row. For instance, suppose we have two tables, students and courses, and want to fetch the list of students who have taken any of a set of courses. Here's how we usually do this with joins.

SELECT DISTINCT students.student_id, students.student_name
FROM students
INNER JOIN courses ON students.course_id = courses.course_id
WHERE courses.course_id IN ('C101', 'C102', 'C103');

This is one way of doing it, and it's usually okay. Here's another version of the same query; the one below uses a subquery instead of a join.

SELECT DISTINCT student_id, student_name
FROM students
WHERE course_id IN (
  SELECT course_id
  FROM courses
  WHERE course_id IN ('C101', 'C102', 'C103')
);
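A third style worth testing is EXISTS, which checks for a matching row and can stop as soon as one is found. Because no join multiplies the rows, DISTINCT is often unnecessary (a sketch using the same hypothetical tables):

```sql
SELECT student_id, student_name
FROM students s
WHERE EXISTS (
  SELECT 1
  FROM courses c
  WHERE c.course_id = s.course_id
    AND c.course_id IN ('C101', 'C102', 'C103')
);
```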

I don't mean to say subqueries are better than joins. I've seen one style perform better in some instances and the other outperform it in others.

Thus, it's best to test the query performance with different versions for queries we frequently use in production systems and pick the best one.
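In Postgres, EXPLAIN ANALYZE executes the query and reports actual row counts and timings, which makes these comparisons concrete (note it really runs the query, so use it with care on expensive statements):

```sql
-- Run this for each candidate version and compare the reported execution time
EXPLAIN ANALYZE
SELECT DISTINCT students.student_id, students.student_name
FROM students
INNER JOIN courses ON students.course_id = courses.course_id
WHERE courses.course_id IN ('C101', 'C102', 'C103');
```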

Use materialized views for rarely changing data.

There may be instances where your dataset changes very slowly, yet the queries over it are heavy and usually take longer than you'd want.

Here's where materialized views are a lifesaver.

Views are saved queries that act like tables. If a specific part of your query is repetitive, you can store it as a database view and reuse it wherever you want.

Materialized views are views stored along with their latest results. Every time you query one, the results come directly from the pre-saved rows instead of re-running the underlying query.

Here's a materialized view that computes the number of customers for each country and state. It stores the calculated values along with the query that produced them.

CREATE MATERIALIZED VIEW customer_summary AS
SELECT customer_country, customer_state, COUNT(*) AS num_customers
FROM customers
GROUP BY customer_country, customer_state;

Now, we can query the materialized view just like any other table:

SELECT *
FROM customer_summary
WHERE customer_country = 'United States';

This saves significant time. Yet it doesn't reflect updates to the underlying tables. Thus, whenever the data changes, we must refresh the materialized view to reflect the difference.

We can refresh the data periodically if slight staleness isn't a big deal. For instance, if you're showing a report view to your customers, you can refresh the materialized view every midnight and let your users know that changes will appear every morning.

Here's how to refresh a materialized view:

REFRESH MATERIALIZED VIEW customer_summary;
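By default, REFRESH MATERIALIZED VIEW locks the view against reads while it rebuilds. If readers shouldn't be blocked, Postgres offers a CONCURRENTLY option, which requires a unique index on the view:

```sql
-- A unique index over the grouping columns makes CONCURRENTLY possible
CREATE UNIQUE INDEX idx_customer_summary
ON customer_summary (customer_country, customer_state);

REFRESH MATERIALIZED VIEW CONCURRENTLY customer_summary;
```

The concurrent refresh is slower overall, but queries keep seeing the old results until the new ones are ready.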


Conclusion

Optimizing SQL queries is a big topic, yet it rarely gets much attention in small teams and individually run projects.

Yet, an unoptimized design can cost you in the form of server loads and query time in production systems.

This post doesn't cover every SQL optimization technique. But the ones above are techniques I use often: they bring significant performance improvements without demanding much effort from the team.


Thanks for reading, friend! If you enjoyed my article, let's keep in touch on LinkedIn, Twitter, and Medium.

