Optimizing SQL queries is crucial for efficient database operations and maintaining data integrity. Well-optimized queries can significantly reduce resource consumption and improve application speed. This article explores various common SQL optimization techniques with practical examples.
Note: For brevity in the examples below, `*` might be used in `SELECT` statements when the focus is on other clauses. However, the first point emphasizes why this should generally be avoided.
1. Specify Column Names Instead of Using SELECT *
Anti-Pattern (Bad Example):
SELECT * FROM Products;
Pro-Pattern (Good Example):
SELECT product_id, product_name, unit_price FROM Products;
Reasoning:
- Saves resources and reduces network overhead: Fetching only necessary columns transmits less data.
- Enables covering indexes: If all selected columns are part of an index, the database can retrieve data directly from the index without accessing the table (reducing “table lookups”), which significantly improves query efficiency.
2. Avoid Using `OR` to Connect Conditions in `WHERE` Clauses
Anti-Pattern:
SELECT product_name, category
FROM Products
WHERE category = 'Electronics'
OR supplier_id = 10;
Pro-Pattern:
- (1) Use `UNION ALL`:
SELECT
product_name,
category
FROM
Products
WHERE
category = 'Electronics'
UNION ALL
SELECT
product_name,
category
FROM
Products
WHERE
supplier_id = 10
AND category != 'Electronics';
-- (the AND category != 'Electronics' guard keeps rows matching both conditions from appearing twice)
-- Or, if duplicate rows from overlapping conditions are acceptable:
-- SELECT product_name, category FROM Products WHERE category = 'Electronics'
-- UNION ALL
-- SELECT product_name, category FROM Products WHERE supplier_id = 10;
- (2) Write two separate SQL queries:
SELECT
product_name, category
FROM
Products
WHERE
category = 'Electronics';
SELECT
product_name, category
FROM
Products
WHERE
supplier_id = 10;
Reasoning:
- Using `OR` can sometimes cause indexes to be ignored, leading to a full table scan.
- If one part of the `OR` condition (e.g., `supplier_id`) uses an index, but the other part (e.g., `category`, if it's unindexed or the optimizer chooses not to use its index) doesn't, the database might still perform a full scan for the second condition or engage in a more complex plan (index scan + table scan + merge).
- Although modern database optimizers are quite smart, `OR` conditions can make it harder for them to choose the most efficient plan, potentially leading to index non-utilization.
3. Prefer Numerical Types Over String Types for Identifiers and Flags
Pro-Pattern:
- Primary Key (`id`): Use numerical types like `INT` or `BIGINT`. E.g., `order_id INT PRIMARY KEY`.
- Status flags (`is_active`): Use `TINYINT` (e.g., 0 for false, 1 for true), as databases often lack a native boolean type (MySQL recommends `TINYINT(1)`).
Reasoning:
- Database engines compare strings character by character, which is slower than comparing numbers (a single operation).
- String comparisons can degrade query and join performance and increase storage overhead.
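As a minimal sketch of this advice (MySQL syntax; the table and column names are illustrative, not from the article):

```sql
-- Numeric surrogate key plus a TINYINT(1) status flag
CREATE TABLE Orders (
    order_id    BIGINT     NOT NULL AUTO_INCREMENT,
    customer_id INT        NOT NULL,
    is_active   TINYINT(1) NOT NULL DEFAULT 1,  -- 1 = active, 0 = inactive
    PRIMARY KEY (order_id)
);
```

Comparisons such as `WHERE order_id = 1001` or `WHERE is_active = 1` are then single numeric operations rather than character-by-character string comparisons.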
4. Use `VARCHAR` Instead of `CHAR` for Variable-Length Strings
Anti-Pattern:
`customer_address` CHAR(200) DEFAULT NULL COMMENT 'Customer Address'
Pro-Pattern:
`customer_address` VARCHAR(200) DEFAULT NULL COMMENT 'Customer Address'
Reasoning:
- `VARCHAR` stores data based on the actual content length, saving storage space. `CHAR` pads the string with spaces up to the declared length.
- Searching within a smaller field (the actual data length in `VARCHAR`) can be more efficient.
5. Technical Extension: `CHAR` vs. `VARCHAR2` (Common in Oracle)
Fixed vs. Variable Length:
`CHAR` has a fixed length, while `VARCHAR2` has a variable length. For example, storing "XYZ" in a `CHAR(10)` column uses 10 bytes (including 7 trailing spaces). The same string in a `VARCHAR2(10)` column uses only 3 bytes (10 is the maximum).
Efficiency:
`CHAR` can be slightly more efficient for retrieval if the data length is consistently fixed and known, as the database knows the exact position of subsequent rows/columns.
When to Use Which?
This is often a trade-off: `VARCHAR2` saves space but might have a slight performance overhead compared to `CHAR` for truly fixed-length data.
Frequent updates to `VARCHAR2` columns with varying data lengths can lead to "row migration" (if the new data is larger and doesn't fit in the original block), causing extra I/O. In such specific scenarios, `CHAR` might be better.
When querying `CHAR` columns, remember that they are space-padded. You might need to use `TRIM()` if exact matches (without padding) are required, which can affect index usage. `RPAD()` might be used on bind variables to match `CHAR` field lengths, which is generally better than applying `TRIM()` to the column in `WHERE` clauses.
Due to potential wasted space and issues with comparisons/binding, many developers prefer `VARCHAR` or `VARCHAR2` unless there's a very specific reason to use `CHAR`.
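To illustrate the padding pitfall, assume a hypothetical `Customers` table with a `country_code CHAR(5)` column (Oracle-style syntax):

```sql
-- Anti-pattern: TRIM() applied to the column usually disables its index
SELECT customer_id FROM Customers
WHERE TRIM(country_code) = 'US';

-- Better: pad the input value to the declared CHAR length instead,
-- leaving the indexed column untouched
SELECT customer_id FROM Customers
WHERE country_code = RPAD('US', 5);
```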
6. Use Default Values Instead of `NULL` in `WHERE` Clauses Where Appropriate
Anti-Pattern:
SELECT * FROM Orders WHERE discount_applied IS NOT NULL;
Pro-Pattern (assuming 0 is a meaningful default for no discount):
SELECT * FROM Orders WHERE discount_amount > 0;
-- Or, if you have a status column:
-- SELECT * FROM Orders WHERE order_status != 'CANCELLED_NO_CHARGE'; (where 'CANCELLED_NO_CHARGE' might imply a NULL or zero discount)
Reasoning:
- Using `IS NULL` or `IS NOT NULL` doesn't always prevent index usage, but it can be less optimal. This depends on the MySQL version, table statistics, and query cost.
- If the optimizer determines that using an index for conditions like `!=`, `<>`, `IS NULL`, or `IS NOT NULL` is more costly than a full table scan, it will abandon the index.
- Replacing `NULL` with a sensible default value can often make it more likely for the optimizer to use an index and can also make the query's intent clearer.
7. Avoid Using `!=` or `<>` Operators in `WHERE` Clauses if Possible
Anti-Pattern:
SELECT
*
FROM
Employees
WHERE
department_id != 10;
SELECT
*
FROM
Employees
WHERE
status <> 'Terminated';
Reasoning:
- Using `!=` or `<>` can often lead to the optimizer ignoring indexes and performing a full table scan. While not universally true (sometimes indexes are still used, especially if the distinct values are few), it's a common pitfall.
- If business logic absolutely requires it, use them, but be aware of the potential performance impact. Consider alternative ways to phrase the logic if possible (e.g., using `IN` for allowed values).
8. Prefer `INNER JOIN`; Optimize `LEFT JOIN` and `RIGHT JOIN`
If `INNER JOIN`, `LEFT JOIN`, and `RIGHT JOIN` can produce the same logical result set for your specific query, `INNER JOIN` is generally preferred.
When using `LEFT JOIN`, try to ensure the "left" table (the one from which all rows are preserved) is the smaller of the two after any `WHERE` clause filtering on that table.
Explanation:
- `INNER JOIN`: Returns only matching rows from both tables. If it's an equijoin, the result set is often smaller, leading to better performance.
- `LEFT JOIN`: Returns all rows from the left table and matched rows from the right table (or `NULL`s if no match).
- `RIGHT JOIN`: Returns all rows from the right table and matched rows from the left table.
- The "small table drives big table" principle: MySQL (and other databases) often try to optimize joins by iterating through the smaller result set and probing the larger one. So, reducing the size of the "driving" table (e.g., the left table in a `LEFT JOIN` after its own `WHERE` conditions) can improve performance.
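A sketch of the "small table drives big table" idea, assuming hypothetical `Departments` and `Employees` tables:

```sql
-- Filter the left (preserved) table first so the driving set stays small
SELECT d.department_name, e.employee_id
FROM (
    SELECT department_id, department_name
    FROM Departments
    WHERE region = 'EMEA'          -- shrink the driving table up front
) AS d
LEFT JOIN Employees AS e
       ON e.department_id = d.department_id;
```

The derived table keeps only the filtered departments, so the join probes the large `Employees` table once per surviving department row rather than once per department overall.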
9. Improve `GROUP BY` Efficiency
Anti-Pattern (Filter after grouping):
SELECT
department,
AVG(salary)
FROM
EmployeeDetails
GROUP BY
department
HAVING
department = 'Sales'
OR department = 'Marketing';
Pro-Pattern (Filter before grouping):
SELECT
department,
AVG(salary)
FROM
EmployeeDetails
WHERE
department = 'Sales'
OR department = 'Marketing'
GROUP BY
department;
Reasoning
Filtering records with `WHERE` before grouping reduces the number of rows that need to be processed by the `GROUP BY` operation.
10. Prefer `TRUNCATE` for Clearing All Rows from a Table
`TRUNCATE TABLE` is functionally similar to `DELETE FROM table_name` (without a `WHERE` clause) in that both delete all rows. However, `TRUNCATE TABLE` is faster and uses fewer system and transaction log resources.
`DELETE` removes rows one by one and logs each deletion. `TRUNCATE TABLE` deallocates the data pages used by the table and only logs the page deallocations.
`TRUNCATE TABLE` resets any auto-increment identity counter to its seed value. If you need to preserve the identity counter, use `DELETE`.
You cannot use `TRUNCATE TABLE` on a table referenced by a `FOREIGN KEY` constraint (use `DELETE` instead) or on tables participating in an indexed view. `TRUNCATE TABLE` does not activate triggers.
To remove the table definition along with its data, use `DROP TABLE`.
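Side by side, with a hypothetical `StagingEvents` table:

```sql
-- Row-by-row, fully logged, preserves the auto-increment counter
DELETE FROM StagingEvents;

-- Deallocates pages, minimal logging, resets the auto-increment counter
TRUNCATE TABLE StagingEvents;
```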
11. Use `LIMIT` or Batch Processing for `DELETE` or `UPDATE` Operations
Reasons
- Reduce cost of errors: If you accidentally run a `DELETE` or `UPDATE` without a `WHERE` clause (or with an incorrect one), `LIMIT` restricts the damage. Recovering a few rows from binlogs is easier than recovering an entire table.
- Potentially higher SQL efficiency: For `DELETE FROM ... WHERE ... LIMIT 1`, if the first row scanned matches, the operation can stop. Without `LIMIT`, it might scan more.
- Avoid long transactions: Large `DELETE` or `UPDATE` operations can lock many rows (and potentially cause gap locks if indexed columns are involved) for extended periods, impacting concurrent operations.
- Prevent high CPU load: Deleting a massive number of rows at once can spike CPU usage, slowing down the deletion process itself and other system operations.
- Avoid table locking: Very large DML operations can lead to lock contention or `lock wait timeout` errors. Batching is recommended.
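A common batching sketch (MySQL syntax; the table name and batch size are illustrative):

```sql
-- Delete expired rows in bounded chunks; each run is a short transaction
DELETE FROM AuditLog
WHERE created_at < '2023-01-01'
LIMIT 1000;
```

An application loop or scheduled job re-issues this statement until the affected row count reaches zero, keeping each transaction short and lock durations low.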
12. `UNION` vs. `UNION ALL` Operator
`UNION` combines result sets and then sorts them to remove duplicate records. This sorting and duplicate removal can be resource-intensive, especially with large datasets (potentially using disk for sorting).
Example of a potentially inefficient `UNION`:
SELECT
employee_name,
department
FROM
CurrentEmployees
UNION
SELECT
employee_name,
department
FROM
ArchivedEmployees;
Recommendation:
Use `UNION ALL` if you know the combined result sets won't have duplicates or if duplicates are acceptable. `UNION ALL` simply concatenates the results without checking for duplicates, making it much faster.
13. Improving Bulk Insert Performance
Anti-Pattern (Multiple single-row inserts)
INSERT INTO
Subscribers (email, signup_date)
VALUES
('test1@example.com', '2024-01-10');
INSERT INTO
Subscribers (email, signup_date)
VALUES
('test2@example.com', '2024-01-11');
Pro-Pattern (Batch insert)
INSERT INTO
Subscribers (email, signup_date)
VALUES
('test1@example.com', '2024-01-10'),
('test2@example.com', '2024-01-11');
Reasoning
Each `INSERT` statement typically runs in its own transaction (by default), incurring overhead for transaction start and commit. Batching multiple rows into a single `INSERT` statement reduces this overhead to a single transaction, significantly improving efficiency, especially for large volumes of data.
14. Limit the Number of Table Joins and Indexes
Limit Table Joins (Generally to 5 or fewer):
The more tables joined, the higher the compilation time and overhead for the query optimizer.
Each join might involve creating temporary tables in memory or on disk.
Complex joins can be harder to read and maintain. Consider breaking them into smaller, sequential operations if possible.
If you consistently need to join many tables, it might indicate a suboptimal database design. (Alibaba’s Java guidelines suggest joins of three tables or fewer).
Limit Indexes (Generally to 5 or fewer per table):
Indexes improve query speed but slow down `INSERT`, `UPDATE`, and `DELETE` operations because indexes also need to be updated.
Indexes consume disk space.
Index data is sorted, and maintaining this order takes time.
Rebuilding indexes (which can happen during DML) on large tables can be time-consuming.
Carefully consider if each index is truly necessary.
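Before adding another index, it can help to audit what already exists (MySQL; `Orders` is an illustrative table name):

```sql
-- List every index on the table, including column order and cardinality
SHOW INDEX FROM Orders;
```

Redundant indexes (e.g., one on `(a)` alongside one on `(a, b)`) are common candidates for removal.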
15. Avoid Using Built-in Functions on Indexed Columns in `WHERE` Clauses
Anti-Pattern:
SELECT * FROM Orders WHERE YEAR(order_date) = 2023;
Pro-Pattern:
SELECT
*
FROM
Orders
WHERE
order_date >= '2023-01-01'
AND order_date < '2024-01-01';
Reasoning:
Applying a function to an indexed column in the `WHERE` clause usually prevents the database from using the index on that column directly (this is often called making the condition "non-sargable"). The database would have to compute the function's result for every row before applying the filter.
16. Composite Indexes and Sort Order
When sorting, if you have a composite index (e.g., `INDEX idx_dept_job_hire (department_id, job_title, hire_date)`), your `ORDER BY` clause should follow the order of columns in the index for optimal performance.
-- Example of good usage for an index on (department_id, job_title, hire_date)
SELECT
employee_id,
full_name
FROM
Employees
WHERE
department_id = 5
AND job_title = 'Engineer'
ORDER BY
hire_date DESC;
-- Index can be used for filtering and then sorting part of hire_date
If the `ORDER BY` clause doesn't align with the index prefix or column order, the database might not be able to use the index efficiently for sorting, potentially leading to a filesort operation.
17. The Left-Most Prefix Rule for Composite Indexes
If you create a composite index like `ALTER TABLE Customers ADD INDEX idx_lastname_firstname (last_name, first_name)`, this is equivalent to having usable index paths for:
- `(last_name)`
- `(last_name, first_name)`
Effective Use (Satisfies Left-Most Prefix):
SELECT * FROM
Customers
WHERE
last_name = 'Smith';
SELECT * FROM
Customers
WHERE
last_name = 'Smith'
AND first_name = 'John';
Ineffective Use (Violates Left-Most Prefix, index likely not used or not fully):
SELECT * FROM Customers WHERE first_name = 'John';
- Optimizer May Help:
-- MySQL optimizer is often smart enough to reorder conditions
SELECT
*
FROM
Customers
WHERE
first_name = 'John'
AND last_name = 'Smith';
-- This will likely be optimized to use the (last_name, first_name) index.
Reasoning:
The database can efficiently seek based on the leading columns of a composite index. If a query doesn’t use the first column(s) of the index in its predicates, it generally cannot use that index effectively.
18. Optimizing `LIKE` Statements
Using `LIKE` for pattern matching is common but can be an index killer.
Anti-Pattern (Index typically not used or full scan within index):
SELECT * FROM Articles WHERE title LIKE '%database%'; -- Wildcards on both sides
SELECT * FROM Articles WHERE title LIKE '%database'; -- Leading wildcard (equivalent to above for index usage)
Pro-Pattern (Index can be used for a range scan):
SELECT * FROM Articles WHERE title LIKE 'database%'; -- Trailing wildcard
Reasoning:
- Avoid leading wildcards (`%...`) if possible, as they prevent direct index seeks. A trailing wildcard (`...%`) can often use an index.
- If a leading wildcard is unavoidable, consider alternative solutions like Full-Text Search engines (e.g., Elasticsearch, Solr, or the built-in FTS capabilities of your RDBMS) for better performance. Some databases offer reverse indexes or function-based indexes on `REVERSE(column)` to support `LIKE '%...'` queries.
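One sketch of the reverse-index technique mentioned above (MySQL generated-column syntax; the table, column, and index names are assumptions for illustration):

```sql
-- Index the reversed string so a suffix match becomes a prefix match
ALTER TABLE Articles
  ADD COLUMN title_rev VARCHAR(255)
      GENERATED ALWAYS AS (REVERSE(title)) STORED,
  ADD INDEX idx_title_rev (title_rev);

-- LIKE '%database' on title becomes a range-scannable prefix pattern
SELECT *
FROM Articles
WHERE title_rev LIKE CONCAT(REVERSE('database'), '%');
```

This only helps suffix searches (`'%database'`); patterns with wildcards on both sides (`'%database%'`) still need full-text search.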
19. Use `EXPLAIN` to Analyze Your SQL Execution Plan
Understanding the output of `EXPLAIN` (or `EXPLAIN ANALYZE`) is key to diagnosing query performance. Pay attention to:
`type` (Join Type):
- `system`: Table has only one row.
- `const`: Table has at most one matching row (e.g., a primary key lookup).
- `eq_ref`: One row is read from this table for each combination of rows from the previous tables. Excellent for joins.
- `ref`: All rows with matching index values are read.
- `range`: Only rows in a given range are retrieved, using an index.
- `index`: Full scan of an index. Faster than `ALL` if the index is smaller than the table.
- `ALL`: Full table scan.
- Performance ranking (best to worst): `system` > `const` > `eq_ref` > `ref` > `range` > `index` > `ALL`. Aim for `ref` or `range` in practical optimizations.
`Extra` (Additional Information):
- `Using index`: Data is retrieved solely from the index tree (covering index); no table lookup needed.
- `Using where`: The `WHERE` clause is used to filter rows after they are retrieved from storage (either from the table or the index). If `type` is `ALL` or `index` and `Extra` doesn't show `Using where`, the query might be fetching more data than intended before filtering.
- `Using temporary`: MySQL creates a temporary table to hold intermediate results (common for `GROUP BY` or `ORDER BY` on different columns).
- `Using filesort`: MySQL must do an external sort of the rows.
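A minimal usage sketch (hypothetical `Orders` table with an index on `customer_id`):

```sql
EXPLAIN
SELECT order_id, total_amount
FROM Orders
WHERE customer_id = 42;
-- A healthy plan here shows type = ref using the customer_id index;
-- type = ALL would indicate a full table scan and a missing or unused index.
```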
20. Other Optimization Tips
- Add Comments: Always add comments to tables and columns in your schema design.
- Consistent SQL Formatting: Use consistent capitalization for keywords and proper indentation for readability.
- Backup Before Critical DML: Always back up data before performing significant modifications or deletions.
- `EXISTS` vs. `IN`: In many cases, using `EXISTS` can be more efficient than `IN`, especially when the subquery returns a large number of rows. However, test both, as optimizers vary.
- Implicit Type Conversion: Be mindful of data types in `WHERE` clauses. Comparing a string column to a number (e.g., `indexed_string_column = 123`) can cause implicit type conversion and prevent index usage. Use appropriate literals (e.g., `indexed_string_column = '123'`).
- Define Columns as `NOT NULL` where possible: `NOT NULL` columns can be more space-efficient (no need for a bit to mark `NULL`) and can simplify queries (no need to handle `NULL` logic as extensively).
- Soft Deletes: Consider a "soft delete" pattern (e.g., an `is_deleted` flag or `deleted_at` timestamp) instead of physically deleting rows, especially if audit trails or easy undelete functionality are needed.
- Unified Character Set: Use a consistent character set (e.g., `UTF8MB4`) for your database and tables to avoid encoding issues and potential performance degradation from character set conversions during comparisons (which can also invalidate indexes).
- `SELECT COUNT(*)`: A `SELECT COUNT(*)` or `SELECT COUNT(1)` on a table without a `WHERE` clause will perform a full table scan (or a full index scan if a small suitable index exists). This can be very slow on large tables and often has limited business meaning without context. If you need an exact count, accept the cost; if an estimate is fine, some databases offer faster approximations.
- Avoid Expressions on Columns in `WHERE`: If a `WHERE` clause applies an expression or function to a column (e.g., `WHERE salary * 1.1 > 50000`), the index on `salary` is usually not used. Rewrite it as `WHERE salary > 50000 / 1.1`.
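To make the `EXISTS` vs. `IN` comparison concrete (hypothetical `Customers` and `Orders` tables; which form wins depends on your optimizer and data distribution, so measure both):

```sql
-- IN: the subquery's id list is built or semi-joined, then probed
SELECT customer_name
FROM Customers
WHERE customer_id IN (SELECT customer_id FROM Orders);

-- EXISTS: correlated probe that can stop at the first matching order
SELECT customer_name
FROM Customers c
WHERE EXISTS (SELECT 1 FROM Orders o WHERE o.customer_id = c.customer_id);
```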
- Temporary Tables:
  - Avoid frequently creating and dropping temporary tables.
  - For large, one-time insertions into a temporary table, `SELECT ... INTO temptable` (syntax varies by DB) might be faster than `CREATE TABLE ...; INSERT INTO ...;` as it can reduce logging. For smaller amounts, `CREATE` then `INSERT` is fine.
  - Always explicitly `DROP` temporary tables when done, preferably after a `TRUNCATE` if you want to release space immediately and reduce contention on system tables.
- Indexes on Low-Cardinality Columns: Avoid creating indexes on columns with very few distinct values (e.g., a gender column with 'Male', 'Female', 'Other'). They are usually not selective enough for the optimizer to use. However, columns used frequently for sorting, even if low cardinality, might benefit from an index.
- `DISTINCT` on Few Columns: Using `DISTINCT` requires the database to compare and filter data, which consumes CPU. The more columns in the `SELECT DISTINCT` list, the more complex the comparison. Use it only when necessary.
- Avoid Large Transactions: Break down large operations into smaller transactions to improve system concurrency and reduce locking duration.
- Use InnoDB (for MySQL): Unless you have very specific needs (like full-text search features only in MyISAM, or column-store needs), InnoDB is generally the preferred storage engine in MySQL due to its support for transactions, row-level locking, and better crash recovery.
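To check which engine your tables currently use (MySQL; `'your_database'` is a placeholder for your schema name):

```sql
SELECT table_name, engine
FROM information_schema.tables
WHERE table_schema = 'your_database';
```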
Supercharge Your SQL Workflow with Chat2DB
Optimizing SQL is an ongoing process, and having the right tools can make a world of difference. If you’re looking to streamline your database management and query optimization, consider giving Chat2DB a try!
Chat2DB (https://chat2db.ai) is an intelligent, versatile, and AI-powered database client that supports a wide range of databases, including PostgreSQL, MySQL, SQL Server, Oracle, and more.
Here’s how Chat2DB can help you with the principles discussed in this article:
- AI-Powered Query Generation & Optimization: Struggling to write complex queries or unsure how to optimize an existing one? Chat2DB's AI assistant can help you generate efficient SQL from natural language prompts and even offer suggestions to improve your existing queries. This can help you avoid common pitfalls like using `SELECT *` or inefficient `JOIN` conditions.
- Effortless Schema Exploration: Understanding your table structures, indexes, and constraints is key to writing good SQL. Chat2DB provides an intuitive interface to explore your database schema easily.
- Data Conversion & Management: Simplify tasks like data import/export, and manage multiple database connections seamlessly.
- Private Deployment & Security: Chat2DB supports private deployment, ensuring your data and database interactions remain secure within your environment.
By making it easier to write, analyze, and manage your SQL, Chat2DB empowers you to apply these optimization techniques more effectively, saving you time and helping you build more performant applications.