The two cornerstones of database performance optimization

Written by Admin


Database performance optimization is a perennial topic, and DBAs seem to discuss it endlessly. The reason is simple: performance problems are the most frequent complaint from end users, bar none.

DBAs who can quickly resolve performance bottlenecks become the team's heroes; if they cannot locate the problem in time, even the best architecture design may be rejected by users.

So the question is: when faced with a poorly performing database, what should you prioritize?

I believe there are two key points that should be the starting point for any performance optimization work.

I. Keep statistics up to date
1.1 Why are statistics so important?

Without statistics, relational optimizers cannot make accurate execution plan decisions. Database statistics provide intelligence about the state and organization of the data, which the optimizer uses to determine the most efficient way to acquire the data.

Statistics can be thought of as the database's census data. Without them, the optimizer must fall back on default assumptions and is essentially guessing when it chooses an execution path.

1.2 Core Components of Statistics

| Statistics type | Contents | How the optimizer uses it |
|---|---|---|
| Table-level statistics | Total number of rows, compression ratio, total number of data pages | Estimating the cost of a full table scan |
| Column-level statistics | Number of distinct values, histogram of the data distribution | Determining the selectivity of predicates |
| Tablespace statistics | Number of active pages | Assessing I/O cost |
| Index statistics | Number of leaf pages, number of index levels, number of distinct key values, clustering ratio | Determining the cost of an index scan |
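The column-statistics row can be made concrete with a small example. Below is a minimal Python sketch of how an optimizer might turn statistics into a row estimate for an equality predicate. It assumes the classic uniform-distribution estimate (rows divided by the number of distinct values). When a frequency histogram with exact per-value counts is available, the sketch uses that instead. The STATUS column and its counts are hypothetical, and real optimizers use more elaborate models.

```python
from typing import Optional

def estimate_equality_rows(table_rows: int, distinct_values: int,
                           histogram: Optional[dict[str, int]] = None,
                           value: Optional[str] = None) -> float:
    """Estimate how many rows match `column = value` from table and column statistics."""
    if histogram is not None and value in histogram:
        # A frequency histogram records how many rows hold this exact value.
        return float(histogram[value])
    # No histogram entry: assume values are spread evenly, selectivity = 1 / distinct values.
    return table_rows / distinct_values

# Hypothetical skewed column: 1,000,000 rows, 4 distinct STATUS values.
hist = {"CLOSED": 970_000, "OPEN": 10_000, "PENDING": 15_000, "VOID": 5_000}
print(estimate_equality_rows(1_000_000, 4))                # 250000.0
print(estimate_equality_rows(1_000_000, 4, hist, "OPEN"))  # 10000.0
```

Under the uniform assumption, the optimizer would expect 250,000 'OPEN' rows and might prefer a full table scan. The histogram shows there are only 10,000, which makes an index on STATUS far more attractive.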

1.3 When to Collect Statistics

Statistics are gathered with specific commands, and the syntax differs from one database to another:

    DB2: RUNSTATS

    SQL Server: UPDATE STATISTICS

    Oracle: DBMS_STATS.GATHER_TABLE_STATS

    MySQL: ANALYZE TABLE
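None of the four databases above can run inside a self-contained snippet. The sketch below therefore uses SQLite, through Python's built-in sqlite3 module, as a stand-in. SQLite's ANALYZE command plays the same role as RUNSTATS or ANALYZE TABLE and stores its results in the sqlite_stat1 table. The EMP columns and the row counts are made up for illustration.

```python
import sqlite3

def collect_stats(conn: sqlite3.Connection) -> dict[str, str]:
    """Run ANALYZE and return the sqlite_stat1 entry for each index, keyed by index name."""
    conn.execute("ANALYZE")  # SQLite's counterpart to RUNSTATS / ANALYZE TABLE
    rows = conn.execute(
        "SELECT idx, stat FROM sqlite_stat1 WHERE idx IS NOT NULL").fetchall()
    return {idx: stat for idx, stat in rows}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (empno TEXT PRIMARY KEY, deptno TEXT, lastname TEXT, salary INTEGER)")
conn.execute("CREATE INDEX idx_emp_dept ON emp(deptno)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?, ?)",
    [(f"{i:06d}", f"D{i % 10:02d}", f"NAME{i}", 50_000 + i) for i in range(1000)],
)
stats = collect_stats(conn)
# First number: rows in the index; second: average rows per distinct DEPTNO value.
print(stats["idx_emp_dept"])  # 1000 100
```

Until ANALYZE has run, the sqlite_stat1 table does not exist and the SQLite optimizer relies on built-in default estimates. This is the "optimizer without statistics" situation described in 1.1.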

The key principle is that statistics must be refreshed promptly whenever the data changes significantly.

Collect statistics immediately in the following scenarios:

    After a bulk data load (for example, when the data volume changes by more than about 10%)

    After deleting a large amount of data

    After the table structure is changed (such as adding an index or modifying column types)

    During regular maintenance windows (e.g., weekly or monthly)
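The 10% rule of thumb in the first bullet can be written as a simple staleness check. This is a hypothetical helper, not any product's API. Oracle applies a similar idea through its STALE_PERCENT statistics preference, which defaults to 10. Where the modification counters come from depends on your DBMS.

```python
from dataclasses import dataclass

@dataclass
class TableActivity:
    """Hypothetical counters a monitoring job might keep for each table."""
    rows_at_last_collect: int  # row count recorded by the last statistics run
    rows_modified: int         # rows inserted + updated + deleted since that run

def statistics_are_stale(activity: TableActivity, threshold: float = 0.10) -> bool:
    """Return True when modifications exceed `threshold` of the last known row count."""
    if activity.rows_at_last_collect == 0:
        # A table that was empty at the last run needs statistics once it has any changes.
        return activity.rows_modified > 0
    return activity.rows_modified / activity.rows_at_last_collect > threshold

tables = {
    "EMP": TableActivity(rows_at_last_collect=50_000, rows_modified=1_200),         # 2.4% changed
    "ORDERS": TableActivity(rows_at_last_collect=2_000_000, rows_modified=450_000),  # 22.5% changed
}
for name, activity in tables.items():
    if statistics_are_stale(activity):
        print(f"Collect statistics on {name}")  # prints only for ORDERS
```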

Pitfall to avoid: outdated statistics are one of the most common reasons a SQL query suddenly slows down. If a query that used to run smoothly degrades as the data volume grows, stale statistics are the most likely culprit.

1.4 Typical Symptoms of Outdated Statistics

| Symptom | Possible cause |
|---|---|
| The execution plan changes suddenly. | The statistics no longer reflect the true distribution of the data. |
| Query time jumps from milliseconds to seconds. | The optimizer chose the wrong join order or access path. |
| The same query is sometimes fast and sometimes slow. | Unstable statistics or a sampling rate that is too low. |

II. Build the Right Indexes

2.1 Core Principles of Index Design

Creating the right indexes for tables is just as important as collecting the latest statistics. Indexes are the most direct way to improve query performance, but they need to be designed carefully; more indexes are not necessarily better.

Consider a simple query:

SELECT LASTNAME, SALARY
FROM EMP
WHERE EMPNO = '000010'
AND DEPTNO = 'D01';

2.2 Evaluating Candidate Indexes

Several different indexes could support this query:

| Candidate | Index columns | Assessment |
|---|---|---|
| Index1 | EMPNO | Quickly locates rows matching EMPNO, but DEPTNO must still be filtered from the results. |
| Index2 | DEPTNO | Quickly locates rows matching DEPTNO, but EMPNO must still be filtered. |
| Index3 | EMPNO, DEPTNO | Best: locates the rows that satisfy both conditions directly. |

Why is Index3 the best choice?

Index3 lets the DBMS locate, in a single index lookup, exactly the rows that satisfy both EMPNO = '000010' and DEPTNO = 'D01', with no additional filtering. The order of the columns in a composite index also matters. Here EMPNO goes first and DEPTNO second. Both predicates are equality comparisons, so either order can answer this query. Leading with the highly selective EMPNO, however, also lets queries that filter on EMPNO alone use the index.
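You can see the difference between the three candidates by asking an optimizer directly. The sketch below again uses SQLite, through Python's sqlite3 module, as a stand-in for the DBMSs discussed here. It builds a fresh EMP table with only one candidate index at a time and prints the plan from EXPLAIN QUERY PLAN. The exact plan wording varies between SQLite versions, so focus on which predicates appear inside the parentheses.

```python
import sqlite3

QUERY = ("SELECT lastname, salary FROM emp "
         "WHERE empno = '000010' AND deptno = 'D01'")

CANDIDATES = {
    "index1": "CREATE INDEX index1 ON emp(empno)",
    "index2": "CREATE INDEX index2 ON emp(deptno)",
    "index3": "CREATE INDEX index3 ON emp(empno, deptno)",
}

def plan_with(index_ddl: str) -> str:
    """Return SQLite's plan text for QUERY when only the given index exists."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE emp (empno TEXT, deptno TEXT, lastname TEXT, salary INTEGER)")
    conn.execute(index_ddl)
    # Each EXPLAIN QUERY PLAN row is (id, parent, notused, detail).
    detail = "; ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY))
    conn.close()
    return detail

for name, ddl in CANDIDATES.items():
    print(name, "->", plan_with(ddl))
```

With Index1 or Index2, only one predicate appears in the index search, and the other must be checked against every fetched row. With Index3, both predicates appear, so the index alone narrows the search to the matching rows.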

2.3 Trade-offs in Index Design
Trade-off 1: Query performance vs. modification performance

The DBMS must maintain every index automatically:

    Insert a row → every index on the table is updated

    Delete a row → every index on the table is updated

    Update an indexed column → each index containing that column is updated

Therefore, the more indexes a table has, the slower its inserts, deletes, and updates become. In an OLTP environment, you have to strike a balance between read performance and write performance.
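One rough way to observe this write overhead is to time the same batch insert against tables that carry different numbers of indexes. This sketch uses an in-memory SQLite database as a stand-in, and the EMP columns are illustrative. Absolute timings depend entirely on your machine and say little about a production DBMS. The trend, more indexes meaning more work per inserted row, should still be visible.

```python
import sqlite3
import time

COLUMNS = ["empno", "deptno", "lastname", "salary", "hired"]

def time_inserts(num_indexes: int, rows: int = 20_000) -> float:
    """Insert `rows` rows into a table with `num_indexes` secondary indexes; return seconds taken."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE emp (empno TEXT, deptno TEXT, lastname TEXT, salary INTEGER, hired TEXT)")
    for col in COLUMNS[:num_indexes]:  # one single-column index per column, up to 5
        conn.execute(f"CREATE INDEX idx_{col} ON emp({col})")
    data = [(f"{i:06d}", f"D{i % 10:02d}", f"NAME{i}", 50_000 + i, f"2020-01-{i % 28 + 1:02d}")
            for i in range(rows)]
    start = time.perf_counter()
    with conn:  # a single transaction for the whole batch
        conn.executemany("INSERT INTO emp VALUES (?, ?, ?, ?, ?)", data)
    elapsed = time.perf_counter() - start
    conn.close()
    return elapsed

for n in (0, 1, 3, 5):
    print(f"{n} index(es): {time_inserts(n):.3f} s")
```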

Trade-off 2: Whether to reuse existing indexes

If single-column indexes on EMPNO and on DEPTNO already exist, some DBMSs can use both at once to satisfy the query, for example by merging them with bitmap operations or through an index join. In that case a new composite index is not strictly necessary.

How to decide: weigh the importance of the query. A query the CEO runs every day deserves its own dedicated, optimal index; an ordinary employee's occasional ad hoc query can make do with the existing indexes.

Trade-off 3: Index Overloading (Covering Indexes)

If all the data required by the SQL statement is contained in an index, the DBMS can satisfy the request using only the index, without needing to access the table data.

The earlier query retrieves only LASTNAME and SALARY, and its predicates use only EMPNO and DEPTNO. An index that contains all four columns therefore holds everything the query needs:

CREATE INDEX idx_emp_covering ON EMP(EMPNO, DEPTNO, LASTNAME, SALARY);

Now the DBMS can return all the data the query needs from the index alone, without touching the EMP table. This is called index-only access.
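Some optimizers state explicitly when they use index-only access. In this SQLite stand-in (Python's sqlite3 module, with an illustrative EMP table), the plan reads "USING INDEX" when table rows must still be fetched and "USING COVERING INDEX" when the index alone is enough.

```python
import sqlite3

QUERY = ("SELECT lastname, salary FROM emp "
         "WHERE empno = '000010' AND deptno = 'D01'")

def plan_for(index_columns: str) -> str:
    """Return SQLite's plan text for QUERY when EMP has one index on `index_columns`."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE emp (empno TEXT, deptno TEXT, lastname TEXT, salary INTEGER, bonus INTEGER)")
    conn.execute(f"CREATE INDEX idx_test ON emp({index_columns})")
    detail = "; ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY))
    conn.close()
    return detail

# Composite index only: LASTNAME and SALARY still have to be read from the table.
print(plan_for("empno, deptno"))
# Covering index: the plan reports a COVERING INDEX, i.e. index-only access.
print(plan_for("empno, deptno, lastname, salary"))
```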

Trying to make every query use index-only access is neither realistic nor wise. Reserve this technique for particularly important or frequently executed SQL statements.

2.4 Index Design Quick Reference Table

| Scenario | Recommended strategy | Notes |
|---|---|---|
| Equality lookups (=) | Put the equality columns first in the index | Favor highly selective columns (many distinct values) |
| Range predicates (>, <, BETWEEN) | Put the range column after the equality columns | Columns after the range column cannot be used to narrow the index search |
| ORDER BY | Match the index column order to the ORDER BY list | Also match the ascending/descending direction |
| Frequently modified tables | Limit the number of indexes (a common rule of thumb is 5 or fewer) | Every index slows down write operations |
| Important queries | Consider covering indexes | Balance storage cost against query performance |
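The range and ORDER BY rows of the quick-reference table can be checked against a real optimizer. The sketch below uses SQLite through Python's sqlite3 module as a stand-in for the DBMSs discussed above, with illustrative table and index names. It prints the plan for a range query and for a sorted query under different column orders.

```python
import sqlite3

def plan(index_columns: str, query: str) -> str:
    """Return SQLite's plan text for `query` when EMP has one index on `index_columns`."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE emp (empno TEXT, deptno TEXT, lastname TEXT, salary INTEGER)")
    conn.execute(f"CREATE INDEX idx_test ON emp({index_columns})")
    detail = "; ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query))
    conn.close()
    return detail

RANGE_QUERY = "SELECT lastname FROM emp WHERE deptno = 'D01' AND salary > 50000"
SORT_QUERY = "SELECT lastname FROM emp WHERE deptno = 'D01' ORDER BY salary"

# Equality column first, range column second: both predicates narrow the index search.
print(plan("deptno, salary", RANGE_QUERY))
# Range column first: DEPTNO now follows the range column and cannot narrow the search.
print(plan("salary, deptno", RANGE_QUERY))
# Index matches WHERE + ORDER BY: rows come back already in SALARY order, no extra sort.
print(plan("deptno, salary", SORT_QUERY))
# Index on DEPTNO alone: the plan adds a separate sort step (a temp B-tree for ORDER BY).
print(plan("deptno", SORT_QUERY))
```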

III. The Synergy Between Statistics and Indexes
Statistics and indexes do not work in isolation; they work closely together:

| Optimizer task | Role of statistics | Role of indexes |
|---|---|---|
| Estimating the cost of an index scan | Supply index statistics (leaf page count, number of levels, distinct key values) | Provide the underlying data structure |
| Deciding whether to use an index | Supply column statistics (data distribution, selectivity) | Provide a candidate access path |
| Choosing the join order | Supply table size and row-count estimates | Provide indexes on the join keys |

Best practices for using them together:

Update statistics immediately after creating an index: this lets the optimizer evaluate the value of the new index.

Update statistics regularly: this keeps the optimizer aware of the current data distribution.

Monitor index usage: drop indexes that are never used to reduce maintenance overhead.
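Many DBMSs expose index usage counters for this, such as PostgreSQL's pg_stat_user_indexes view or SQL Server's sys.dm_db_index_usage_stats. SQLite has no such counters, so the sketch below approximates the idea a different way. It runs EXPLAIN QUERY PLAN over a hypothetical list of workload queries and reports explicitly created indexes that no plan references. The sketch assumes the workload is representative, so treat its output as candidates to investigate rather than a list of indexes to drop automatically.

```python
import re
import sqlite3

def unused_indexes(conn: sqlite3.Connection, workload: list[str]) -> set[str]:
    """Return explicitly created indexes that no query in `workload` uses."""
    # Indexes created implicitly (e.g. for UNIQUE constraints) have NULL sql; skip them.
    created = {name for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' AND sql IS NOT NULL")}
    used: set[str] = set()
    for query in workload:
        for row in conn.execute("EXPLAIN QUERY PLAN " + query):
            # row[3] is the plan text, e.g. "SEARCH emp USING INDEX idx_emp_empno (empno=?)".
            match = re.search(r"USING (?:COVERING )?INDEX (\w+)", row[3])
            if match:
                used.add(match.group(1))
    return created - used

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (empno TEXT, deptno TEXT, lastname TEXT, hired TEXT)")
conn.execute("CREATE INDEX idx_emp_empno ON emp(empno)")
conn.execute("CREATE INDEX idx_emp_deptno ON emp(deptno)")
conn.execute("CREATE INDEX idx_emp_hired ON emp(hired)")
workload = [
    "SELECT lastname FROM emp WHERE empno = '000010'",
    "SELECT COUNT(*) FROM emp WHERE deptno = 'D01'",
]
print(unused_indexes(conn, workload))  # {'idx_emp_hired'}
```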

IV. Summary
If you are a beginner in database performance management, please be sure to start with the two core points introduced in this article:

| Priority | Focus area | Core task | Expected payoff |
|---|---|---|---|
| First | Statistics management | Keep statistics current and accurate | The optimizer can make correct execution plan decisions |
| Second | Index design | Create appropriate indexes for important queries | Significantly less data scanned |

Bear in mind that this article only scratches the surface of both topics. Statistics collection strategy (sampling rate, histogram accuracy) and index design methodology (composite-index column order, when to use covering indexes) both deserve in-depth study.

Even seasoned DBAs lose nothing by revisiting these topics: new database releases may introduce features you have not used yet, and a review reinforces what you already know.

Finally, a piece of advice based on experience:

When you run into database performance problems, don't rush to tune the SQL. First check whether the statistics are up to date, then confirm that the index design is reasonable. In my experience, the answer to about 80% of performance problems lies here.


Techaiprompt is an educational platform focused on technology, artificial intelligence, and practical AI prompts. We create easy-to-understand guides, tutorials, and real-world examples to help beginners and learners build skills with confidence. Our goal is to simplify complex tech and AI concepts into useful, beginner-friendly resources.
