7 Database Optimization Hacks for Web Developers
Optimizing your database comes with great rewards. Higher performance and increased query efficiency are a few examples of these benefits.
However, the means aren’t always straightforward and may require changing how your development team works. The examples listed here also might not work for your database, depending on the system you use. In that case, follow the core principle and translate the action into whatever your system allows.
Usually, the Database Management System (DBMS), such as the popular MySQL, comes with your website’s hosting plan. However, you can get dedicated database hosting where the server’s resources aren’t shared with your website.
Here are seven tips and hacks you can use to optimize your database.
1. Eliminate Unused Tables
Usually, when you remove or deactivate a plugin, all the database tables remain. This can be a good thing, as you retain all the user information, preferences, and other data. However, in most cases, you’re left with a cumbersome dataset that may bring your server’s performance down.
If you’re using WordPress, you can get rid of leftover tables by installing a plugin called Plugins Garbage Collector. It scans your database for any unused tables, and you can delete the ones that you know aren’t needed.
A more hands-on approach is to find inactive tables by checking the UPDATE_TIME column in information_schema.TABLES.
Source: StackOverflow.
However, UPDATE_TIME only changes when a table is written to, so a plugin that only reads from a table leaves the timestamp untouched. Make sure that the database table you’re deleting isn’t in active use.
As a precaution, always create a backup of your database before making any changes. Some managed hosting plans include automatic server backups, but you may need to create a database backup yourself on self-managed Windows or Linux servers.
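The backup-then-delete order can be sketched in a few lines. This is a minimal illustration using Python’s built-in sqlite3 module rather than MySQL, so that it runs without a server; the table names (posts, old_plugin_cache) are hypothetical, and on MySQL you would use your host’s backup tool or a dump instead.

```python
import sqlite3

# Hypothetical live database with one leftover plugin table.
live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
live.execute("CREATE TABLE old_plugin_cache (k TEXT, v TEXT)")
live.execute("INSERT INTO posts (title) VALUES ('Hello')")
live.commit()

# Step 1: copy the whole database before touching it.
# In practice the target would be a file, e.g. sqlite3.connect("backup.db").
backup = sqlite3.connect(":memory:")
live.backup(backup)

# Step 2: only now drop the table that is no longer needed.
live.execute("DROP TABLE old_plugin_cache")
live.commit()


def table_names(conn):
    """List the user tables in a SQLite database, sorted by name."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]


print(table_names(live))    # the leftover table is gone
print(table_names(backup))  # the backup still contains it
```

If the deleted table turns out to be needed after all, the backup copy still has it.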
2. Create an Execution Plan
An execution plan shows how the database intends to retrieve the data for a query: which operations it performs and in what order.
The typical execution plan includes the following:
Types of operations
Order of operations
Indexes to use
Estimated row counts (from statistics)
Actual row counts (from the results)
Here’s a graphical example of an ApexSQL execution plan:
Source: ApexSQL.
Once you can read the execution plan for a query, you can build useful indexes (which also helps with the following tip) and further optimize your database.
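To get a feel for reading a plan, here is a small sketch that asks the database for one. It assumes SQLite (through Python’s built-in sqlite3 module) and its EXPLAIN QUERY PLAN statement; MySQL uses EXPLAIN instead, and the output format differs. The orders table and the show_plan helper are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)


def show_plan(conn, sql):
    """Return the plan steps (the last column of each EXPLAIN QUERY PLAN row)."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


plan = show_plan(
    conn,
    "SELECT customer_id, total FROM orders WHERE total > 100 ORDER BY customer_id",
)
for step in plan:
    print(step)
```

With no index on the table, the plan reports a full scan of orders plus a temporary B-tree for the ORDER BY. The exact wording varies between SQLite versions, but both steps are hints about where an index might help.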
3. Proper Indexing
Indexing allows for faster access to the database and speeds up queries. Without any indexes, the database has to scan entire tables, and processing queries becomes painfully slow. Yet over-indexing is also counterproductive, because every index takes up storage and has to be updated on each write.
Unfortunately, there’s no golden rule for indexing your database the right way. With some trial and error, though, you can find the set of indexes that benefits your workload.
There are two types of database indexes: clustered and non-clustered indexes.
Source: DataSchool.
A clustered index determines the order in which rows are stored in the table, usually by primary key. In systems such as MySQL’s InnoDB and SQL Server (by default), this index is created automatically when you define the primary key.
Source: DataSchool.
A non-clustered index is a separate structure that stores the values of the indexed columns along with pointers to the matching rows. Its primary purpose is to speed up queries that search on those columns.
Source: DataSchool.
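The difference an index makes shows up directly in the query plan. The sketch below assumes SQLite via Python’s sqlite3 module, where an INTEGER PRIMARY KEY plays a role similar to a clustered index and CREATE INDEX builds a secondary (non-clustered) one. The orders table and the index name idx_orders_customer are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 50, i * 1.5) for i in range(1000)],
)


def plan(sql):
    """Join the EXPLAIN QUERY PLAN steps for a statement into one string."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


by_customer = "SELECT total FROM orders WHERE customer_id = 7"

# No index on customer_id yet: the database has to scan every row.
before = plan(by_customer)

# Add a secondary index and ask for the plan again.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = plan(by_customer)

# A lookup by primary key never needed the extra index.
by_pk = plan("SELECT total FROM orders WHERE id = 7")

print(before)
print(after)
print(by_pk)
```

The first plan is a full scan, the second is a search that names idx_orders_customer, and the third is a search on the primary key.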
4. Avoid Temporary Tables by Using Index Access
According to MySQL documentation, one of the conditions that create a temporary table is:
Evaluation of statements that contain an ORDER BY clause and a different GROUP BY clause, or for which the ORDER BY or GROUP BY contains columns from tables other than the first table in the join queue.
However, you can avoid creating a temporary table for a GROUP BY clause by using index access. The most important preconditions are that all GROUP BY columns reference attributes from the same index and that the index stores its keys in order (as a BTREE index does, but a HASH index does not).
There are two types of index access you can use in MySQL: Loose Index Scan and Tight Index Scan.
The Loose Index Scan considers only a fraction of the keys in an index, which is where it gets its name. When the query has no WHERE clause, it reads as many keys as there are groups, which may be far fewer than the total number of keys.
If the WHERE clause contains range predicates, a Loose Index Scan looks up the first key of each group that satisfies the range conditions and again reads the smallest possible number of keys.
A query also has to meet several conditions before MySQL can use a Loose Index Scan. For example, it must read from a single table, and the GROUP BY columns must form a leftmost prefix of the index. When the Loose Index Scan is applicable to a query, the EXPLAIN output shows Using index for group-by in the Extra column.
As an example, here’s a list of queries where you can use Loose Index Scan access:
Example index: idx(c1,c2,c3) on table t1(c1,c2,c3,c4)
Source: MySQL.
You can use the Tight Index Scan if your query doesn’t meet the conditions for the Loose Index Scan. The Tight Index Scan can be a full or a range index scan, depending on the query conditions.
The basis of this index access is that the grouping happens only after all keys that satisfy the range conditions have been found. Because the index already returns those keys in order, the database doesn’t need a temporary table to satisfy the query.
For example, the queries below won’t work with the Loose Index Scan, but do work with the Tight Index Scan.
Example index: idx(c1,c2,c3) on table t1(c1,c2,c3,c4)
Source: MySQL.
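The core principle, letting an ordered index do the grouping so that no temporary structure is needed, can be observed outside MySQL too. The following sketch assumes SQLite through Python’s sqlite3 module, which has no Loose or Tight Index Scan as such, but does report when it needs a temporary B-tree for GROUP BY. It reuses the t1 and idx names from the example index above; the data is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (c1 INTEGER, c2 INTEGER, c3 INTEGER, c4 INTEGER)")
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?, ?)",
    [(i % 3, i % 5, i % 7, i) for i in range(200)],
)


def plan(sql):
    """Return the EXPLAIN QUERY PLAN steps for a statement."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


query = "SELECT c1, c2 FROM t1 GROUP BY c1, c2"

# Without an index, grouping needs a temporary structure.
without_index = plan(query)

# With idx(c1,c2,c3), the GROUP BY columns are a leftmost prefix of the index,
# so the rows already arrive in group order.
conn.execute("CREATE INDEX idx ON t1 (c1, c2, c3)")
with_index = plan(query)

print(without_index)
print(with_index)
```

Before the index exists, the plan contains a temporary B-tree step for the GROUP BY. Afterwards, that step disappears and the plan reads from idx instead.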
5. Avoid Coding Loops
A SQL query placed inside a loop runs once per iteration, which is inefficient and causes performance issues that pile up quickly, especially with large datasets. In essence, moving the query outside the loop so that it executes only once is the way to go.
There are a couple of nifty solutions to achieve this.
Use JOIN and GROUP BY to select data from multiple tables and have the database perform the counting in a single query. This is especially effective for consolidating multiple queries that use aggregate functions such as COUNT and MAX.
Source: CodeUtopia.
You can also use subselects, which are SELECT statements nested inside another SELECT statement. This also works great for consolidating queries, which in turn requires fewer resources to execute.
Source: CodeUtopia.
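As a rough illustration of moving a query out of a loop, the sketch below counts posts per author twice: once with a query inside a loop, and once with a single JOIN and GROUP BY. It assumes SQLite via Python’s sqlite3 module, and the authors and posts tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors (id, name) VALUES (1, 'Ada'), (2, 'Linus'), (3, 'Grace');
    INSERT INTO posts (author_id, title) VALUES
        (1, 'Indexes'), (1, 'Joins'), (2, 'Kernels');
""")

# Slow pattern: one extra query per author (1 + N queries in total).
counts_loop = {}
for author_id, name in conn.execute("SELECT id, name FROM authors").fetchall():
    (n,) = conn.execute(
        "SELECT COUNT(*) FROM posts WHERE author_id = ?", (author_id,)
    ).fetchone()
    counts_loop[name] = n

# Better: let the database do the counting in a single query.
# LEFT JOIN keeps authors without posts; COUNT(p.id) gives them a count of 0.
counts_join = dict(conn.execute("""
    SELECT a.name, COUNT(p.id)
    FROM authors AS a
    LEFT JOIN posts AS p ON p.author_id = a.id
    GROUP BY a.id, a.name
"""))

print(counts_loop)
print(counts_join)
```

Both approaches produce the same counts, but the second one sends a single statement to the database no matter how many authors there are.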
6. Get Rid of Correlated Subqueries
Correlated subqueries are essentially coding loops. Because the subquery references columns from the outer query, the database may have to run it once for every row the outer query processes. This row-by-row processing is useful when the result for each row depends on checking it against other related rows.
Source: GeeksforGeeks.
You can avoid correlated subqueries by using JOIN clauses, which usually makes the query run more efficiently. Essentially, the join replaces the correlated condition in the WHERE clause and removes the need to execute the subquery for each row separately.
Here’s an example of how it works:
Source: Ubitsoft.
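One way to picture the rewrite: the sketch below finds employees who earn more than their department’s average, first with a correlated subquery and then with a JOIN against a grouped derived table. It assumes SQLite via Python’s sqlite3 module, and the employees table and its data are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (
        id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary INTEGER
    );
    INSERT INTO employees (name, dept, salary) VALUES
        ('Ann', 'eng', 120), ('Bob', 'eng', 100), ('Cat', 'eng', 80),
        ('Dan', 'ops', 70),  ('Eve', 'ops', 90);
""")

# Correlated: the inner SELECT refers to e.dept, so it depends on the outer row.
correlated = """
    SELECT e.name
    FROM employees AS e
    WHERE e.salary > (SELECT AVG(x.salary)
                      FROM employees AS x
                      WHERE x.dept = e.dept)
    ORDER BY e.name
"""

# JOIN: compute each department's average once, then join against it.
joined = """
    SELECT e.name
    FROM employees AS e
    JOIN (SELECT dept, AVG(salary) AS avg_salary
          FROM employees
          GROUP BY dept) AS d ON d.dept = e.dept
    WHERE e.salary > d.avg_salary
    ORDER BY e.name
"""

rows_correlated = [name for (name,) in conn.execute(correlated)]
rows_joined = [name for (name,) in conn.execute(joined)]

print(rows_correlated)
print(rows_joined)
```

Both queries return the same names. Whether the JOIN version is actually faster depends on the database and its optimizer, so it is worth comparing the execution plans on your own system.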
7. Avoid SELECT * Queries
The goal of every query is to retrieve only the relevant data as efficiently as possible. However, it’s relatively common to write SELECT * when creating a query, which returns data you don’t need. While it doesn’t play a massive role in the performance of small datasets, it can leave a significant dent in larger ones.
Selecting the data sparingly helps with optimizing the query speed and reduces resource usage.
A quick way to turn this issue around is to list only the columns you need instead of using SELECT *, and to add a LIMIT clause to cap the number of rows returned. This way, you limit the output of the query unless you really do need the entire dataset.
Here’s the LIMIT clause in action:
Source: TechontheNet.
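A small sketch of the difference, assuming SQLite via Python’s sqlite3 module and a hypothetical users table with a large bio column: the first query fetches every column of every row, while the second names two columns and caps the result at five rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT, bio TEXT)"
)
conn.executemany(
    "INSERT INTO users (name, email, bio) VALUES (?, ?, ?)",
    [(f"user{i}", f"user{i}@example.com", "x" * 1000) for i in range(100)],
)

# Everything: all 4 columns of all 100 rows, including the large bio text.
everything = conn.execute("SELECT * FROM users").fetchall()

# Only what is needed: two columns, first five rows.
trimmed = conn.execute(
    "SELECT id, name FROM users ORDER BY id LIMIT 5"
).fetchall()

print(len(everything), len(everything[0]))  # 100 rows, 4 columns each
print(len(trimmed), len(trimmed[0]))        # 5 rows, 2 columns each
```

The ORDER BY makes it predictable which five rows the LIMIT returns.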
Conclusion
Optimizing your database may not seem very straightforward, but a bit of trial and error can lead to higher performance and faster queries.
However, before experimenting, always create a backup of your database in case you need to revert.