Over a million developers have joined DZone.

SQL Query Optimization and Normalization

DZone's Guide to

SQL Query Optimization and Normalization

· Big Data Zone
Free Resource

Need to build an application around your data? Learn more about dataflow programming for rapid development and greater creativity. 

This article was written by the Imaginovation team.  They are a Raleigh web design and software development company who uses .NET, PHP, HTML5, JavaScript, and jQuery technologies.

The same data can be retrieved from a database using different SQL queries, but for better performance the user needs to understand data normalization and query optimization. Below are a few concepts to improve overall performance. 

Normalization: Normalizing data means eliminating redundant information from a table and organizing the data so that future changes to the table are easier.

De-normalization: This means allowing redundancy in a table. The main benefit of de-normalization is improved performance with simplified data retrieval and manipulation. This is accomplished by reduction in the number of joins needed for data processing.

What structure can the user implement for the database to speed up table reads?

The user needs to follow the rules of DB tuning:

  1. Properly use indexes (different types of indexes)
  2. Properly locate different DB objects across different table spaces, files and so on
  3. Create a special space (table space) to locate some of the data with special data type (for example CLOB, LOB etc.)

What are the tradeoffs of indexes?

  1. Faster selects, slower updates.
  2. Extra storage space to store indexes. Updates are slower because in addition to updating the table you have to update the index.

Why can a "group by" or "order by" clause be expensive to process?

Processing of "group by" or "order by" clause often requires creation of Temporary tables to process the results of the query, which can be very expensive for large data.

NOT Operator

Always try to avoid NOT operator because positive operators (LIKE, IN, EXIST, =) has better performance than negative operators (NOT LIKE, NOT IN, NOT EXIST, !=).  Negative operators search into each row to identify the data where positive operators stop searching once the result is found.

Use Default Value

While designing the database, assign default value to columns. This helps to improve performance while inserting the data.

SQL Optimization

Logical Operator

Query can be optimized by using proper symbol operator such as >,<,=,!=, etc.

Un-Optimized Query

1.     SELECT * FROM table_name

2.     SELECT * FROM table_name WHERE column_name > 100

Optimized Query

3.     SELECT * From table_name WHERE column_name >=99

4.     SELECT column_name1, column_name2 From table_name WHERE column_name >=99

Query 1 will return all records from database and if database has few billion records then it will kill the database. Query 2 will look for value 100 in database then will perform action on value 100. Query 3 will not look for value 100. Query 4 is more optimized than query 3 because it is fetching only required data.  For Better performance always limit the result using WHERE clause.


Wildcard plays an important role in SQL query performance. It slows down for large tables. Performance can be improved by using postfix instead of pre and full wild card.

Un-Optimized Query

1.     SELECT * FROM table_name WHERE column_name LIKE '%test%';

Optimized Query

2.     SELECT * FROM table_name WHERE column_name LIKE  '%test; (column from where clause should be indexed)

3.     SELECT * FROM table_name WHERE column_name LIKE  test%';

Also try to avoid SUBSTR in SQL query, rather use Wildcard.

Sub Query

We always write sub queries in main query. Most of the time sub queries are very expensive because outer query executes before inner query.

Un-Optimized Query

1.     SELECT * FROM table_name1 WHERE column_name1 IN (SELECT column_name2 FROM table_name2)

Optimized Query

2.     SELECT * FROM table_name1 t1, table_name2 t2 WHERE t1.column_name1 = t2.column_name2;

Union Operator

Union operator runs more faster and has better performance than OR operator

Un-Optimized Query

1.     SELECT * FROM table_name WHERE column_name1 = 'test1' OR column_name2 = 'test2'

Optimized Query

2.     SELECT * FROM table_name WHERE column_name1= 'test1'


SELECT * FROM table_name WHERE column_name2 = 'test2'



GROUP BY gives better performance than using HAVING clause because HAVING clause operates after retrieving data from database.

Un-Optimized Query

1.     SELECT column_name, COUNT(column_name) FROM table_name GROUP BY column_name

HAVING column_name>=100

Optimized Query

2.     SELECT column_name, COUNT(column_name) FROM table_name  WHERE column_name  >=100

GROUP BY column_name


If you want to learn more, or if you're interested in any of Imaginovation's service offerings, which include: Web Design, Mobile Application Development, Search Engine Optimization, please visit our website at www.imaginovation.net, or feel free to give us a call at (888) 723-8643.


Check out the Exaptive data application Studio. Technology agnostic. No glue code. Use what you know and rely on the community for what you don't. Try the community version.


Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}