Performance Improvement in Java Applications: ORM / JPA
Performance Improvement in Java Applications: ORM / JPA
This article recommends practices for when you are developing an application using a JPA provider or directly an ORM framework.
Join the DZone community and get the full member experience.Join For Free
Container Monitoring and Management eBook: Read about the new realities of containerization.
Nowadays the use of ORM frameworks is a common practice when we need to develop applications in the object-oriented programming paradigm due to the following reasons:
The relational world isn't directly compatible to the object world.
The diversity of RDBMS (relational database management systems) introduces a complexity to translate objects to specific SQL commands expected by each RDBMS.
So all we need is to include an ORM framework like Hibernate or EclipseLink and everything will be solved, right? I don't think so. These frameworks create new concerns when we develop that can be described in one word: performance. I worked on projects that had problems with performance and had to be refactored due to the developers feeling that it was not necessary to worry about it because the ORM framework already solved it for them. The result was that the delivered applications ran, but with horrible performance.
To prevent the need of refactoring or surprises in production, I recommend the following practices when you are developing an application using a JPA provider or directly an ORM framework:
Use the Default FetchMode Values in the Associations
The default FetchMode values in JPA are recommended when you don't want loose time evaluating the best strategy to apply, where the relationship types one-to-one or many-to-one (non-collection fields/properties) must be mapped as EAGER and the relationship types one-to-many and many-to-many (collection fields/properties) must be mapped as LAZY.
If you want to optimize the relationship mapping, consider that using the lazy strategy, the ORM executes more than one SQL statement per query and using the eager strategy, the ORM can load more data than you really need. The best solution is to evaluate and balance the disadvantages of each strategy.
Define the Batch Size and Fetch Size
The ORM frameworks provide configurations to define batch size and fetch size. These configurations allow optimizations when executing statements such as SELECT and INSERT.
Considering Hibernate as an example, the following configuration properties can be defined:
hibernate.default_batch_fetch_size: Default size for Hibernate batch fetching of associations. This property is applicable when you have associations configured with fetch mode LAZY or you do not use the JOIN FETCH in your queries.
Example 1: 1 query related to 1 table containing 100 rows and 1 association will produce 100 SQL statements when the batch size was configured with the value 1.
Example 2: 1 query related to 1 table containing 100 rows and 1 association will produce 25 SQL statements when the batch size was configured with the value 4.
hibernate.jdbc.fetch_size: A non-zero value determines the JDBC fetch size, by calling Statement.setFetchSize(). This property is applicable when you want to load more data in memory. The higher the value of this property, the lower the communication with the server and thus better performance. But be careful, a high value will consume more memory in the application and can cause the OutOfMemoryError exception. I consider an value between 10 and 50 reasonable.
hibernate.jdbc.batch_size: A non-zero value causes Hibernate to use JDBC2 batch updates. This property is applicable when you want to execute a batch of INSERT commands. But be careful, I don't recommend using JPQL to execute large amounts of INSERT commands. Use native queries whenever possible in this case to reduce overhead.
Write Queries Using the NEW Operator
When you write a query and only declare the entity in the SELECT clause, the ORM framework will load all columns of this representation of the table plus all the columns of your associations declared with JOINs. Furthermore, all associations that weren't declared using JOINs will be loaded from separated queries and will be considered all the columns in the SELECT clause too.
To reduce the traffic in a network of unnecessary data and reduce the amount of data stored in the application server for each request to the RDBMS, I suggest using the NEW operator. This operator allows us to declare only the fields/columns that we need to load through the constructor of an entity or of a DTO (data transfer object) class.
Consider the entity Foo with two associations one-to-many called bar and baz. This entity has 15 fields mapped to the RDBMS.
If you need to retrieve only 3 fields, you can write the following optimized query:
SELECT NEW Foo(f.field1, f.field2, f.field3) FROM Foo f JOIN f.bar bar WHERE f.field10 = :valueToField10 AND bar.id = :valueToBarId
Write Queries Using JOIN FETCH for the Associations
When you write a query to retrieve some values of fields related to associations that your entity has, I recommend to use the JOIN FETCH statement. The use of this statement reduces the number of queries generated by the ORM framework to achieve the desired result and can even reduce to only one query.
Consider the entity Foo with two associations one-to-many called bar and baz.
If you need to retrieve all associations of the entity Foo in just one query, you can write the following query:
SELECT DISTINCT f FROM Foo f JOIN FETCH f.bar bar JOIN FETCH f.baz baz WHERE f.field10 = :valueToField10 AND bar.id = :valueToBarId
Note: The DISTINCT command was included to eliminate duplications of entities that may appear in the results due to the use of JOIN FETCH to associations represented by collections.
Use Paging Queries
When you execute a query to return a collection of rows as the result without specifying the paging parameters, the ORM framework will retrieve all rows (entities) related to the query and store them in memory, causing an overhead unnecessary if the return wasn't fully used, besides generating a high memory consumption.
To mitigate or eliminate this overhead, use the following methods provided by the interfaces Query or TypedQuery, considering the JPA specification:
setFirstResult(int): Expects one parameter to indicate the offset of the page.
setMaxResults(int): Expects one parameter to indicate the limit of the page.
Thus, it is possible to use these methods to limit the result of the query and reduce the amount of memory used and reduce the network traffic between the application and the RDBMS.
Imagine that you wrote a query related to a table with 1 million rows. If you execute this query without any conditional on the WHERE clause, you going to retrieve 1 million of rows in memory. The consequence will be to generate a large overhead in the application. But if you execute this query in the same conditions with paging parameters, you can control the number of rows returned and stored in memory, which is the best practice in my opinion.
Use the Second Level Cache (L2 / 2LC)
By default, the First Level Cache (L1 / 1LC) is enabled by the JPA specification or the ORM frameworks. This type of cache works per transaction, i.e., the cache is available during each transaction and allows to reduce the accesses to the RDBMS. But if you want to further reduce the number of accesses to the RDBMS, you need to use the Second level Cache (L2 / 2LC). This type of cache is addressed to work across multiple transactions (EntityManagers) considering one or more application servers.
At first, 2LC can be seen as a great deal; however, there are issues behind it that need be considered, as described below:
The objects that are read-only or inserted by the ORM should not be an issue.
The objects that are updated or deleted by the ORM through other applications or servers can cause to these objects become stale, i.e., become inconsistent.
Thus, I suggest applying the 2LC first for to attack issues that aren't impacted by the inconsistency and after the issues that cause inconsistency with a complete evaluation of the involved RDBMS objects that are manipulated by your application.
To enable the 2LC via JPA specification, see the following instructions here.
To enable the 2LC via Hibernate, see the following instructions here.
Note: ORM frameworks are generally more powerful than the JPA specification as the flexibility of the cache.
Use Native Queries Whenever Possible
When you write, mainly, a complex query and your application just need to work in one RDBMS vendor, e.g. PostgreSQL, Oracle Database, I suggest you use a named native query or native query to reduce overhead produced by the JPQL parser. This also can be applied when your application needs to work with multiple RDBMS vendors if the query was written in SQL ANSI.
There are other ways to reduce the overhead without being through the ORM framework, such as: setting up your data source with a cache prepared statement.
Disable SQL Debugging
In production environments, configure to false the value of the property used to display SQL statements on the console. It's nothing new, but I think important to register and not let it be forgotten.
Others Practices Related to Performance
Indexing of table columns on RDBMS to increase the performance.
Analysis of the execution plan to identify how to optimize your queries.
Indexing entity fields using specific tools to make your search faster in the following cases: full-text and geolocation. See Hibernate Search.
ORM frameworks are powerful and greatly reduces our work, but it must be used responsibly about what is being built to not affect the application's performance when it is actually delivered to production.
Opinions expressed by DZone contributors are their own.