Finding and Fixing Spring Data JPA Performance Issues with FusionReactor
With the right tools in place, you can identify performance problems easily and often even before they cause trouble in production.
Over the past several years, Spring Data JPA has established itself as one of the most commonly used persistence frameworks in the Java world. It gets most of its features from the very popular Hibernate object-relational mapping (ORM) implementation. The ORM features provide great developer productivity, and the basic functionality is very easy to learn.
But as so often, you need to know a lot more than just the basic parts if you want to build enterprise applications. Without a good understanding of its internals and some advanced features, you will struggle with severe performance issues. Spring Data’s and Hibernate’s ease of use sometimes makes it way too easy to build a slow application.
But that doesn’t have to be the case. With the right tools in place, you can identify performance problems easily and often even before they cause trouble in production. In this article, I will show you 3 of Hibernate’s and Spring Data’s most common performance pitfalls, how you can find them using FusionReactor’s Java Monitoring or Hibernate’s statistics, and how you can fix them.
Pitfall 1: Lazy Loading Causes Lots of Unexpected Queries
When you learn about Spring Data JPA and Hibernate performance optimizations, you are always told to use FetchType.LAZY for all of your associations. This tells Hibernate to load the associated entities only when you access the association. That’s, of course, a much better approach than using FetchType.EAGER, which always fetches all associated entities, even if you don’t use them.
Unfortunately, FetchType.LAZY introduces its own performance issue as soon as you use a lazily fetched association. Hibernate then needs to execute an SQL query to get the associated entities from the database. This becomes an issue if you work with a list of entities, for example, when a getConcerts method loads all Concert entities and then accesses the Band of each one.
The findAll method of Spring Data’s JpaRepository executes a simple JPQL query that gets all Concert entities from the database and returns them as a List. Each of these concerts is played by a band. If you set the FetchType of that association to FetchType.LAZY, Hibernate executes an SQL query to fetch the Band when you call its getter method on the Concert entity. If you do that for each Concert entity in the List, Hibernate executes an additional SQL query for each Band that plays a concert. The longer that List is, the more queries Hibernate executes, and the more likely this is to cause performance problems.
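The effect can be reproduced without a database. The following is a minimal plain-Java simulation, not Hibernate code: the class names, the Supplier-based "lazy proxy", and the query counter are all assumptions made for illustration, and the sketch assumes that every concert is played by a different band.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Plain-Java simulation of lazy loading; no Hibernate involved.
// All names are hypothetical stand-ins for the entities in this article.
final class LazyLoadingDemo {

    // Counts every simulated SQL statement sent to the "database".
    static int queryCount = 0;

    static final class Band {
        private final String name;
        Band(String name) { this.name = name; }
        String getName() { return name; }
    }

    // A Concert keeps its Band behind a Supplier, similar in spirit to a lazy proxy.
    static final class Concert {
        private final Supplier<Band> bandLoader;
        private Band band; // stays null until the first access

        Concert(Supplier<Band> bandLoader) { this.bandLoader = bandLoader; }

        Band getBand() {
            if (band == null) {
                band = bandLoader.get(); // triggers one additional query
            }
            return band;
        }
    }

    // Simulates findAll(): one query returns all concerts, bands are not loaded yet.
    static List<Concert> findAll(int numberOfConcerts) {
        queryCount++;
        List<Concert> concerts = new ArrayList<>();
        for (int i = 0; i < numberOfConcerts; i++) {
            final int id = i;
            concerts.add(new Concert(() -> {
                queryCount++; // simulated SELECT for a single Band
                return new Band("Band " + id);
            }));
        }
        return concerts;
    }

    // Returns how many statements were needed to read the band of every concert.
    static int countQueries(int numberOfConcerts) {
        queryCount = 0;
        for (Concert concert : findAll(numberOfConcerts)) {
            concert.getBand().getName();
        }
        return queryCount;
    }

    public static void main(String[] args) {
        // 1 query for the concerts plus 1 query per band
        System.out.println(countQueries(9) + " statements for 9 concerts");
    }
}
```

The number of statements grows linearly with the size of the list, which is why this pattern is often called the n+1 select issue.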
Find Unexpected Queries
This issue is relatively hard to find in your code. But it gets pretty easy if you monitor the queries executed by your application.
Using FusionReactor, you can easily see all the SQL statements that were executed by the getConcerts method. Based on the code, you would probably expect Hibernate to perform only 1 SELECT statement. But the recorded JDBC statements show that Hibernate executed 10 SELECT statements because it had to get the associated Band entity for each Concert.
Or you can activate Hibernate’s statistics component and the logging of SQL statements. Hibernate then writes a log message at the end of each session, which includes the number of executed JDBC statements and the overall time spent on these operations.
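As a rough sketch, and assuming a Spring Boot application configured through an application.properties file, the statistics and the SQL logging could be switched on as shown below. The exact log output depends on your Hibernate version and logging setup, and this configuration is best kept to development and test environments.

```properties
# Assumption: Spring Boot passes spring.jpa.properties.* through to Hibernate
spring.jpa.properties.hibernate.generate_statistics=true

# Log the statistics messages and the executed SQL statements
logging.level.org.hibernate.stat=DEBUG
logging.level.org.hibernate.SQL=DEBUG
```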
Avoid Additional Queries
You can avoid this issue by using a JOIN FETCH clause that tells Hibernate to fetch the Concert and associated Band entities within the same query. You can do that by adding a method to your repository interface and defining a custom query using the @Query annotation.
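Such a repository method might look like the following sketch. To keep it compilable without Spring on the classpath, it declares a local stand-in for the @Query annotation and placeholder entity classes; these stand-ins and the method name findAllWithBand are assumptions. In a real project, you would import org.springframework.data.jpa.repository.Query, use your mapped entities, and let the interface extend JpaRepository.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.List;

final class JoinFetchSketch {

    // Stand-in for org.springframework.data.jpa.repository.Query,
    // declared here only so that the sketch compiles on its own.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Query {
        String value();
    }

    // Placeholders for the mapped entities of this article.
    static class Band {}

    static class Concert {
        Band band; // a lazily fetched many-to-one association in the real entity
    }

    // In a real application: interface ConcertRepository extends JpaRepository<Concert, Long>
    interface ConcertRepository {

        // JOIN FETCH tells Hibernate to load each Concert together with its Band
        // in the same SQL statement instead of one statement per Band.
        @Query("SELECT c FROM Concert c JOIN FETCH c.band")
        List<Concert> findAllWithBand();
    }
}
```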
Instead of 10 queries, Hibernate now gets all information with only 1 query.
Pitfall 2: Slow Database Queries
Slow queries are a common issue in all applications that store their data in a relational database. That’s why all databases provide an extensive set of tools to analyze and improve these queries.
Even though we can’t blame Spring Data JPA or Hibernate for these issues, we still need to find and fix these queries in our application. And that’s often not as easy as it might seem. Hibernate generates the executed SQL statements based on our JPQL queries. In general, the executed queries are efficient. But sometimes, the additional abstraction of JPQL hides performance problems that would be obvious if we wrote the SQL query ourselves.
A JPQL query that loads Concert entities and uses multiple JOIN FETCH clauses, for example, looks totally fine.
Find Inefficient Queries
The problem becomes obvious if you activate the logging of SQL statements in Hibernate or take a look at the executed JDBC statements in FusionReactor.
Hibernate has to select all columns mapped by an entity if you reference it in your SELECT clause or if you tell Hibernate to JOIN FETCH an association. In this case, the JPQL query that referenced 3 entities caused an SQL statement that selects 22 columns. That is a lot more columns than you might expect when you look at the JPQL query, and it gets worse if your entities map more columns or you JOIN FETCH more associations.
The JOIN FETCH clause creates another issue: The result set contains the product of all joined records. Due to that, such result sets often contain thousands of records.
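A back-of-the-envelope calculation shows how quickly this adds up. The sketch below is only an estimate under a simplifying assumption: every record matches the same number of records in each joined to-many association. The method name and the numbers are made up for illustration.

```java
// Rough estimate of the number of rows a query with several joins returns.
final class JoinRowEstimate {

    // rootRows: records selected from the root entity.
    // associatedRowsPerJoin: assumed number of matching records per joined association.
    static long joinedRows(long rootRows, int... associatedRowsPerJoin) {
        long rows = rootRows;
        for (int matches : associatedRowsPerJoin) {
            rows *= matches; // each join multiplies the size of the result set
        }
        return rows;
    }

    public static void main(String[] args) {
        // 50 root records, joined with 5 and then 4 associated records each
        System.out.println(joinedRows(50, 5, 4)); // prints 1000
    }
}
```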
Improve Inefficient Queries
The only way to fix this performance problem is to avoid these kinds of queries. You could try to use a smaller, use case-specific projection. Or you could split your query into multiple ones, e.g., one that fetches the Band entity with a JOIN FETCH clause for the artist attribute and another query for the Concert entity.
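A use case-specific projection could be sketched as a small DTO that is filled by a JPQL constructor expression. The ConcertSummary class, the com.example package, the name attribute of Band, and the method name are assumptions for illustration, and @Query is again a local stand-in for Spring Data's annotation so that the example compiles on its own.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.time.LocalDateTime;
import java.util.List;

final class ProjectionSketch {

    // Stand-in for org.springframework.data.jpa.repository.Query.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Query {
        String value();
    }

    // DTO with only the values this use case needs. In a real project this would be
    // a top-level class, and the JPQL query must use its fully qualified name.
    static final class ConcertSummary {
        private final LocalDateTime eventDateTime;
        private final String bandName;

        public ConcertSummary(LocalDateTime eventDateTime, String bandName) {
            this.eventDateTime = eventDateTime;
            this.bandName = bandName;
        }

        LocalDateTime getEventDateTime() { return eventDateTime; }
        String getBandName() { return bandName; }
    }

    // In a real application: interface ConcertRepository extends JpaRepository<Concert, Long>
    interface ConcertRepository {

        // A plain JOIN (no FETCH) plus a constructor expression selects 2 columns
        // instead of every column mapped by the joined entities.
        @Query("SELECT new com.example.ConcertSummary(c.eventDateTime, b.name) "
             + "FROM Concert c JOIN c.band b")
        List<ConcertSummary> findConcertSummaries();
    }
}
```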
Pitfall 3: Too Many Write Operations
Another common performance pitfall is the inefficient handling of write operations for multiple entities.
Let’s say you need to reschedule all concerts that were planned for the month of April. Using Java and Hibernate as your ORM framework, it feels natural to get a Concert entity object for each of these concerts and to change the eventDateTime attribute.
Find Inefficient Write Operations
But that would force Hibernate to execute an SQL UPDATE statement for each concert. As with the previous performance issues, this inefficiency is only visible if you monitor the executed SQL statements.
Reduce the Number of Write Operations
In SQL, you would write one SQL UPDATE statement that changes the value in the event_date_time column of all concerts that are scheduled for the month of April. That’s obviously the more efficient approach.
You can do the same with a native query in Hibernate. But before you do that, you should always call the flush() and clear() methods on your EntityManager. The flush() call writes all pending changes to the database, and the clear() call ensures that your first-level cache doesn’t contain any local copy of the data that your query will change.
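The call sequence could look like the following sketch. The two small interfaces are stand-ins for jakarta.persistence.EntityManager and jakarta.persistence.Query, reduced to the methods used here so that the example runs without a JPA provider. The table name concert, the PostgreSQL-style INTERVAL expression, and the recording fake are assumptions for illustration; the real code would also need to run inside a transaction.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;

final class BulkUpdateSketch {

    // Stand-ins for jakarta.persistence.Query and jakarta.persistence.EntityManager.
    interface Query {
        Query setParameter(int position, Object value);
        int executeUpdate();
    }

    interface EntityManager {
        void flush();
        void clear();
        Query createNativeQuery(String sql);
    }

    // Assumed table name and database-specific date arithmetic; adapt both to your schema.
    static final String RESCHEDULE_SQL =
            "UPDATE concert SET event_date_time = event_date_time + INTERVAL '1 month' "
          + "WHERE event_date_time >= ?1 AND event_date_time < ?2";

    // Moves all concerts in the range [from, to) with a single UPDATE statement.
    static int rescheduleConcerts(EntityManager em, LocalDateTime from, LocalDateTime to) {
        em.flush(); // write all pending changes to the database first
        em.clear(); // detach all entities so that no stale copy stays in the first-level cache
        return em.createNativeQuery(RESCHEDULE_SQL)
                .setParameter(1, from)
                .setParameter(2, to)
                .executeUpdate();
    }

    // Runs the method against a recording fake and returns the observed call order.
    static List<String> recordCalls() {
        List<String> calls = new ArrayList<>();
        EntityManager em = new EntityManager() {
            public void flush() { calls.add("flush"); }
            public void clear() { calls.add("clear"); }
            public Query createNativeQuery(String sql) {
                calls.add("createNativeQuery");
                return new Query() {
                    public Query setParameter(int position, Object value) { return this; }
                    public int executeUpdate() { calls.add("executeUpdate"); return 0; }
                };
            }
        };
        rescheduleConcerts(em, LocalDateTime.of(2024, 4, 1, 0, 0), LocalDateTime.of(2024, 5, 1, 0, 0));
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(recordCalls());
    }
}
```

Because the statement bypasses the persistence context, any Concert entity you need afterwards has to be loaded again from the database.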
As you have seen, Hibernate is easy to use, but it can also cause some unexpected performance problems. These are often hard to find in your code but very easy to see as soon as you monitor the executed SQL statements. If you use the right logging configuration, you can find these statements in your application log file. Or you can use FusionReactor’s Database Monitoring features and integrate these checks in your application monitoring strategy.
Published at DZone with permission of Thorben Janssen, DZone MVB. See the original article here.