Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Boosting Spring Data Performance With Couchbase

DZone 's Guide to

Boosting Spring Data Performance With Couchbase

Give Spring Data a boost!

· Performance Zone ·
Free Resource

Spring Data provides an easy programming model for data access in both relational and non-relational databases. It became very popular among Java/JVM developers because of the small learning curve and low codebase.

However, developers commonly run into performance issues while using it; this tutorial aims to explain some of the common problems and how to fix them.

Avoiding Joins With Couchbase

The most common issue with any object-relational mapping framework (ORM) is the excessive amount of joins needed to bring an object back to memory. The relational model requires you to map relationships between tables and define them as EAGER or LAZY. The problem is that developers quite often map relationships as EAGER, and as the system and data grows, the overall performance starts to decrease rapidly.

It generally forces developers to revisit these entities and set most of the relationships as LAZY. However, this small change requires a significant refactor, as you have not to also change all the code that handles these new lazy entities.

One of the advantages of using Couchbase with Spring Data is that in a document database, we usually store multiple related entities in a single document. Therefore, most of the time, you don’t need to create any JOINs to bring an entity back to memory. This will make your application naturally perform better while compared to the performance in an RDBMS. 

How to Create Indexes for Spring Data

The second key thing to have a good performance at scale is to create the right indexes, and although Couchbase with Spring Data behaves very similar to a relational database at first glance, it works completely different under the hood. For instance, there is no concept similar to a table; all documents are analogous from the database point-of-view.

Therefore, if you have +100 000 documents in your bucket, and you execute a query like the following:

SELECT * from user_profile 
where firstName = "John" 
    and _class="com.cb.demo.userProfile.model.UserProfile"


It will scan all +100 000 documents looking for those two attributes:

Image title

The total number of documents scanned during the query:

Image title

The obvious question then is how do I optimize queries in order to reduce the number of documents scanned? Well, the short answer is: just create the right indexes.

Creating indexes for a Spring Data application is not much different than creating indexes for any other application that uses Couchbase. There are, however, some small details that you should be aware of.

Couchbase’s implementation of Spring Data automatically adds the  _class attribute into your document. This property works like the document type. Therefore, a query like the following:

List<UserEntity> findByName(String name);


Will be translated in N1QL by something like:

Select * from user_profile where name = ? 
and _class= “com.cb.demo.userProfile.model.UserEntity”


TIP: You can print in the console all N1QL queries generated by Spring.  (https://blog.couchbase.com/how-to-log-queries-generated-by-spring-data/)

Note that the expression  _class= “com.cb.demo.userProfile.model.UserEntity” was added automatically to the query. As we always have to specify this filter while querying for the UserEntity, we should create at least one index for it:

CREATE INDEX `userEntity` ON `user_profile`(`_class`) 
  WHERE (`_class` = "com.cb.demo.userProfile.model.UserEntity")


This is the most basic optimization for a Spring Data app: create at least one index for each document type.

With our new index, the databases will only scan documents with  _class="com.cb.demo.userProfile.model.UserEntity" now, which is still better than scanning the whole database.

This basic index can be automatically generated via the  @N1qlSecondaryIndexed annotation:


@N1qlSecondaryIndexed(indexName = "userEntity")  
public interface UserEntityRepository extends CouchbaseRepository<UserEntity, String> {    
}


Tip: the @N1qlSecondaryIndexed will create a basic index for the entity if it does not exist yet. However, if the node containing the index fails and you restart your application, the index will be recreated, which might not be the desired behavior. 

The index we just created is a good start, but it is far from optimal, as most of the time, in an application, we won’t simply query all users and all documents in this database are of the  com.cb.demo.userProfile.model.UserEntity type.

Let’s say we only care about active users, and most of the time, we sort them by the first name. The Spring Data method would look like the following:

List<UserEntity> findByEnabledAndCountryCodeOrderByName(boolean enabled, String countryCode);


And the N1QL query generated will look like the following:

SELECT * from user_profile 
where _class = "com.cb.demo.userProfile.model.UserEntity"
and enabled = true 
    and countryCode = "US" 
    order by firstName


Running this query against our 100k documents, the database will generate the following plan:

Image title

Analyzing the filter part of our query, we will notice that Couchbase has scanned all 100k documents but only 40001 were selected:

Image title

If our application commonly searches for users using these same three filters, we could create an index for it:

create index us_users on user_profile (_class, enabled, countryCode ) 
     where _class = "com.cb.demo.userProfile.model.UserEntity"
     and enabled = true and countryCode = "US" 
     USING GSI;


Now, even queries like the following will leverage the us_users index:

Example 1

List<UserEntiry> findByEnabledAndCountryCodeAndFirstNameLikeOrderByFirstName(…)


Equivalent N1QL Query:

select META(`user_profile `).id AS _ID, 
       META(`user_profile `).cas AS _CAS, 
       `user_profile`.* from user_profile
    where lastName like "Smith%" 
    and _class = "com.cb.demo.userProfile.model.UserEntity"
    and enabled = true 
    and countryCode = "US" 
    order by firstName limit 20 offset 0


Example 2

Spring Data Method:

List<UserEntiry> findByEnabledAndCountryCodeAndFirstNameLikeOrderByFirstName(…)


Equivalent N1QL Query

select META(`user_profile `).id AS _ID, 
META(`user_profile `).cas AS _CAS, 
        `user_profile`.*  from user_profile
where firstName like "John%" 
    and _class = "com.cb.demo.userProfile.model.UserEntity"
and enabled = true 
    and countryCode = "US" 
    order by firstName desc limit 20 offset 0


Image title

Here is a closer look at our new Query Plan using the us_users index:

Image title

The query we executed took 1.67 seconds to run, which is clearly not good enough. If you pay attention to the image above, you will notice that 51 percent of the time was spent during fetch operation, as the filters firstName/lastName  are not in our  us_users index. Let’s add on top of that the fact that we are sorting all results in order to return only the first 20, and then, you have a nice recipe for poor performance.

To fix that problem, we will slightly modify our  us_users index by pushing firstName and  lastName to the index and keep them sorted:

CREATE INDEX `us_users_sorted` ON
`user_profile`(
    `_class`,
    `enabled`,
    `countryCode`,
    `firstName` DESC,
    `lastName` DESC) 
WHERE (((`_class` = "com.cb.demo.userProfile.model.UserEntity") 
    and (`enabled` = true)) 
    and (`countryCode` = "US"))CREATE INDEX `us_users_sorted` ON `user_profile`
(`_class`,
         `enabled`,`countryCode`,`firstName` DESC,`lastName` DESC) WHERE (((`_class` = "com.cb.demo.userProfile.model.UserEntity") and (`enabled` = true)) and (`countryCode` = "US"))


And then, if we run our query again:

Image title

The same query runs now in incredible 4.59 ms, just +360x faster than the previous one. This is a good result considering that we are running the database locally in a commodity notebook.

Let’s run our equivalent Spring Data method just to be sure the time we got is consistent with the code:

Instant start = Instant.now();
List<UserEntity> users =  userEntityRepository
        .findActiveUsersByFirstName("Some%", true, "US", 20, 0);
Instant finish = Instant.now();

System.out.println("Total time: "+ Duration.between(start, finish).toMillis());
System.out.println("Number os users returned = "+users.size() );Instant start = Instant.now();  
List<UserEntity> users =  userEntityRepository.findActiveUsersByFirstName("Some%", true, "US", 20, 0);  Instant finish = Instant.now();    System.out.println("Total time: "+ Duration.between(start, finish).toMillis());  System.out.println("Number os users returned = "+users.size() );


And here is the output:

Image title

The code took 114ms to run, which means that about 90 percent of the time was spent in the application side (preparing the query, converting the results to Java objects) and most importantly, network latency.

Increasing the Data Size

Even though we have +100k users, our index of US users has just 40k documents in it, which might not be fair production scenario yet. Let’s increase the number of US users to 1 million:

Image title

If we run our query again, we will get nearly the same execution time:

Image title

You can also use indexes to boost your JOINs, Group By, and COUNTs. If you need to paginate and navigate through hundreds of results, there also some tricks to make your OFFSET pagination faster. However, these topics are out of the scope of this tutorial.

Reducing Fetches With Projections

Let’s rerun our latest query but returning the top 100 results this time:

Image title

In the Query Plan above, nearly 45 percent of the time was spent in a step called fetch, which is triggered whenever the query filters or attributes being returned are not present in the index.

One of the issues with any Spring Data implementation is that as it doesn’t know which fields you will need, so all fields are returned by default. In Couchbase’s implementation, we specifically return the following: 

SELECT META(`my_bucket`).id AS _ID, 
 META(`my_bucket `).cas AS _CAS, 
        `my_bucket`.* FROM ` my_bucket ` where …


We could avoid fetches by returning/querying only fields that are in the index:

Image title

In the query plan above, there is no fetch step, since all filters and returned fields are in an index called  us_users_sorted, which is basically the main reason why projections are usually faster than the standard Spring Data syntax. Therefore, if you are trying to improve the performance of a query, this is one of the changes you should consider.

Here is how the code of a simple projection looks like:

public List<SimpleUserVO> listActiveUsers( String firstName, boolean enabled, String countryCode,  Integer limit, Integer offset ) {
    String query = "Select meta().id as id, username, tenantId, firstName, lastname from "
            + bucket.bucketManager().info().name()
            + " where type = '"+UserEntity.TYPE+"'"
            + " and firstName like '"+firstName+"%' "
            + " and enabled = "+enabled+" " 
            + " and countryCode = '"+countryCode+"'"
            + " order by firstName desc limit "+limit+ " offset "+offset;

    N1qlParams params = N1qlParams.build().consistency(ScanConsistency.REQUEST_PLUS).adhoc(false);
    ParameterizedN1qlQuery queryParam = N1qlQuery.parameterized(query, JsonObject.create(), params);

    return userRepository.getCouchbaseOperations()
                .findByN1QLProjection(queryParam, SimpleUserVO.class)
}


Let’s check if the code also runs faster:

Instant start = Instant.now();

//old query
List<UserEntity> users =  userEntityRepository
        .findActiveUsersByFirstName("Some%", true, "US", 100, 0);
Instant finish = Instant.now();

System.out.println("Total time: "+ Duration.between(start, finish).toMillis());
System.out.println("Number os users returned = "+users.size() );


Instant start2 = Instant.now();

//query with projections
List<SimpleUserVO> simpleUsers =  userService
        .listActiveUsers("Some%", true, "US", 100, 0);
Instant finish2 = Instant.now();

System.out.println("Total time: "+ Duration.between(start2, finish2).toMillis());
System.out.println("Number os users returned = "+simpleUsers.size() );


Output:

Image title

The code using projections is approximately 50ms faster because all the data needed is already in the index, and there is also fewer data to be transmitted over the network and parsed to Java objects.

One important thing to note: You don’t need to create indexes for every single query; the Query Planner is smart enough to combine and use multiple indexes even when the query has no exact index match.

Note that you can return Value Objects (VOs) directly from Spring Data, but the underlying generated query will still be a  SELECT *:

List<SimpleUserVO> findByName(String name);


TL/DR

In summary, if you are not having a satisfactory performance, we recommend two basic optimizations:

  • Check the generated query and make sure that it is using an index (via Query Planner or *EXPLAIN*)

  • Check if you can create a more optimized index for your query. Sorting and Array Search, for instance, are common scenarios where you might need to create a proper index.

  • In scenarios where you need a high read throughput, choose projections over the standard Spring Data syntax to avoid as much  fetch as possible.

Quite often, we also see scenarios where developers blame the database but turn out to be a problem with the networking or lack of memory/CPU in the application’s machine. For those cases, we highly recommend you to troubleshoot it first using Response Time Observability (RTO) before trying to optimize anything in the database.

Topics:
java ,nosql ,spring ,performance ,couchbase ,spring data ,database ,indexes ,tutorial

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}