Fetching Data With ORMs Is Easy! Is It?

We explore how one ORM system makes querying and fetching large amounts of data much easier.

Andrey Belyaev

Jun. 06, 19 · Tutorial

Introduction

Almost any system operates with external data stores in some way. In most cases, it is a relational database and very often data fetching is delegated to some ORM implementations. ORM covers a lot of routine operations and brings along a few new abstractions in return.

Martin Fowler wrote an interesting article about ORM and one of the key thoughts there is “ORMs help us deal with a very real problem for most enterprise applications... They aren't pretty tools, but then the problem they tackle isn't exactly cuddly either. I think they deserve a little more respect and a lot more understanding.”

In the CUBA framework, we use ORM very heavily and know a lot about its limitations, since we have various kinds of projects all over the world. There are a lot of things that can be discussed, but we will focus on one of them: lazy vs. eager data fetching. We’ll talk about different approaches to data fetching (mostly within the JPA API and Spring), how we deal with it in CUBA, and what R&D work we do to improve the ORM layer in CUBA. We will look at the essentials that can help developers avoid serious performance problems when using ORMs.

Fetching Data: The Lazy Way or the Eager Way?

If your data model contains only one entity, there will be no issues with using an ORM. Let’s have a look at an example. We have a user with an ID and a name:

@Entity
public class User {
   @Id
   @GeneratedValue
   private int id;
   private String name;

   //Getters and Setters here
}

To fetch it we just need to ask EntityManager nicely:

EntityManager em = entityManagerFactory.createEntityManager();
User user = em.find(User.class, id);

Things get interesting when we have a one-to-many relationship between entities:

@Entity
public class User {
   @Id
   @GeneratedValue
   private int id;
   private String name;
   @OneToMany
   private List<Address> addresses;

   //Getters and Setters here
}

If we want to fetch a user record from the database, a question arises: “Should we fetch an address too?” And the “right” answer will be: “It depends.” In some use cases, we may need an address; in some of them, we may not. Usually, an ORM provides two options for fetching data: lazy and eager. Most of them set the lazy fetch mode by default. And when we write the following code:

EntityManager em = entityManagerFactory.createEntityManager();
User user = em.find(User.class, 1);
em.close();
System.out.println(user.getAddresses().get(0));

We get the so-called “LazyInitException” (LazyInitializationException in Hibernate), which really confuses ORM rookies. And here we need to explain the concept of “attached” and “detached” objects, as well as database sessions and transactions.
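To make the attached/detached distinction concrete, here is a minimal plain-Java sketch of the mechanism. It does not use JPA and is not how any particular provider is implemented; `FakeSession` and `LazyRef` are hypothetical names introduced only for illustration. A lazy attribute can be loaded while its session is open, and fails once the session is closed unless it was "touched" earlier.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical stand-in for a persistence session; not a JPA class.
class FakeSession {
    private boolean open = true;
    boolean isOpen() { return open; }
    void close() { open = false; }
}

// Holds a value that is loaded on first access, and only while the session is open.
class LazyRef<T> {
    private final FakeSession session;
    private final Supplier<T> loader;
    private T value;
    private boolean loaded;

    LazyRef(FakeSession session, Supplier<T> loader) {
        this.session = session;
        this.loader = loader;
    }

    T get() {
        if (!loaded) {
            if (!session.isOpen()) {
                // Roughly the situation an ORM reports as a lazy initialization error.
                throw new IllegalStateException("Cannot load lazy attribute: session is closed");
            }
            value = loader.get();
            loaded = true;
        }
        return value;
    }
}

class LazyLoadingDemo {
    // Returns true if reading the attribute after close() failed.
    static boolean failsWhenDetached(boolean touchBeforeClose) {
        FakeSession session = new FakeSession();
        LazyRef<List<String>> addresses =
                new LazyRef<>(session, () -> List.of("Sample Street 1"));
        if (touchBeforeClose) {
            addresses.get(); // "touch" the attribute while still attached
        }
        session.close();     // the owning object is now "detached"
        try {
            addresses.get();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(failsWhenDetached(false)); // true: never loaded, session closed
        System.out.println(failsWhenDetached(true));  // false: loaded before the session closed
    }
}
```

The second call succeeds only because the attribute was read before the session was closed, which is exactly the kind of use-case-specific "touching" that makes lazy loading awkward.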

Okay then, an entity instance must be attached to a session for us to be able to fetch its detail attributes. But then we get another problem: transactions get longer, so the risk of a deadlock increases. And splitting our code into a chain of short transactions may cause “death by a million mosquitoes” for the database, due to an increased number of very short, separate queries.

As we said, you may or may not need the Addresses attribute fetched, therefore you need to “touch” the collection only in some use cases, adding more conditions. Hmmmm… Looks like it’s getting complex.

Okay, will another fetch type help?

@Entity
public class User {
   @Id
   @GeneratedValue
   private int id;
   private String name;
   @OneToMany(fetch = FetchType.EAGER)
   private List<Address> addresses;

   //Getters and Setters here
}

Well, not exactly. We’ll get rid of the annoying lazy initialization exception and no longer need to check whether an instance is attached or detached. But now we have a performance problem because, again, we don’t need addresses in every case, yet we always select them. Any other ideas?

Spring JDBC

Some developers become so annoyed with ORM that they switch to “semi-automatic” mappings using Spring JDBC. In this case, we create unique queries for unique use cases and return objects that contain attributes valid for a particular use case only.

It gives us great flexibility. We can get only one attribute:

String name = this.jdbcTemplate.queryForObject(
       "select name from t_user where id = ?",
       new Object[]{1L}, String.class);

Or the whole object:

User user = this.jdbcTemplate.queryForObject(
       "select id, name from t_user where id = ?",
       new Object[]{1L},
       new RowMapper<User>() {
           public User mapRow(ResultSet rs, int rowNum) throws SQLException {
               User user = new User();
               user.setName(rs.getString("name"));
               user.setId(rs.getInt("id"));
               return user;
           }
       });

You can fetch addresses too using ResultSetExtractor, but it involves writing some extra code and you should know how to write SQL joins to avoid the n+1 select problem.
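The n+1 select problem mentioned above can be illustrated without a real database. The sketch below is a plain-Java simulation, not Spring JDBC code: `CountingDb` is a hypothetical in-memory stand-in that only counts how many "queries" it serves, and the `t_address` table named in the comments is an assumption (the article only shows `t_user`).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// In-memory stand-in for a database that counts the "queries" it serves.
class CountingDb {
    // userId -> streets; hypothetical sample data
    private final Map<Integer, List<String>> addressesByUser = new LinkedHashMap<>();
    int queryCount = 0;

    CountingDb() {
        addressesByUser.put(1, List.of("Street A"));
        addressesByUser.put(2, List.of("Street B", "Street C"));
        addressesByUser.put(3, List.of());
    }

    // Simulates: select id from t_user
    List<Integer> selectUserIds() {
        queryCount++;
        return new ArrayList<>(addressesByUser.keySet());
    }

    // Simulates: select street from t_address where user_id = ?
    List<String> selectAddresses(int userId) {
        queryCount++;
        return addressesByUser.get(userId);
    }

    // Simulates: select u.id, a.street from t_user u
    //            left join t_address a on a.user_id = u.id
    Map<Integer, List<String>> selectUsersWithAddresses() {
        queryCount++;
        return new LinkedHashMap<>(addressesByUser);
    }
}

class NPlusOneDemo {
    // One query for the users plus one query per user for the addresses.
    static int queriesWithLoop() {
        CountingDb db = new CountingDb();
        for (int id : db.selectUserIds()) {
            db.selectAddresses(id);
        }
        return db.queryCount;
    }

    // A single joined query returns users and addresses together.
    static int queriesWithJoin() {
        CountingDb db = new CountingDb();
        db.selectUsersWithAddresses();
        return db.queryCount;
    }

    public static void main(String[] args) {
        System.out.println("loop: " + queriesWithLoop()); // loop: 4 (1 + 3 users)
        System.out.println("join: " + queriesWithJoin()); // join: 1
    }
}
```

With three users the difference is small, but the loop variant grows linearly with the number of master records while the join stays at one round trip.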

Well, it’s getting complex again. You control all the queries and you control mapping, but you have to write more code, learn SQL, and know how database queries are executed. Though I think knowing SQL basics is a necessary skill for almost every developer, some of them do not think so and I’m not going to argue with them. Knowing x86 assembler is not a vital skill for everyone nowadays either. Let’s just think about how we can simplify development.

JPA EntityGraph

Let’s take a step back and try to understand what we’re trying to achieve. It seems like all we need to do is to tell the ORM exactly which attributes we’re going to fetch in different use cases. Let’s do it then! JPA 2.1 introduced a new API, Entity Graph. The idea behind this API is simple: you write several annotations that describe what should be fetched. Let’s have a look at the example:

@Entity
@NamedEntityGraphs({
       @NamedEntityGraph(name = "user-only-entity-graph"),
       @NamedEntityGraph(name = "user-addresses-entity-graph",
               attributeNodes = {@NamedAttributeNode("addresses")})
       })
public class User {
   @Id
   @GeneratedValue
   private int id;
   private String name;
   @OneToMany(fetch = FetchType.LAZY)
   private Set<Address> addresses;

   //Getters and Setters here

}

For this entity, we’ve described two entity graphs — the user-only-entity-graph does not fetch the Addresses attribute (which is marked as lazy), whilst the second graph instructs the ORM to select addresses. If we mark an attribute as eager, entity graph settings will be ignored and the attribute will be fetched.

So, starting from JPA 2.1 you can select entities in the following way:

EntityManager em = entityManagerFactory.createEntityManager();
EntityGraph<?> graph = em.getEntityGraph("user-addresses-entity-graph");
Map<String, Object> properties = Map.of("javax.persistence.fetchgraph", graph);
User user = em.find(User.class, 1, properties);
em.close();

This approach greatly simplifies a developer’s work: there is no need to “touch” lazy attributes or create long transactions. The great thing is that the entity graph can be applied at the SQL generation level, so no extra data is fetched from the database into the Java application. But there is still a problem: looking at an entity, we cannot tell which attributes were fetched and which weren’t. There is an API for this; you can check attributes using the PersistenceUnitUtil interface:

PersistenceUnitUtil pu = entityManagerFactory.getPersistenceUnitUtil();
System.out.println("User.addresses loaded: " + pu.isLoaded(user, "addresses"));

But it is pretty boring. Can we simplify it and just not expose unfetched attributes?

Spring Projections

Spring Data provides a fantastic facility called Projections (not to be confused with Hibernate’s Projections). If we want to fetch only some properties of an entity, we can specify an interface and Spring will select interface “instances” from a database. Let’s have a look at the example. If we define the following interface:

interface NamesOnly {
   String getName();
}

And then define a Spring Data JPA repository to fetch our User entities:

interface UserRepository extends CrudRepository<User, Integer> {
   Collection<NamesOnly> findByName(String name);
}

In this case, after the invocation of the findByName method, we just won’t be able to access unfetched attributes! The same principle applies to detail entity classes too. So you can fetch both master and detail records this way. Moreover, in most cases Spring generates “proper” SQL and fetches only attributes specified in the projection, i.e. projections work like entity graph descriptions.

It is a very powerful concept: you can use SpEL expressions, use classes instead of interfaces, etc. There is more information in the documentation; check it out if you’re interested.

The only problem with Projections is that under the hood they are implemented as maps, and hence are read-only. Therefore, even though you can define a setter method for a projection, you will not be able to save changes using either CRUD repositories or EntityManager. You can treat projections as DTOs, and you have to write your own DTO-to-entity conversion code.
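The idea of an interface "instance" backed by a map can be sketched with the JDK’s dynamic proxies. This is only an illustration of the concept under that assumption, not Spring Data’s actual implementation; `MapProjection` and `EditableName` are hypothetical names. Getters read from the map, and anything else, including a setter declared on the interface, is rejected.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.Map;

interface NamesOnly {
    String getName();
}

// A projection may declare a setter, but this sketch cannot honor it.
interface EditableName extends NamesOnly {
    void setName(String name);
}

class MapProjection {
    // Creates an instance of the given interface whose getters read from the map.
    @SuppressWarnings("unchecked")
    static <T> T of(Class<T> type, Map<String, Object> row) {
        InvocationHandler handler = (proxy, method, args) -> {
            String m = method.getName();
            if (m.startsWith("get") && m.length() > 3 && method.getParameterCount() == 0) {
                // getName -> "name"
                String property = Character.toLowerCase(m.charAt(3)) + m.substring(4);
                return row.get(property);
            }
            throw new UnsupportedOperationException(m + ": this projection is read-only");
        };
        return (T) Proxy.newProxyInstance(
                type.getClassLoader(), new Class<?>[]{type}, handler);
    }
}

class ProjectionDemo {
    static String nameOf(Map<String, Object> row) {
        NamesOnly view = MapProjection.of(NamesOnly.class, row);
        return view.getName();
    }

    // Returns true if calling the setter on the projection was rejected.
    static boolean setterRejected(Map<String, Object> row) {
        EditableName view = MapProjection.of(EditableName.class, row);
        try {
            view.setName("Bob");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Map<String, Object> row = Map.of("id", 1, "name", "Alice");
        System.out.println(nameOf(row));         // Alice
        System.out.println(setterRejected(row)); // true
    }
}
```

Note that the `id` column is present in the row but unreachable through `NamesOnly`, which is the property that makes projections attractive: unfetched or unwanted attributes simply have no accessor.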

CUBA Implementation

From the beginning of CUBA framework development, we tried to optimize the code that works with a database. In the framework, we use EclipseLink to implement the data access layer API. The good thing about EclipseLink is that it supported partial entity loading from the beginning, which is why we chose it over Hibernate in the first place. In this ORM, you could specify exactly which attributes should be loaded before JPA 2.1 became a standard. Therefore, we added our own “Entity Graph”-like concept to the framework: CUBA Views. Views are pretty powerful: you can extend them, combine them, etc. The second reason behind creating CUBA Views is that we wanted to use short transactions and work mostly with detached objects; otherwise, we could not make a rich web UI fast and responsive.

In CUBA, view descriptions are stored in an XML file and look like this:

<view class="com.sample.User"
     extends="_local"
     name="user-minimal-view">
   <property name="name"/>
   <property name="addresses"
             view="address-street-only-view"/>
</view>

This view instructs CUBA DataManager to fetch the User entity with its local name attribute and to fetch addresses, applying address-street-only-view to them, at the query level (important!). Once a view is defined, you can apply it to get entities using the DataManager class:

List<User> users = dataManager.load(User.class).view("user-minimal-view").list();

It works like a charm and saves a lot of network traffic by not loading unused attributes, but, like in JPA Entity Graph, there is a small issue: we cannot tell which attributes of the User entity were loaded. And in CUBA we have the annoying “IllegalStateException: Cannot get unfetched attribute [...] from a detached object” error. Like in JPA, you can check whether an attribute is unfetched, but writing these checks for every entity being fetched is a boring job and developers are not happy with it.

CUBA View Interfaces PoC

And what if we could get the best of both worlds? We decided to implement so-called entity interfaces that use Spring’s approach, but those interfaces are translated into CUBA views during application startup and can then be used in DataManager. The idea is pretty simple: you define an interface (or a set of interfaces) that specifies the entity graph. It looks like Spring Projections and works like Entity Graph:

interface UserMinimalView extends BaseEntityView<User, Integer> {
   String getName();
   void setName(String val);
   List<AddressStreetOnly> getAddresses();

   interface AddressStreetOnly extends BaseEntityView<Address, Integer> {
      String getStreet();
      void setStreet(String street);
   }
}

Note that the AddressStreetOnly interface can be nested if it is used only in one case.

During CUBA Application startup (in fact, it is mostly Spring Context Initialization), we create a programmatic representation for CUBA views and store them in an internal repository bean in Spring context.
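How an interface might be turned into a view description can be sketched with plain reflection. This is not the PoC’s code: `EntityViewMarker` is a hypothetical stand-in for BaseEntityView (whose type parameters are omitted here), and `ViewScanner` is an illustrative name. The sketch derives property names from getters and recurses into nested view interfaces, including collection element types; it does not guard against cyclic view references.

```java
import java.lang.reflect.Method;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical marker for view interfaces (simplified stand-in for BaseEntityView).
interface EntityViewMarker {}

interface AddressStreetOnly extends EntityViewMarker {
    String getStreet();
}

interface UserMinimalView extends EntityViewMarker {
    String getName();
    void setName(String name);
    List<AddressStreetOnly> getAddresses();
}

class ViewScanner {
    // Maps each property name to the nested view's properties
    // (an empty map for plain attributes).
    static Map<String, Object> describe(Class<?> viewInterface) {
        Map<String, Object> properties = new TreeMap<>();
        for (Method m : viewInterface.getMethods()) {
            String n = m.getName();
            if (!n.startsWith("get") || n.length() <= 3 || m.getParameterCount() != 0) {
                continue; // only getters contribute properties
            }
            String property = Character.toLowerCase(n.charAt(3)) + n.substring(4);
            Class<?> nested = nestedViewType(m);
            Map<String, Object> nestedProps = nested == null ? Map.of() : describe(nested);
            properties.put(property, nestedProps);
        }
        return properties;
    }

    // Returns the view interface behind a getter: either the return type itself
    // or the element type of a returned collection; null for plain attributes.
    private static Class<?> nestedViewType(Method getter) {
        Class<?> raw = getter.getReturnType();
        if (EntityViewMarker.class.isAssignableFrom(raw)) {
            return raw;
        }
        Type generic = getter.getGenericReturnType();
        if (Collection.class.isAssignableFrom(raw) && generic instanceof ParameterizedType) {
            Type arg = ((ParameterizedType) generic).getActualTypeArguments()[0];
            if (arg instanceof Class && EntityViewMarker.class.isAssignableFrom((Class<?>) arg)) {
                return (Class<?>) arg;
            }
        }
        return null;
    }
}

class ViewScannerDemo {
    public static void main(String[] args) {
        // prints {addresses={street={}}, name={}}
        System.out.println(ViewScanner.describe(UserMinimalView.class));
    }
}
```

The resulting nested map carries the same information as the XML view shown earlier: a local `name` attribute plus `addresses` restricted to `street`.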

After that, we need to tweak the DataManager so that it can accept class names in addition to CUBA View string names, and then we simply pass the interface class:

List<User> users = dataManager.loadWithView(UserMinimalView.class).list();

We generate proxies implementing an entity view for each instance fetched from the database as Hibernate does. And when you try to get an attribute’s value, the proxy forwards the invocation to the real entity.
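The forwarding behavior described here can be approximated with a JDK dynamic proxy. The following is a minimal sketch under simplifying assumptions, not the PoC’s implementation: `EntityViewProxy` and `UserNameView` are hypothetical names, the `User` class is a trimmed-down stand-in for the entity, and each view method is simply looked up by name and parameter types on the wrapped object.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

// Simplified entity; stands in for a JPA-managed class.
class User {
    private String name;
    private String email;
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public String getEmail() { return email; }
    public void setEmail(String email) { this.email = email; }
}

// Exposes only the name; email is not reachable through this view.
interface UserNameView {
    String getName();
    void setName(String name);
}

class EntityViewProxy {
    // Wraps an entity in a proxy that implements the given view interface.
    @SuppressWarnings("unchecked")
    static <V> V wrap(Object entity, Class<V> viewType) {
        InvocationHandler handler = (proxy, method, args) -> {
            // Find the method with the same signature on the entity and delegate to it.
            Method target = entity.getClass()
                    .getMethod(method.getName(), method.getParameterTypes());
            return target.invoke(entity, args);
        };
        return (V) Proxy.newProxyInstance(
                viewType.getClassLoader(), new Class<?>[]{viewType}, handler);
    }
}

class EntityViewDemo {
    // Changes the name through the view and returns what the entity now holds.
    static String renameThroughView(User user, String newName) {
        UserNameView view = EntityViewProxy.wrap(user, UserNameView.class);
        view.setName(newName);  // forwarded to user.setName(...)
        return user.getName();  // the underlying entity was changed
    }

    public static void main(String[] args) {
        User user = new User();
        user.setName("alice");
        System.out.println(renameThroughView(user, "bob")); // bob
    }
}
```

Because the proxy writes through to the real entity, the view is not a read-only copy: whatever is changed via the interface is visible on the wrapped object afterwards.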

With this implementation we’re trying to kill two birds with one stone:

The data that is not stated in the interface is not loaded to the Java application code, thus saving server resources.
A developer uses only properties that were fetched, therefore, no more “UnfetchedAttribute” errors (a.k.a. LazyInitializationException in Hibernate).

In contrast to Spring Projections, Entity Views wrap entities and implement CUBA’s Entity interface, therefore they can be treated as entities: you can update the property and save changes to the database.

The “third bird” here — you can define a “read-only” interface that contains only getters, completely preventing entities from modifications at the API level.

Also, we can implement some operations on the detached entity, such as converting the user’s name to lowercase:

@MetaProperty
default String getNameLowercase() {
   return getName().toLowerCase();
}

In this case, all calculated attributes can be moved from the entity model, so you don’t mix data fetch logic with use case-specific business logic.

Another interesting opportunity is that you can inherit interfaces. This gives you the possibility to prepare several views with different sets of attributes and then mix them if needed. For example, you can have one interface that contains the user’s name and email and another one that contains the name and addresses. And if you need a third view interface that contains the name, email, and addresses, you can get it just by combining both, since Java allows an interface to extend multiple interfaces. Please note that you can pass this third interface to methods that consume either the first or the second interface; OOP principles work here as usual.
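The combination described above is ordinary Java interface inheritance. The sketch below uses illustrative interface names and, as a simplifying assumption, models addresses as plain strings rather than nested view interfaces.

```java
import java.util.List;

// View with name and email.
interface UserNameEmailView {
    String getName();
    String getEmail();
}

// View with name and addresses (addresses simplified to strings here).
interface UserNameAddressesView {
    String getName();
    List<String> getAddresses();
}

// The combined view declares nothing new: it inherits all three getters.
interface UserFullView extends UserNameEmailView, UserNameAddressesView {
}

class ViewInheritanceDemo {
    // Consumes the first view only.
    static String contact(UserNameEmailView view) {
        return view.getName() + " <" + view.getEmail() + ">";
    }

    // Consumes the second view only.
    static int addressCount(UserNameAddressesView view) {
        return view.getAddresses().size();
    }

    // Hypothetical sample instance; in the PoC this would be a generated proxy.
    static UserFullView sample() {
        return new UserFullView() {
            @Override public String getName() { return "Alice"; }
            @Override public String getEmail() { return "alice@example.com"; }
            @Override public List<String> getAddresses() {
                return List.of("Street A", "Street B");
            }
        };
    }

    public static void main(String[] args) {
        UserFullView full = sample();
        System.out.println(contact(full));      // Alice <alice@example.com>
        System.out.println(addressCount(full)); // 2
    }
}
```

The same `UserFullView` instance is accepted by both methods, so code written against the narrower views keeps working when a wider view is loaded.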

We’ve also implemented entity conversion between views — each entity view has a reload() method that accepts another view class as a parameter:

UserFullView userFull = userMinimal.reload(UserFullView.class);

UserFullView may contain additional attributes, so the entity will be reloaded from the database. Entity reload is a lazy process: it is performed only when you try to get an entity attribute value. We did this on purpose because in CUBA we have a “web” module that renders rich UI and may contain custom REST controllers. In this module, we use the same entities, and it can be deployed on a separate server. Therefore, each entity reload causes an additional request to the database via the core module (a.k.a. middleware). So, by introducing lazy entity reload we save some network traffic and database queries.

The PoC can be downloaded from GitHub — feel free to play with it.

Conclusion

ORMs are going to be massively used in enterprise applications in the near future. We just have to provide something that will convert database rows into Java objects. Of course, in complex, high-load applications we’ll continue seeing unique solutions, but ORMs will live as long as RDBMSes do.

In the CUBA framework, we’re trying to simplify ORM use to make it as painless for developers as possible. And in the next versions, we’re going to introduce more changes. I’m not sure whether those will be view interfaces or something else, but I’m pretty sure of one thing: working with ORM in the next version of CUBA will be simplified.

Published at DZone with permission of Andrey Belyaev. See the original article here.
