How to Store Text in PostgreSQL: Tips, Tricks, and Traps

In this article, we will review various options to store long text in the PostgreSQL database: @Lob attributes, TEXT, and long VARCHAR table columns. Also, we'll have a look at the difference between Hibernate 5 and 6 in storing long text data.

Andrey Belyaev

CORE ·

Jul. 07, 22 · Tutorial

Likes (9)

Comment

Save

21.3K Views

DDL generation based on JPA entities definition is a daily task for many developers. In most cases, we use tools like Hibernate's built-in generator or JPA Buddy plugin. They make the job easier, but there are exceptions. When it comes to storing big chunks of data in the database, things get a bit complicated.

Use Case: Storing Documents

Let’s assume we need to store a document object with its content in the PostgreSQL database. The JPA entity code for this might look like the code below:

     Java 
   
 
 
   @Entity  
@Table(name = "document")  
public class Document {  
   @Id  
   @GeneratedValue(strategy = GenerationType.IDENTITY)  
   @Column(name = "id", nullable = false)  
   private Long id;  
 
   @Column(name = "date_created", nullable = false)  
   private LocalDateTime dateCreated;  
 
   @Column(name = "doc_txt")  
   private String docText;  
   
   //Getters and setters omitted for brevity  
}   
  

The question is: what if we need to store really long document text? In Java, the string datatype can hold about 2Gb of text data, but the table column size will be limited to 255 characters by default for the model above. So, what should we change?

Option 1: Use LOB Storage

In relational databases, a particular data type exists to store big amounts of data: LOB (Large OBject). Once we need to store large text in the database, we can start with defining a LOB column. All we need to do is mark the docText attribute with the @Lob annotation.

     Java 
   
   @Lob   
@Column(name = "doc_txt")   
private String docText;

Let’s use Hibernate to generate a DDL for the table to map the `Document` entity. The SQL will be:

     SQL 
   
   create table document ( 
    id int8 generated by default as identity, 
    date_created timestamp not null, 
    doc_txt oid, 
    primary key (id) 
);

As we can see, the doc_text column datatype is oid. What is it? According to the documentation:

PostgreSQL provides two distinct ways to store binary data. Binary data can be stored in a table using the data type BYTEA or by using the Large Object feature, which stores the binary data in a separate table in a special format and refers to that table by storing a value of type OID in your table.

In our case, the second option is in effect. This separate table’s name is pg_largeobject, and it stores data split into “pages”, usually 2 kB each, as stated in the docs.

So, Hibernate stores large text in a separate table as binary data. Does it mean we should do an additional join when selecting data or an additional insert when saving it? Let’s enable SQL logging, create the Document entity and save it to the database using Spring Data JPA.

     Java 
   
   Document doc = new Document();   
doc.setDateCreated(LocalDateTime.of(2020, 1, 1, 10, 10));   
doc.setDocText("This is the doc text");   
Document saved = documentRepository.save(doc);

Hibernate will show an ordinary SQL insert in the console:

     SQL 
   
   insert  
into 
    document 
    (date_created, doc_txt)  
values 
    (?, ?)

Now we can check if the data was correctly stored by executing the following SQL in a console:

     SQL 
   
   select * from document

The result that we will see should be similar to this:

id	date_created	doc_txt
1	2020-01-01 10:10:00	76388

We don’t see the document text in this table, just a reference to the object in the large object storage. Let’s check the pg_largeobject table:

     SQL 
   
   select * from pg_largeobject where loid=76338

Now we can see the document text.

loid	pageno	data
76388	0	This is the doc text

So, Hibernate saves data into two tables automatically under the hood. Now we can try to fetch the document data using Spring Data JPA:

     Java 
   
   documentRepository.findById(1L).ifPresent(d -> System.out.println(d.getDocText()));

We can see the following SQL in the console:

     SQL 
   
 
 
   select   
   document0_.id as id1_0_0_,   
   document0_.date_created as date_cre2_0_0_,   
   document0_.doc_txt as doc_txt3_0_0_    
from   
   document document0_    
where   
   document0_.id=?   
 
  

And the output should be as expected:

     Plain Text 
   
   This is the doc text

Hibernate selects the data from the pg_largeobject table transparently. Let’s try to use JPQL to execute the same query. To do this, we create an additional Spring Data JPA repository method and invoke it:

     Java 
   
 
 
   //repository   
@Query("select d from Document d where d.id = ?1")   
Optional<Document> findByIdIs(Long id); 
//... 
//invocation 
documentRepository.findByIdIs(1L).ifPresent(d -> System.out.println(d.getDocText())); 
  

The method will fail:

     Plain Text 
   
 
 
   org.springframework.orm.jpa.JpaSystemException: Unable to access lob stream   
…   
Caused by: org.hibernate.HibernateException: Unable to access lob stream   
…   
Caused by: org.postgresql.util.PSQLException: Large Objects may not be used in auto-commit mode.    
  

Hibernate performs additional database reading to fetch LOB data. In auto-commit mode, this reading is executed in a separate transaction. PostgreSQL driver explicitly prohibits it, as shown in the error message above. To fix this, we need to execute such queries in one transaction or disable auto-commit mode.

Spring Data JPA methods from `CrudRepository` like findById() and findAll() are executed in one transaction by default. That is why everything worked fine in the first example. When we use Spring Data JPA query methods or JPQL queries, we must use @Transactional explicitly as in the example below.

     Java 
   
   @Transactional 
@Query("select d from Document d where d.id = ?1")   
Optional<Document> findByIdIs(Long id); 

@Transactional 
List<Document> findByDateCreatedIsBefore(LocalDateTime dateCreated);

If disabling auto-commit mode seems preferable to using the @Transactional annotation, we should look into the documentation for the app libraries. For example, to do it for the default connection pool implementation (HikariCP) in Spring Boot, we need to set the spring.datasource.hikari.auto-commit property to false.

Storing text in a separate table might cause other issues. Let’s add a repository method to select documents using the LIKE clause for the docText field:

     Java 
   
   @Transactional   
List<Document> findByDocTextLike(String text);

This method will generate the following query:

     SQL 
   
 
 
   select   
   document0_.id as id1_0_,   
   document0_.date_created as date_cre2_0_,   
   document0_.doc_txt as doc_txt3_0_    
from   
   document document0_    
where   
   document0_.doc_txt like ? escape ?  
  

...and this query will fail with the following error:

     Plain Text 
   
   org.springframework.dao.InvalidDataAccessResourceUsageException: could not extract ResultSet; SQL [n/a];    
…   
Caused by: org.postgresql.util.PSQLException: ERROR: function pg_catalog.like_escape(bigint, character varying) does not exist   
Hint: No function matches the given name and argument types. You might need to add explicit type casts.

Hibernate cannot generate proper SQL to handle the LIKE clause for LOB text columns. For this case, we can use the native query. In this query, we have to fetch text data from the LOB storage and convert it to string format. After that, we can use it in the LIKE clause (and don’t forget about `@Transactional):

     Java 
   
   @Query(value = "select * from document d " +   
       "where convert_from(lo_get(doc_txt), 'UTF-8') like '%'||?1||'%'", nativeQuery = true)   
@Transactional   
List<Document> findByDocTextLike(String text);

Now everything works fine. Please remember that native queries may not be compatible with other RDBMSes and are not validated in runtime. Use them only when absolutely necessary.

Conclusion: Storing Text as LOB

So, what are the pros and cons of storing large texts as LOB objects in PostgreSQL?
Pro:

PostgreSQL uses optimized storage for LOB objects
We can store up to 4Gb of text there

Con:

Some functions (LIKE, SUBSTRING, etc.) in the WHERE clause do not work in Hibernate for LOB text columns. We need to use native queries for this.
To fetch text with JPQL or Spring Data JPA repository query methods, we must use the @Transactional annotation for repository methods or disable the auto-commit mode

The question here is: why don’t we store text data right in the table? Let’s discuss this option too.

Option 2: Column Re-Definition

PostgreSQL allows us to store long text data in a column of a particular datatype - TEXT. We can specify the column definition right in the annotation.

     Java 
   
   @Column(name = "doc_txt", columnDefinition = "text")   
private String docText;

This allows us to work with long text in a "usual" way. No transactions, native queries, and JPQL works as expected. Compared to LOB type there is a limitation

..., the longest possible character string that can be stored is about 1 GB

It is smaller than 4Gb, allowed by LOB storage, but still long enough for most use cases.

The only problem here is the hardcoded column definition. To overcome it, we can use annotation @Type and converter org.hibernate.type.TextType in Hibernate 5. It has an advantage over the previous column definition: it is not vendor-specific.

     Java 
   
   @Type(type = "org.hibernate.type.TextType")   
@Column(name = "doc_txt")   
private String docText;

In Hibernate 6, the org.hibernate.type.TextType class was removed. To define a column to store a long text, we can define the attribute in the following way:

     Java 
   
   @Column(name = "doc_txt", length = Length.LOB_DEFAULT)   
private String docText;

This will give us the following column definition in the database: doc_txt varchar(1048576). It is not the TEXT datatype, but it can still store about 1Gb of text in the table. It is the largest possible character string in PostgreSQL.

We can generate a column with TEXT datatype in Hibernate 6 by defining the docText attribute like this:

     Java 
   
   @JdbcTypeCode(SqlTypes.LONG32VARCHAR)   
@Column(name = "doc_txt")   
private String docText;

Unfortunately, as of today (June 2022), Hibernate 6 cannot fetch data from the table. It generates the correct table and column definition of the TEXT type though. The data extraction from the doc_txt column into the entity attribute fails. The error text looks like this:

     Plain Text 
   
   Unknown wrap conversion requested: [B to java.lang.String : `org.hibernate.type.descriptor.java.StringJavaType` (java.lang.String)

So, storing long text in the TEXT/VARCHAR column brings fewer issues. No problems with transactions, LIKE conditions, etc. The only downside is storage size (up to 1Gb). Are there any more caveats with it?

There might be a problem if we use the TEXT column type in the database and the @Lob annotation. Let’s see how it works. First, let’s create a table document and insert some data into it:

     SQL 
   
 
 
   create table document ( 
    id int8 generated by default as identity, 
    date_created timestamp not null, 
    doc_txt text, 
    primary key (id) 
);   
  
insert into document (id, date_created, doc_txt) values (1, '2021-10-10', 'This is the document text number 1'); 
  

We will use the document entity definition with the @Lob column:

     Java 
   
 
 
   @Entity   
@Table(name = "document")   
public class Document {   
   @Id   
   @GeneratedValue(strategy = GenerationType.IDENTITY)   
   @Column(name = "id", nullable = false)   
   private Long id;   
  
   @Column(name = "date_created", nullable = false)   
   private LocalDateTime dateCreated;   
  
   @Lob   
   @Column(name = "doc_txt")   
   private String docText;   
  
   //Getters and setters omitted for brevity   
} 
  

The code for document fetching will be the same:

     Java 
   
   documentRepository.findById(1L).ifPresent(d -> System.out.println(d.getDocText()));

If we try to execute the repository method, we will see the following:

     Plain Text 
   
   java.lang.IllegalStateException: Failed to execute Application   
…  
Caused by: org.hibernate.exception.DataException: could not execute query   
…   
Caused by: org.postgresql.util.PSQLException: Bad value for type long: This is the document text number 1   
…

As we can see, Hibernate handles @Lob attribute values as a reference to LOB object data. The TEXT column type in the database table does not affect this behavior.

And what about saving data? Let’s clean the table, try to save the document entity with the @Lob field, and fetch it using Spring Data JPA. Here is the code to do this:

     Java 
   
 
 
   //Saving 
Document doc = new Document();   
doc.setDateCreated(LocalDateTime.now());   
doc.setDocText("This is another text document");   
documentRepository.save(doc);  
... 
//Fetching 
documentRepository.findAll().forEach(d -> System.out.println(d.getDocText));   
... 
//Result 
This is another text document 
 
  

So, it looks as if our entity with the @Lob attribute can work with TEXT columns. In the database tables we will see the familiar picture:

id	date_created	doc_txt
1	2022-06-16 15:28:26.751041	76388

loid	pageno	data
76388	0	This is another text document

If we insert document data into the table using SQL and then select the data, we will get the following:

     SQL 
   
   insert into document (date_created, doc_txt) values ('2021-10-10', 'This is the document text'); 

select * from document;

id	date_created	doc_txt
1	2022-06-16 15:28:26.751041	76388
2	2021-10-10 00:00:00	This is the document text

Now we won’t be able to select data from the database using Spring Data JPA. The application will crash with the type conversion error while selecting the second row.

Let’s add the @Type annotation to the attribute...

     Java 
   
   @Lob   
@Type(type = "org.hibernate.type.TextType")   
@Column(name = "doc_txt")   
private String docText;

...and try to print documents' text data to the app console.

     Java 
   
   documentRepository.findAll().forEach(d -> System.out.println(d.getDocText));

We’ll see the following:

   
   Hibernate: select document0_.id as id1_0_, document0_.date_created as date_cre2_0_, document0_.text as text3_0_ from document document0_   
  
76388   
This is the document text

With the @Type annotation, we can select data, but the OID reference is translated into text, so we “lose” the text stored in the LOB storage.

Conclusion: Storing Long Text in Table

So what are the pros and cons of storing long text as a text column in the database:
Pro:

Queries work as expected; no need for separate transactions or native queries

Con:

Storage size is limited to 1Gb
Mixing the @Lob attribute definition and TEXT column datatype may cause unexpected results.

Final Words: How to Store Long Text in PostgreSQL

For most cases, storing long text data in the same table along with other entity data should work fine. It will allow you to manipulate data using both Hibernate and direct SQL.
- In Hibernate 5, use @Type(type = "org.hibernate.type.TextType") annotation for the JPA entity attribute.
- If you use Hibernate 6, prefer @Column(name = ..., length = Length.LOB_DEFAULT) annotation for the column definition.
- Note that we cannot store more than 1Gb of text when using this approach.
If you plan to store significant amounts of character data (more than 1Gb), use the @Lob annotation for the JPA entity attribute. Hibernate will use PostgreSQL’s dedicated storage optimized for large amounts of data. There are several things that we should consider when we use LOBs.
- We must execute JPQL queries and Spring Data JPA query methods in one transaction or disable auto-commit mode explicitly.
- To use the LOB column in the WHERE condition, we might need to use native queries.
There is great advice in Hibernate documentation: Please don’t (ab)use JPA’s @Lob annotation just because you want a TEXT column. The purpose of the @Lob annotation is not to control DDL generation! Hence do not use the @Lob entity attribute definition with the TEXT column datatype.

Hope those simple rules will help you avoid issues while storing text data in PostgreSQL with Hibernate.

Database Document Relational database Spring Data Data (computing) Hibernate sql Strings PostgreSQL Data Types

Opinions expressed by DZone contributors are their own.

Related

Trending