DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Please enter at least three characters to search
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Zones

Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks

Because the DevOps movement has redefined engineering responsibilities, SREs now have to become stewards of observability strategy.

Apache Cassandra combines the benefits of major NoSQL databases to support data management needs not covered by traditional RDBMS vendors.

The software you build is only as secure as the code that powers it. Learn how malicious code creeps into your software supply chain.

Generative AI has transformed nearly every industry. How can you leverage GenAI to improve your productivity and efficiency?

Related

  • The Generic Way To Convert Between Java and PostgreSQL Enums
  • JSON-Based Serialized LOB Pattern
  • Architecture and Code Design, Pt. 1: Relational Persistence Insights to Use Today and On the Upcoming Years
  • SQL Commands: A Brief Guide

Trending

  • Useful System Table Queries in Relational Databases
  • Simpler Data Transfer Objects With Java Records
  • Proactive Security in Distributed Systems: A Developer’s Approach
  • Is Big Data Dying?
  1. DZone
  2. Data Engineering
  3. Databases
  4. How to Store Text in PostgreSQL: Tips, Tricks, and Traps

How to Store Text in PostgreSQL: Tips, Tricks, and Traps

In this article, we will review various options to store long text in the PostgreSQL database: @Lob attributes, TEXT, and long VARCHAR table columns. Also, we'll have a look at the difference between Hibernate 5 and 6 in storing long text data.

By 
Andrey Belyaev user avatar
Andrey Belyaev
DZone Core CORE ·
Jul. 07, 22 · Tutorial
Likes (9)
Comment
Save
Tweet
Share
21.3K Views

Join the DZone community and get the full member experience.

Join For Free

DDL generation based on JPA entities definition is a daily task for many developers. In most cases, we use tools like Hibernate's built-in generator or JPA Buddy plugin. They make the job easier, but there are exceptions. When it comes to storing big chunks of data in the database, things get a bit complicated.

Use Case: Storing Documents 

Let’s assume we need to store a document object with its content in the PostgreSQL database. The JPA entity code for this might look like the code below:

Java
 
@Entity  
@Table(name = "document")  
public class Document {  
   @Id  
   @GeneratedValue(strategy = GenerationType.IDENTITY)  
   @Column(name = "id", nullable = false)  
   private Long id;  
 
   @Column(name = "date_created", nullable = false)  
   private LocalDateTime dateCreated;  
 
   @Column(name = "doc_txt")  
   private String docText;  
   
   //Getters and setters omitted for brevity  
}  


The question is: what if we need to store really long document text? In Java, the string datatype can hold about 2Gb of text data, but the table column size will be limited to 255 characters by default for the model above. So, what should we change? 

Option 1: Use LOB Storage

In relational databases, a particular data type exists to store big amounts of data: LOB (Large OBject). Once we need to store large text in the database, we can start with defining a LOB column. All we need to do is mark the docText attribute with the @Lob annotation.

Java
 
@Lob   
@Column(name = "doc_txt")   
private String docText;   


Let’s use Hibernate to generate a DDL for the table to map the `Document` entity. The SQL will be:

SQL
 
create table document ( 
    id int8 generated by default as identity, 
    date_created timestamp not null, 
    doc_txt oid, 
    primary key (id) 
);   


As we can see, the doc_text column datatype is oid. What is it? According to the documentation: 

PostgreSQL provides two distinct ways to store binary data. Binary data can be stored in a table using the data type BYTEA or by using the Large Object feature, which stores the binary data in a separate table in a special format and refers to that table by storing a value of type OID in your table. 

In our case, the second option is in effect. This separate table’s name is pg_largeobject, and it stores data split into “pages”, usually 2 kB each, as stated in the docs.

So, Hibernate stores large text in a separate table as binary data. Does it mean we should do an additional join when selecting data or an additional insert when saving it? Let’s enable SQL logging, create the Document entity and save it to the database using Spring Data JPA.

Java
 
Document doc = new Document();   
doc.setDateCreated(LocalDateTime.of(2020, 1, 1, 10, 10));   
doc.setDocText("This is the doc text");   
Document saved = documentRepository.save(doc); 


Hibernate will show an ordinary SQL insert in the console:

SQL
 
insert  
into 
    document 
    (date_created, doc_txt)  
values 
    (?, ?) 


Now we can check if the data was correctly stored by executing the following SQL in a console:

SQL
 
select * from document


The result that we will see should be similar to this:

id date_created doc_txt
1 2020-01-01 10:10:00 76388


We don’t see the document text in this table, just a reference to the object in the large object storage. Let’s check the pg_largeobject table:

SQL
 
select * from pg_largeobject where loid=76338


Now we can see the document text.

loid pageno data
76388 0 This is the doc text


So, Hibernate saves data into two tables automatically under the hood. Now we can try to fetch the document data using Spring Data JPA:

Java
 
documentRepository.findById(1L).ifPresent(d -> System.out.println(d.getDocText()));   


We can see the following SQL in the console:

SQL
 
select   
   document0_.id as id1_0_0_,   
   document0_.date_created as date_cre2_0_0_,   
   document0_.doc_txt as doc_txt3_0_0_    
from   
   document document0_    
where   
   document0_.id=?   


And the output should be as expected:

Plain Text
 
This is the doc text


Hibernate selects the data from the pg_largeobject table transparently. Let’s try to use JPQL to execute the same query. To do this, we create an additional Spring Data JPA repository method and invoke it:

Java
 
//repository   
@Query("select d from Document d where d.id = ?1")   
Optional<Document> findByIdIs(Long id); 
//... 
//invocation 
documentRepository.findByIdIs(1L).ifPresent(d -> System.out.println(d.getDocText()));


The method will fail:

Plain Text
 
org.springframework.orm.jpa.JpaSystemException: Unable to access lob stream   
…   
Caused by: org.hibernate.HibernateException: Unable to access lob stream   
…   
Caused by: org.postgresql.util.PSQLException: Large Objects may not be used in auto-commit mode.   


Hibernate performs additional database reading to fetch LOB data. In auto-commit mode, this reading is executed in a separate transaction. PostgreSQL driver explicitly prohibits it, as shown in the error message above. To fix this, we need to execute such queries in one transaction or disable auto-commit mode.   

Spring Data JPA methods from `CrudRepository` like findById() and findAll() are executed in one transaction by default. That is why everything worked fine in the first example. When we use Spring Data JPA query methods or JPQL queries, we must use @Transactional explicitly as in the example below.

Java
 
@Transactional 
@Query("select d from Document d where d.id = ?1")   
Optional<Document> findByIdIs(Long id); 

@Transactional 
List<Document> findByDateCreatedIsBefore(LocalDateTime dateCreated);


If disabling auto-commit mode seems preferable to using the @Transactional annotation, we should look into the documentation for the app libraries. For example, to do it for the default connection pool implementation (HikariCP) in Spring Boot, we need to set the spring.datasource.hikari.auto-commit property to false.

Storing text in a separate table might cause other issues. Let’s add a repository method to select documents using the LIKE clause for the docText field:

Java
 
@Transactional   
List<Document> findByDocTextLike(String text);


This method will generate the following query:

SQL
 
select   
   document0_.id as id1_0_,   
   document0_.date_created as date_cre2_0_,   
   document0_.doc_txt as doc_txt3_0_    
from   
   document document0_    
where   
   document0_.doc_txt like ? escape ? 


...and this query will fail with the following error:

Plain Text
 
org.springframework.dao.InvalidDataAccessResourceUsageException: could not extract ResultSet; SQL [n/a];    
…   
Caused by: org.postgresql.util.PSQLException: ERROR: function pg_catalog.like_escape(bigint, character varying) does not exist   
Hint: No function matches the given name and argument types. You might need to add explicit type casts.   


Hibernate cannot generate proper SQL to handle the LIKE clause for LOB text columns. For this case, we can use the native query. In this query, we have to fetch text data from the LOB storage and convert it to string format. After that, we can use it in the LIKE clause (and don’t forget about `@Transactional):

Java
 
@Query(value = "select * from document d " +   
       "where convert_from(lo_get(doc_txt), 'UTF-8') like '%'||?1||'%'", nativeQuery = true)   
@Transactional   
List<Document> findByDocTextLike(String text);


Now everything works fine. Please remember that native queries may not be compatible with other RDBMSes and are not validated in runtime. Use them only when absolutely necessary. 

Conclusion: Storing Text as LOB

So, what are the pros and cons of storing large texts as LOB objects in PostgreSQL?
Pro:

  • PostgreSQL uses optimized storage for LOB objects
  • We can store up to 4Gb of text there

Con:

  • Some functions (LIKE, SUBSTRING, etc.) in the WHERE clause do not work in Hibernate for LOB text columns. We need to use native queries for this.
  • To fetch text with JPQL or Spring Data JPA repository query methods, we must use the @Transactional annotation for repository methods or disable the auto-commit mode

The question here is: why don’t we store text data right in the table? Let’s discuss this option too.

Option 2: Column Re-Definition  

PostgreSQL allows us to store long text data in a column of a particular datatype - TEXT. We can specify the column definition right in the annotation.

Java
 
@Column(name = "doc_txt", columnDefinition = "text")   
private String docText;


This allows us to work with long text in a "usual" way. No transactions, native queries, and JPQL works as expected. Compared to LOB type there is a limitation

..., the longest possible character string that can be stored is about 1 GB 

It is smaller than 4Gb, allowed by LOB storage, but still long enough for most use cases.

The only problem here is the hardcoded column definition. To overcome it, we can use annotation @Type and converter org.hibernate.type.TextType in Hibernate 5. It has an advantage over the previous column definition: it is not vendor-specific.

Java
 
@Type(type = "org.hibernate.type.TextType")   
@Column(name = "doc_txt")   
private String docText;


In Hibernate 6, the org.hibernate.type.TextType class was removed. To define a column to store a long text, we can define the attribute in the following way:

Java
 
@Column(name = "doc_txt", length = Length.LOB_DEFAULT)   
private String docText; 


This will give us the following column definition in the database: doc_txt varchar(1048576). It is not the TEXT datatype, but it can still store about 1Gb of text in the table. It is the largest possible character string in PostgreSQL.

We can generate a column with TEXT datatype in Hibernate 6 by defining the docText attribute like this:

Java
 
@JdbcTypeCode(SqlTypes.LONG32VARCHAR)   
@Column(name = "doc_txt")   
private String docText;


Unfortunately, as of today (June 2022), Hibernate 6 cannot fetch data from the table. It generates the correct table and column definition of the TEXT type though. The data extraction from the doc_txt column into the entity attribute fails. The error text looks like this:

Plain Text
 
Unknown wrap conversion requested: [B to java.lang.String : `org.hibernate.type.descriptor.java.StringJavaType` (java.lang.String) 


So, storing long text in the TEXT/VARCHAR column brings fewer issues. No problems with transactions, LIKE conditions, etc. The only downside is storage size (up to 1Gb). Are there any more caveats with it?

There might be a problem if we use the TEXT column type in the database and the @Lob annotation. Let’s see how it works. First, let’s create a table document and insert some data into it:

SQL
 
create table document ( 
    id int8 generated by default as identity, 
    date_created timestamp not null, 
    doc_txt text, 
    primary key (id) 
);   
  
insert into document (id, date_created, doc_txt) values (1, '2021-10-10', 'This is the document text number 1');


We will use the document entity definition with the @Lob column:

Java
 
@Entity   
@Table(name = "document")   
public class Document {   
   @Id   
   @GeneratedValue(strategy = GenerationType.IDENTITY)   
   @Column(name = "id", nullable = false)   
   private Long id;   
  
   @Column(name = "date_created", nullable = false)   
   private LocalDateTime dateCreated;   
  
   @Lob   
   @Column(name = "doc_txt")   
   private String docText;   
  
   //Getters and setters omitted for brevity   
}


The code for document fetching will be the same:

Java
 
documentRepository.findById(1L).ifPresent(d -> System.out.println(d.getDocText()));


If we try to execute the repository method, we will see the following:

Plain Text
 
java.lang.IllegalStateException: Failed to execute Application   
…  
Caused by: org.hibernate.exception.DataException: could not execute query   
…   
Caused by: org.postgresql.util.PSQLException: Bad value for type long: This is the document text number 1   
…   


As we can see, Hibernate handles @Lob attribute values as a reference to LOB object data. The TEXT column type in the database table does not affect this behavior.

And what about saving data? Let’s clean the table, try to save the document entity with the @Lob field, and fetch it using Spring Data JPA. Here is the code to do this:

Java
 
//Saving 
Document doc = new Document();   
doc.setDateCreated(LocalDateTime.now());   
doc.setDocText("This is another text document");   
documentRepository.save(doc);  
... 
//Fetching 
documentRepository.findAll().forEach(d -> System.out.println(d.getDocText));   
... 
//Result 
This is another text document 


So, it looks as if our entity with the @Lob attribute can work with TEXT columns. In the database tables we will see the familiar picture:

id date_created doc_txt
1 2022-06-16 15:28:26.751041 76388
loid pageno data
76388 0 This is another text document


If we insert document data into the table using SQL and then select the data, we will get the following:

SQL
 
insert into document (date_created, doc_txt) values ('2021-10-10', 'This is the document text'); 

select * from document; 
id date_created doc_txt
1 2022-06-16 15:28:26.751041 76388
2 2021-10-10 00:00:00 This is the document text


Now we won’t be able to select data from the database using Spring Data JPA. The application will crash with the type conversion error while selecting the second row.

Let’s add the @Type annotation to the attribute...

Java
 
@Lob   
@Type(type = "org.hibernate.type.TextType")   
@Column(name = "doc_txt")   
private String docText;


...and try to print documents' text data to the app console.

Java
 
documentRepository.findAll().forEach(d -> System.out.println(d.getDocText)); 


We’ll see the following:

 
Hibernate: select document0_.id as id1_0_, document0_.date_created as date_cre2_0_, document0_.text as text3_0_ from document document0_   
  
76388   
This is the document text 


With the @Type annotation, we can select data, but the OID reference is translated into text, so we “lose” the text stored in the LOB storage.

Conclusion: Storing Long Text in Table  

So what are the pros and cons of storing long text as a text column in the database:
Pro:

  • Queries work as expected; no need for separate transactions or native queries

Con:

  • Storage size is limited to 1Gb
  • Mixing the @Lob attribute definition and TEXT column datatype may cause unexpected results.

Final Words: How to Store Long Text in PostgreSQL

  1. For most cases, storing long text data in the same table along with other entity data should work fine. It will allow you to manipulate data using both Hibernate and direct SQL.
    • In Hibernate 5, use @Type(type = "org.hibernate.type.TextType") annotation for the JPA entity attribute.
    • If you use Hibernate 6, prefer @Column(name = ..., length = Length.LOB_DEFAULT) annotation for the column definition.
    • Note that we cannot store more than 1Gb of text when using this approach.
  2. If you plan to store significant amounts of character data (more than 1Gb), use the @Lob annotation for the JPA entity attribute. Hibernate will use PostgreSQL’s dedicated storage optimized for large amounts of data. There are several things that we should consider when we use LOBs.
    • We must execute JPQL queries and Spring Data JPA query methods in one transaction or disable auto-commit mode explicitly.
    • To use the LOB column in the WHERE condition, we might need to use native queries.
  3. There is great advice in Hibernate documentation: Please don’t (ab)use JPA’s @Lob annotation just because you want a TEXT column. The purpose of the @Lob annotation is not to control DDL generation! Hence do not use the @Lob entity attribute definition with the TEXT column datatype.

Hope those simple rules will help you avoid issues while storing text data in PostgreSQL with Hibernate.  

Database Document Relational database Spring Data Data (computing) Hibernate sql Strings PostgreSQL Data Types

Opinions expressed by DZone contributors are their own.

Related

  • The Generic Way To Convert Between Java and PostgreSQL Enums
  • JSON-Based Serialized LOB Pattern
  • Architecture and Code Design, Pt. 1: Relational Persistence Insights to Use Today and On the Upcoming Years
  • SQL Commands: A Brief Guide

Partner Resources

×

Comments
Oops! Something Went Wrong

The likes didn't load as expected. Please refresh the page and try again.

ABOUT US

  • About DZone
  • Support and feedback
  • Community research
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 100
  • Nashville, TN 37211
  • support@dzone.com

Let's be friends:

Likes
There are no likes...yet! 👀
Be the first to like this post!
It looks like you're not logged in.
Sign in to see who liked this post!