DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Please enter at least three characters to search
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Zones

Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks

The software you build is only as secure as the code that powers it. Learn how malicious code creeps into your software supply chain.

Apache Cassandra combines the benefits of major NoSQL databases to support data management needs not covered by traditional RDBMS vendors.

Generative AI has transformed nearly every industry. How can you leverage GenAI to improve your productivity and efficiency?

Modernize your data layer. Learn how to design cloud-native database architectures to meet the evolving demands of AI and GenAI workloads.

Related

  • The Generic Way To Convert Between Java and PostgreSQL Enums
  • Charsets and Unicode Identifiers in Java
  • Recurrent Workflows With Cloud Native Dapr Jobs
  • Hybrid Search Using Postgres DB

Trending

  • Unlocking Data with Language: Real-World Applications of Text-to-SQL Interfaces
  • Driving DevOps With Smart, Scalable Testing
  • Building an AI/ML Data Lake With Apache Iceberg
  • Dropwizard vs. Micronaut: Unpacking the Best Framework for Microservices
  1. DZone
  2. Software Design and Architecture
  3. Performance
  4. Performance of ULID and UUID in Postgres Database

Performance of ULID and UUID in Postgres Database

Don't know what data type to use for the primary key in Postgres? Let's take measurements for UUID and ULID and decide which is better.

By 
Artem Artemev user avatar
Artem Artemev
·
Nov. 15, 23 · Analysis
Likes (10)
Comment
Save
Tweet
Share
20.6K Views

Join the DZone community and get the full member experience.

Join For Free

Hello everyone! In this article, I want to share my knowledge and opinion about the data types that are often used as an identifier. Today we will touch on two topics at once. These are measurements of search speed by key and data types for the key on the database side.

I will use a PostgreSQL database and a demo Java service to compare query speeds. 

UUID and ULID

Why do we need some kind of incomprehensible types for IDs? I won’t talk about distributed systems, connectivity of services, sensitive data, and the like. If someone is interested in this, they can Google it - at the moment we are interested in performance. As the name suggests, we will talk about two types of keys: UUID and ULID.

UUID has long been known to everyone, but ULID may be unfamiliar to some. The main advantage of ULID is that it is monotonically increasing and is a sortable type. Naturally, these are not all the differences. Personally, I also like the fact that there are no special characters in it.

A small digression, I noticed a long time ago that many teams use the varchar(36) data type to store UUID in the PostgreSQL database and I don’t like this, since this database has a corresponding data type for UUID. A little later, we will see which type is preferable on the database side. Therefore, we will look not only at a comparison of the two data types on the backend side but also at the difference when storing UUID in different formats on the database side.

Comparison

So let's start comparing things.

  • The UUID is 36 characters long and takes up 128 bits of memory.
  • The ULID is 26 characters long and also takes up 128 bits of memory.

For my examples, I created two tables in the database with three fields:

SQL
 
CREATE TABLE test.speed_ulid
(
    id      varchar(26) PRIMARY KEY,
    name    varchar(50),
    created timestamp
);
CREATE TABLE test.speed_uuid
(
    id       varchar(36) PRIMARY KEY,
    name    varchar(50),
    created timestamp
);


For the first comparison, I stored the UUID in varchar(36) format, as is often done. In the database, I recorded 1,000,000 in each of the tables.

The test case will consist of 100 requests using identifiers previously pulled from the database; that is, when calling the test method, we will access the database 100 times and retrieve the entity by key. The connection will be created and warmed up before measurement. We will conduct two test runs and then 10 effective iterations. For your convenience, I will provide a link to the Java code at the end of the article.

Sorry, but the measurements were taken on a standard MacBook Pro laptop and not on a dedicated server, but I don't believe there will be a significant difference in the results other than increased time spent on network traffic between the database and the backend.

Here is some background information:

  • # CPU I9-9980HK
  • # CPU count: 16
  • # RAM: 32GB
  • # JMH version: 1.37
  • # VM version: JDK 11.0.12, Java HotSpot(TM) 64-Bit Server VM, 11.0.12+8-LTS-237
  • # DB: PostgreSQL 13.4, build 1914, 64-bit

Queries that will be used to obtain an entity by key:

SQL
 
SELECT * FROM test.speed_ulid where id = ?
SELECT * FROM test.speed_uuid where id = ?


Measurement Results

Let's look at the measurement results. Let me remind you that each table has 1,000,000 rows.

Both Types of Identifiers Are Stored in the Database as varchar

Both Types of Identifiers Are Stored in the Database as varchar

I ran this test several times, and the result was about the same: either the ULID was a little faster, or the UUID. In percentage terms, the difference is practically zero.

Well, you can disagree that there is no difference between these types. I would say that it is not possible to use other data types on the database side.

UUID as uuid, ULID as varchar in DB

For the next test, I changed the data type from varchar(36) to uuid in the test.speed_uuid table.

UUID as uuid, ULID as varchar in DB

In this case, the difference is obvious: 4.5% in favor of UUID.

As you can see, it makes sense to use the uuid data type on the database side in the case of a type of the same name on the service side. The index for this format is very well optimized in PostgreSQL and shows good results.

Well, now we can definitely part ways. Or not?

If you look at the index search query plan, you can see the following ((id)::text = '01HEE5PD6HPWMBNF7ZZRF8CD9R'::text) in the case when we use varchar.

In general, comparing two text variables is a rather slow operation, so maybe there is no need to store the ID in this format. Or are there other ways to speed up key comparison? First, let's create another index of the kind “hash” for the table with ULID.

SQL
 
create index speed_ulid_id_index
    on test.speed_ulid using hash (id);


Let's look at the execution plan for our query:

Execution plan for our query

We will see that the database uses a hash index, and not a btree in this case. Let's run our test and see what happens.

varchar + index(hash) for ULID, uuid for UUID

varchar + index(hash) for ULID, uuid for UUID

This combination gave an increase of 2.3% relative to uuid and its cheating index.

I'm not sure that keeping two indexes on one field can somehow be justified. So it's worth considering whether there's more you can do. And here it’s worth looking into the past and remembering how uuid or some other string identifiers used to be stored. That's right: either text or a byte array.

So let's try this option: I removed all the indexes for the ULID, cast it to bytea , and recreated the primary key.

bytea for ULID, uuid for UUID

bytea for ULID, uuid for UUIDAs a result, we got approximately the same result as in the previous run with an additional index, but I personally like this option better.

Measurement result with 2,000,000 rows in the database:

Measurement result with 2,000,000 rows in the database

Measurement result with 3,000,000 rows in the database:

Measurement result with 3,000,000 rows in the database

I think there is no point in continuing measurements further. The pattern remains: ULID saved as bytea slightly outperforms UUID saved as uuid in DB.

If we take the data from the first measurements, it is obvious that with the help of small manipulations, you can increase performance by about 9% if you use varchar.

So, if you have read this far, I assume the article was interesting to you and you have already drawn some conclusions for yourself.

It is worth noting that the measurements were made under ideal conditions for both the backend part and the database. We did not have any parallel processes running that write something to the database, change records, or perform complex calculations on the back-end side.

Сonclusions

Let's go over the material. What did you learn that was useful?

  1. Do not neglect the uuid data type on the PostgreSQL side. Perhaps someday extensions for ULID will appear in this database, but for now, we have what we have.
  2. Sometimes it is worth creating an additional index of the desired type manually, but there is an overhead to consider.
  3. If you are not afraid of unnecessary work - namely, writing your own converters for types - then you should try bytea if there is no corresponding type for your identifier on the database side.

What type of data should be used for the primary key and in what format should it be stored? I don’t have a definite answer to these questions: it all depends on many factors. It is also worth noting that a competent choice of data type for ID, and not only for it, can at some point play an important role in your project.

I hope this article was useful to you. Good luck!

  • Project on GitHub
Identifier PostgreSQL Data Types Java (programming language) Performance improvement

Opinions expressed by DZone contributors are their own.

Related

  • The Generic Way To Convert Between Java and PostgreSQL Enums
  • Charsets and Unicode Identifiers in Java
  • Recurrent Workflows With Cloud Native Dapr Jobs
  • Hybrid Search Using Postgres DB

Partner Resources

×

Comments
Oops! Something Went Wrong

The likes didn't load as expected. Please refresh the page and try again.

ABOUT US

  • About DZone
  • Support and feedback
  • Community research
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 100
  • Nashville, TN 37211
  • support@dzone.com

Let's be friends:

Likes
There are no likes...yet! 👀
Be the first to like this post!
It looks like you're not logged in.
Sign in to see who liked this post!