Performance of ULID and UUID in Postgres Database
Don't know what data type to use for the primary key in Postgres? Let's take measurements for UUID and ULID and decide which is better.
Join the DZone community and get the full member experience.Join For Free
Hello everyone! In this article, I want to share my knowledge and opinion about the data types that are often used as an identifier. Today we will touch on two topics at once. These are measurements of search speed by key and data types for the key on the database side.
UUID and ULID
Why do we need some kind of incomprehensible types for IDs? I won’t talk about distributed systems, connectivity of services, sensitive data, and the like. If someone is interested in this, they can Google it - at the moment we are interested in performance. As the name suggests, we will talk about two types of keys: UUID and ULID.
UUID has long been known to everyone, but ULID may be unfamiliar to some. The main advantage of ULID is that it is monotonically increasing and is a sortable type. Naturally, these are not all the differences. Personally, I also like the fact that there are no special characters in it.
A small digression, I noticed a long time ago that many teams use the
varchar(36) data type to store UUID in the PostgreSQL database and I don’t like this, since this database has a corresponding data type for UUID. A little later, we will see which type is preferable on the database side. Therefore, we will look not only at a comparison of the two data types on the backend side but also at the difference when storing UUID in different formats on the database side.
So let's start comparing things.
- The UUID is 36 characters long and takes up 128 bits of memory.
- The ULID is 26 characters long and also takes up 128 bits of memory.
For my examples, I created two tables in the database with three fields:
CREATE TABLE test.speed_ulid ( id varchar(26) PRIMARY KEY, name varchar(50), created timestamp ); CREATE TABLE test.speed_uuid ( id varchar(36) PRIMARY KEY, name varchar(50), created timestamp );
For the first comparison, I stored the UUID in
varchar(36) format, as is often done. In the database, I recorded 1,000,000 in each of the tables.
The test case will consist of 100 requests using identifiers previously pulled from the database; that is, when calling the test method, we will access the database 100 times and retrieve the entity by key. The connection will be created and warmed up before measurement. We will conduct two test runs and then 10 effective iterations. For your convenience, I will provide a link to the Java code at the end of the article.
Sorry, but the measurements were taken on a standard MacBook Pro laptop and not on a dedicated server, but I don't believe there will be a significant difference in the results other than increased time spent on network traffic between the database and the backend.
Here is some background information:
- # CPU I9-9980HK
- # CPU count: 16
- # RAM: 32GB
- # JMH version: 1.37
- # VM version: JDK 11.0.12, Java HotSpot(TM) 64-Bit Server VM, 11.0.12+8-LTS-237
- # DB: PostgreSQL 13.4, build 1914, 64-bit
Queries that will be used to obtain an entity by key:
SELECT * FROM test.speed_ulid where id = ? SELECT * FROM test.speed_uuid where id = ?
Let's look at the measurement results. Let me remind you that each table has 1,000,000 rows.
Both Types of Identifiers Are Stored in the Database as varchar
I ran this test several times, and the result was about the same: either the ULID was a little faster, or the UUID. In percentage terms, the difference is practically zero.
Well, you can disagree that there is no difference between these types. I would say that it is not possible to use other data types on the database side.
UUID as uuid, ULID as varchar in DB
For the next test, I changed the data type from
uuid in the test.speed_uuid table.
In this case, the difference is obvious: 4.5% in favor of UUID.
As you can see, it makes sense to use the
uuid data type on the database side in the case of a type of the same name on the service side. The index for this format is very well optimized in PostgreSQL and shows good results.
Well, now we can definitely part ways. Or not?
If you look at the index search query plan, you can see the following
((id)::text = '01HEE5PD6HPWMBNF7ZZRF8CD9R'::text) in the case when we use varchar.
In general, comparing two text variables is a rather slow operation, so maybe there is no need to store the ID in this format. Or are there other ways to speed up key comparison? First, let's create another index of the kind “
hash” for the table with ULID.
create index speed_ulid_id_index on test.speed_ulid using hash (id);
Let's look at the execution plan for our query:
We will see that the database uses a hash index, and not a btree in this case. Let's run our test and see what happens.
varchar + index(hash) for ULID, uuid for UUID
This combination gave an increase of 2.3% relative to
uuid and its cheating index.
I'm not sure that keeping two indexes on one field can somehow be justified. So it's worth considering whether there's more you can do. And here it’s worth looking into the past and remembering how
uuid or some other string identifiers used to be stored. That's right: either text or a byte array.
So let's try this option: I removed all the indexes for the ULID, cast it to
bytea , and recreated the primary key.
bytea for ULID, uuid for UUID
As a result, we got approximately the same result as in the previous run with an additional index, but I personally like this option better.
Measurement result with 2,000,000 rows in the database:
Measurement result with 3,000,000 rows in the database:
I think there is no point in continuing measurements further. The pattern remains: ULID saved as
bytea slightly outperforms UUID saved as
uuid in DB.
If we take the data from the first measurements, it is obvious that with the help of small manipulations, you can increase performance by about 9% if you use
So, if you have read this far, I assume the article was interesting to you and you have already drawn some conclusions for yourself.
It is worth noting that the measurements were made under ideal conditions for both the backend part and the database. We did not have any parallel processes running that write something to the database, change records, or perform complex calculations on the back-end side.
Let's go over the material. What did you learn that was useful?
- Do not neglect the
uuiddata type on the PostgreSQL side. Perhaps someday extensions for ULID will appear in this database, but for now, we have what we have.
- Sometimes it is worth creating an additional index of the desired type manually, but there is an overhead to consider.
- If you are not afraid of unnecessary work - namely, writing your own converters for types - then you should try
byteaif there is no corresponding type for your identifier on the database side.
What type of data should be used for the primary key and in what format should it be stored? I don’t have a definite answer to these questions: it all depends on many factors. It is also worth noting that a competent choice of data type for ID, and not only for it, can at some point play an important role in your project.
I hope this article was useful to you. Good luck!
Opinions expressed by DZone contributors are their own.