DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Please enter at least three characters to search
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Zones

Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks

Modernize your data layer. Learn how to design cloud-native database architectures to meet the evolving demands of AI and GenAI workkloads.

Secure your stack and shape the future! Help dev teams across the globe navigate their software supply chain security challenges.

Releasing software shouldn't be stressful or risky. Learn how to leverage progressive delivery techniques to ensure safer deployments.

Avoid machine learning mistakes and boost model performance! Discover key ML patterns, anti-patterns, data strategies, and more.

Related

  • JSON-Based Serialized LOB Pattern
  • How to Store Text in PostgreSQL: Tips, Tricks, and Traps
  • Architecture and Code Design, Pt. 1: Relational Persistence Insights to Use Today and On the Upcoming Years
  • SQL Commands: A Brief Guide

Trending

  • My LLM Journey as a Software Engineer Exploring a New Domain
  • The 4 R’s of Pipeline Reliability: Designing Data Systems That Last
  • The Modern Data Stack Is Overrated — Here’s What Works
  • Building Scalable and Resilient Data Pipelines With Apache Airflow
  1. DZone
  2. Data Engineering
  3. Data
  4. What Is a UUID and Why Should You Care?

What Is a UUID and Why Should You Care?

Learn more about UUIDs and see why they exist, the disadvantages, the different types, and more!

By 
Charlie Custer user avatar
Charlie Custer
·
Apr. 26, 22 · Opinion
Likes (4)
Comment
Save
Tweet
Share
4.3K Views

Join the DZone community and get the full member experience.

Join For Free

When working with a database, it's common practice to use some kind of id field to provide a unique identifier for each row in a table.

Imagine, for example, a customers table. We wouldn't want to use fields such as name or address as unique identifiers, for example, because it's possible more than one customer could have the same name, or share the same address.

Instead, it's a good idea to assign each row some kind of unique identifier. One option we have is to use a UUID.

What Is a UUID?

A UUID — that's short for Universally Unique IDentifier, by the way — is a 36-character alphanumeric string that can be used to identify information (such as a table row).

Here is one example of a UUID: acde070d-8c4c-4f0d-9d8a-162843c10333

UUIDs are widely used in part because they are highly likely to be unique globally, meaning that not only is our row's UUID unique in our database table, it's probably the only row with that UUID in any system anywhere.

(Technically, it's not impossible that the same UUID we generate could be used somewhere else, but with 340,282,366,920,938,463,463,374,607,431,768,211,456 different possible UUIDs out there, the chances are very slim).

Why Do UUIDs Exist?

To answer this question, let's imagine we're operating an ecommerce bookshop. As orders come in, we want to assign them an id number and store them in our orders table using that number.

We could set up sequential IDs such that the first order to come in is 1, the second is 2, and so on, like so:

And this approach might work well, at least for a while, if our scale is small. However, it has some major downsides:

First, it can easily create confusion when we're doing things like joining tables or importing new data, because the id values above aren't unique. This can create problems even internally if we use the same ID system for multiple tables, but it really gets messy when we start working with any kind of outside data.

Imagine, for example, that our little bookshop grows, and we acquire another online bookshop. When we go to integrate our order tables, we find that they've used the same system. Now we've got two order 1s, two order 2s, etc., and to resolve the issue, we'll have to update every single ID in at least one of the two databases we're integrating. Even in a best case scenario, that's going to be a tremendous hassle.

Second, the sequential approach doesn't work well in any kind of distributed system, because it means that INSERT commands must be executed one by one. This restriction can cause major performance issues at scale. Even if your application requires strict ID ordering, using a feature may allow you to meet those requirements while still using UUIDs and not taking the performance hit that comes with sequentially-ordered IDs.

Other traditional approaches to unique IDs, such as generating random IDs with SERIAL, can also lead to hotspots in distributed systems because values generated around the same time have similar values and are thus located close to each other in the table's storage.

UUIDs solve all of these problems because:

They're globally unique, so the chances of encountering a duplicate ID even in external data are very, very small. They can be generated without the need to check against a central node, so in a distributed system, each node can generate UUIDs autonomously without fear of duplication or consistency issues.

Reason #1 alone is a good argument for using UUIDs in almost any database system. As a business that aspires to operate at scale, reason #2 is also very relevant to our bookshop, because distributed databases offer the best scalability and resilience.

Disadvantages of UUIDs

The only significant disadvantage of UUIDs is that they take up 128 bits in memory (and often a bit more when we include metadata). If minimizing storage space is absolutely mission-critical, clearly storing a sequential ID (which will probably range somewhere between 1-10 numeric characters) is going to be more efficient than storing a 36-character alphanumeric.

However, in most cases the disadvantages of using something like a sequential identifier significantly outweigh the minimal increase in storage costs that comes from using UUIDs. UUIDs are extremely popular and widely used for a variety of different identification purposes. We've focused on database examples in this article because UUIDs are also used in analytics systems, web and mobile applications, etc.

Types of UUIDs

There are several different types of UUIDs:

Version 1 and version 2. Sometimes called time-based UUIDs, these IDs are generated using a combination of datetime values (reflecting the time the UUID is being generated), a random value, and a part of the MAC address of the device generating the UUID.

Here's how it breaks down visually:

Generating UUIDs in this way makes having identical UUIDs almost impossible - they would have to be generated by the same device at the exact same time and have generated the exact same random 16-bit sequence.

Because they contain a part of the generating device's MAC address, UUID v1 and UUID v2 IDs can be used to identify (for example) which database node generated the ID. This is generally not a problem, and in distributed systems, it can be an advantage.

(The difference between v1 and v2 UUIDs is that UUID v2s also contain a segment of a local domain number. For a number of reasons, this makes them less optimal for most applications, so UUID v2s are not widely used.)

Version 3 and version 5. These two versions of UUIDs are generated by hashing a namespace identifier and name. They're similar to time-based UUIDs in that they are generated using existing data rather than being entirely random, but rather than using datetime data and the device MAC address, they use namespace data and name data.

The namespace data is itself a UUID, and the name data could really be any arbitrary string, although in practice it typically relates to how the UUID will be used - it might be an account name, for example, or a product ID. But whatever the two values used are, they're hashed to generate a 36-character alphanumeric string that is the final UUID.

UUIDs versions 3 and 5 differ primarily in that they use different hashing algorithms. UUID v3 uses MD5, and UUID v5 uses SHA-1.

Version 4. These UUIDs are simply randomly-generated 36-character strings. Aside from a single 4 to denote that they are v4 UUIDs, each character is simply a random character from a-z or integer from 0-9.

Because the generation is entirely random, they are extremely likely to be unique. They also contain no identifying information like datetime, MAC address, or name data (which may be an advantage or a disadvantage depending on the specific use case).

Versions 6, 7 and 8. As of this writing, none of these versions exist, but they have been proposed and may be added to UUID standards in the coming years. You can read more about these proposed new UUIDs here.

How to Generate a UUID

UUIDs may look complicated, but in the context of modern application development, generating them is actually pretty straightforward. Most popular programming languages will have libraries that make generating UUIDs as simple as calling a function.

For example, in JavaScript, you would simply import the uuid package and then call uuid.v1(), for example, to generate a UUID v1. Python is very similar: you import and call uuid.uuid1() to generate a UUID v1.

Databases, and particularly distributed databases, may also have built-in UUID generation. In CockroachDB, for example, we recommend using UUIDs as row identifiers, and doing that is as easy as using the gen_random_uuid() function.

So, for example, when we create a table using SQL we can ensure it auto-generates a UUID v4 for each row like so:

 
CREATE TABLE users (
        id UUID NOT NULL DEFAULT gen_random_uuid(),
        city STRING NOT NULL,
        name STRING NULL,
        address STRING NULL,
        credit_card STRING NULL,
        CONSTRAINT "primary" PRIMARY KEY (city ASC, id ASC),
        FAMILY "primary" (id, city, name, address, credit_card)
);


Any time a row is inserted into this table, the id value will be an auto-generated UUID.

Long-story short: generating and using UUIDs is generally quite simple. The specific details of implementation may vary slightly based on the specific database technology and/or programming language we're working with, but in most cases, it boils down to little more than calling some kind of 'generate_uuid()` function.

UUIDs in a Distributed Database

Distributed SQL databases offer a powerful combination: the elastic scale and resilience of NoSQL databases combined with the transactional consistency and familiarity of SQL databases.

However, working with distributed systems does mean some things need to be approached differently. As we mentioned earlier, traditional approaches to generating row ids, such as generating sequential IDs by adding 1 to an integer so that rows are identified as 2, 3, 4, etc., do not work well in distributed systems. These approaches lead to "hotspot" nodes and create performance bottlenecks.

UUIDs offer a superior alternative for many distributed workloads, because each node in the database can generate completely unique UUIDs autonomously, without having to check against the other nodes.

This doesn't mean that UUIDs are always the best choice, though. In CockroachDB, using UUIDs often works well, but using a multi-column primary key can offer superior performance in some circumstances (although it's also more complex to set up and test).

Database Data (computing) sql Strings systems Data Types

Published at DZone with permission of Charlie Custer. See the original article here.

Opinions expressed by DZone contributors are their own.

Related

  • JSON-Based Serialized LOB Pattern
  • How to Store Text in PostgreSQL: Tips, Tricks, and Traps
  • Architecture and Code Design, Pt. 1: Relational Persistence Insights to Use Today and On the Upcoming Years
  • SQL Commands: A Brief Guide

Partner Resources

×

Comments
Oops! Something Went Wrong

The likes didn't load as expected. Please refresh the page and try again.

ABOUT US

  • About DZone
  • Support and feedback
  • Community research
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 100
  • Nashville, TN 37211
  • support@dzone.com

Let's be friends:

Likes
There are no likes...yet! 👀
Be the first to like this post!
It looks like you're not logged in.
Sign in to see who liked this post!