The UUID Discussion

Those who are familiar with databases and persistence have probably had the same discussion I have had a couple of times over the last couple of years:

Do you use numeric or String values as an ID?

Having done a lot of distributed, multi-tenant development, my stance on this is very clear: I prefer String. Or better said, I prefer UUIDs.

Granted, it’s easier to remember user with id 345 than user with id 03a3a8b1-002c-4f7b-aefb-1691cc274718, but that argument goes out the window once your ids go over 10000. At that point you’ll copy and paste them anyway, which takes the same effort as it would with a UUID.

Where UUIDs really start coming in handy is when you start synchronizing data across servers. See, most systems that rely on numeric values for an id do so with an autogenerated value. That means you have no control over which id will be given to a certain entity in your system, and therefore transferring data across servers becomes a complete mess.

For example, say you have an entity A with id 3 and another entity B that has a foreign key to A. You want to move A and B to another server. Now guess what happens: A gets a new id and you need to manually translate the link from B to A in order to get the correct link. If you forget to do this, you’ll either have a missing foreign key or worse, B may suddenly have a link to an entirely different entity… With UUIDs, you can just transfer the data as is without having to worry about remapping foreign keys.
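The transfer scenario can be sketched in a few lines of Python. This is only an illustration: the dictionaries stand in for database tables, and names such as `server1_a`, `server2_b` and the `a_id` field are made up for the example.

```python
import uuid

def new_id():
    """Generate a random (version 4) UUID as a string key."""
    return str(uuid.uuid4())

# Server 1: rows keyed by UUID; B references A through a foreign key.
a_id = new_id()
b_id = new_id()
server1_a = {a_id: {"name": "A"}}
server1_b = {b_id: {"name": "B", "a_id": a_id}}

# Server 2 already holds its own rows, created independently.
other_a_id = new_id()
server2_a = {other_a_id: {"name": "some other A"}}
server2_b = {}

# Transfer: copy the rows as they are. No id is reassigned,
# so no foreign key has to be rewritten.
server2_a.update(server1_a)
server2_b.update(server1_b)

# B still points at the same A on the new server.
linked = server2_a[server2_b[b_id]["a_id"]]
print(linked["name"])  # A
```

With autogenerated numeric ids, the `update` calls would need a translation table from old ids to new ones, and that is exactly the step that is easy to forget.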

However, every time I brought this up in the past, the same comment was given: “But you can have duplicates!”. Well, for those who like math: if you generate a billion random UUIDs every second for the next 86 years or so, you end up with roughly a 50% chance of having had even a single collision (a single one!). To put this even more in perspective, if you would limit yourself to the range of a 32-bit integer, the chance to have a collision is 1.6 × 10^-27. If you don’t like those odds, I suggest you don’t leave the house (meteorite strike), get in your car (random car accident) or eat something (food poisoning).
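For those who want to check the math themselves, here is a minimal sketch using the standard birthday-bound approximation. It assumes version 4 UUIDs with 122 random bits (128 bits minus the 6 fixed version and variant bits); the function names are made up for this example.

```python
import math

RANDOM_BITS = 122                      # 128 bits minus 6 fixed version/variant bits
SECONDS_PER_YEAR = 365.25 * 24 * 3600
RATE = 1e9                             # UUIDs generated per second

def collision_probability(n, bits=RANDOM_BITS):
    """Birthday-bound approximation: p ~ 1 - exp(-n^2 / (2 * 2^bits))."""
    return -math.expm1(-(n * n) / (2.0 * 2.0 ** bits))

def count_for_probability(p, bits=RANDOM_BITS):
    """Invert the approximation: how many UUIDs until the collision chance reaches p."""
    return math.sqrt(2.0 * 2.0 ** bits * math.log(1.0 / (1.0 - p)))

n_half = count_for_probability(0.5)
years = n_half / (RATE * SECONDS_PER_YEAR)
print(f"UUIDs needed for a 50% collision chance: {n_half:.2e}")
print(f"Years at one billion per second: {years:.0f}")
```

Under these assumptions the 50% mark is only reached after many decades of generating a billion UUIDs every second, which is the order of magnitude the argument above relies on.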

So stop using numeric ids and start using UUIDs.

A small addition

There are different versions of UUIDs. Most of the time either version 1 or 4 is used.

Version 1 is time-based: it is unique for a given MAC address within a 100-nanosecond interval. So it’s safe to use if you’re generating fewer than 10 million UUIDs per second on one machine (if you are generating more, you have other issues ;)). Its internal timer won’t wrap around for well over 1000 years, so I’m fairly certain you’re safe there and you don’t have to worry about odds. Version 4 is a random-based UUID and has the duplicate odds I mentioned earlier on. An important difference between the two is that Version 1 UUIDs can be sorted chronologically by their embedded timestamp (not by their plain string form, since the low-order timestamp bits come first), whereas Version 4 UUIDs cannot.
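To make the difference concrete, here is a small sketch using Python’s standard `uuid` module. The helper `created_at` is made up for this example; it decodes the version 1 timestamp, which counts 100-nanosecond intervals since 15 October 1582.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Version 1 timestamps count 100-nanosecond intervals since 1582-10-15 (UTC).
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)

def created_at(u):
    """Decode the creation time embedded in a version 1 UUID."""
    if u.version != 1:
        raise ValueError("only version 1 UUIDs carry a timestamp")
    return GREGORIAN_EPOCH + timedelta(microseconds=u.time // 10)

time_based = [uuid.uuid1() for _ in range(5)]
random_based = uuid.uuid4()

# Sort on the embedded timestamp, not on the string form.
in_order = sorted(reversed(time_based), key=lambda u: u.time)

print(time_based[0].version, random_based.version)  # 1 4
print(created_at(time_based[0]).isoformat())
```

Calling `created_at(random_based)` raises a `ValueError`, because a version 4 UUID has no meaningful timestamp to sort on.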

And for those who are worried about the increased storage that using UUIDs causes, have a look at this.
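As a rough illustration of the storage question (my own sketch, not taken from the linked resource): how much room a UUID takes depends on whether you store it as text or as its raw 128-bit value. The Python snippet below just measures the three common representations.

```python
import uuid

u = uuid.UUID("03a3a8b1-002c-4f7b-aefb-1691cc274718")  # the example id from earlier

as_text = str(u)      # canonical form: 32 hex digits plus 4 hyphens
as_hex = u.hex        # hex digits only
as_bytes = u.bytes    # raw 128-bit value

print(len(as_text), len(as_hex), len(as_bytes))  # 36 32 16

# The compact form round-trips without loss.
assert uuid.UUID(bytes=as_bytes) == u
```

Whether your database can store and index the 16-byte form efficiently depends on the product and column type, so treat this as a starting point and not as a recommendation.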

Published at DZone with permission of Lieven Doclo, DZone MVB. See the original article here.
