7 Strategies for Assigning Ids to Microservices
In this article, we'll discuss seven strategies for assigning Ids and their trade-offs.
Identities are the defining characteristic of an entity in Domain-Driven Design. As soon as an Id is public and leaves its immediate context, other components might use it. For example, if service A references an entity from service B by Id, changing that entity's Id will have a knock-on effect on service A. This is why it's important to have several tools in your toolbox. In this blog post, we'll discuss 7 strategies for assigning Ids and their trade-offs.
GUIDs (UUIDs)
This is the simplest and most straightforward option: you just generate a GUID and use it as an identifier.
Pros
- Easy and fast to generate.
- Guaranteed to be unique (okay, not guaranteed, but the probability of generating duplicate GUIDs is so low it doesn't make sense to debate it). This also makes it easier if you want to merge data from two different sources.
- There is no central authority needed to generate a GUID. This means that GUIDs can be generated on two different machines, which makes it easier to use in distributed systems. You might even use client-side generated Ids.
- They don't leak any business intelligence information (more on this below).
Cons
- They are not human readable. A GUID is 36 characters long with the hyphens, 32 without, and 22 if you Base64-encode it and drop the padding.
- They are big (16 bytes).
- GUIDs might hurt database performance (although opinions vary on how much and if it matters). This was not a problem in the systems that I have worked on. Also, there is the option of generating sequential GUIDs.
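To make the sizes mentioned above concrete, here is a minimal Python sketch that generates a random GUID and prints its common textual forms. The helper name `guid_forms` is made up for this illustration; the Base64 variant assumes the URL-safe alphabet with the padding stripped.

```python
import base64
import uuid


def guid_forms(u: uuid.UUID) -> dict:
    """Return the common textual encodings of a GUID (hypothetical helper)."""
    return {
        "hyphenated": str(u),  # 36 characters, e.g. 8-4-4-4-12 hex groups
        "hex": u.hex,          # 32 characters, hyphens removed
        # URL-safe Base64 of the 16 raw bytes with the "==" padding stripped: 22 characters
        "base64": base64.urlsafe_b64encode(u.bytes).rstrip(b"=").decode("ascii"),
    }


new_id = uuid.uuid4()  # random (version 4) GUID, generated locally with no central authority
forms = guid_forms(new_id)

for name, text in forms.items():
    print(f"{name:<10} {len(text):>2} chars  {text}")
```

Whichever form you display, the value stored underneath is still the same 16 bytes.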
Sequential Integers
This option usually implies relying on a database server to generate the Id for you.
Pros
- Easy to generate, as all RDBMSs have this option.
- Human readable and easy to remember.
- Small (4 bytes).
Cons
- Since you need a central authority to generate the Ids (usually the database server), you have a single point of failure and a potential bottleneck. This might hurt scalability after a certain point.
- If it's exposed publicly (displayed on a page or part of a URL), it might leak business intelligence data. For example, if I order something now and get an order with Id 345, then order something a month later and get an order with Id 445, I can infer that the shop is getting about 100 orders per month.
- You need a round trip to the database to get the Id.
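As a rough illustration of the round trip, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The `orders` table and the `create_order` function are assumptions made for this example; other database servers expose the generated Id in their own way.

```python
import sqlite3

# In-memory database standing in for the real database server (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY AUTOINCREMENT, item TEXT NOT NULL)"
)


def create_order(item: str) -> int:
    """Insert a row and return the Id that the database assigned to it."""
    with conn:  # commits on success, rolls back on error
        cursor = conn.execute("INSERT INTO orders (item) VALUES (?)", (item,))
    # The Id is only known after the INSERT has reached the database.
    return cursor.lastrowid


first = create_order("book")
second = create_order("lamp")
print(first, second)  # prints "1 2"
```

Note that the application cannot know the Id before the insert completes, which is the round trip listed above.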
Randomized Integers
This approach is based on the one above. You generate a sequential Id, but keep it for internal use only. For external use, you symmetrically encrypt it using Skip32. This will generate an integer that will seem random.
Pros
- Human readable and easy to remember.
- Small (4 bytes).
- Doesn't leak any business intelligence information.
Cons
- Since you need a central authority to generate the Ids (usually the database server), you have a single point of failure and a potential bottleneck. This might hurt scalability after a certain point.
- You need two round trips to the database: one to generate the sequential Id and another one to save the encrypted Id.
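The idea of a reversible mapping between an internal sequential Id and a random-looking external one can be sketched as follows. This is not Skip32: it is a toy 4-round Feistel permutation over 32-bit integers that uses keyed BLAKE2b as the round function, shown only to illustrate the mechanism. The names `encode_id`, `decode_id`, and the key are assumptions for this example, and the construction has not been reviewed for cryptographic strength.

```python
import hashlib

MASK16 = 0xFFFF
ROUNDS = 4


def _round(half: int, key: bytes, i: int) -> int:
    """Keyed round function: map a 16-bit half to a 16-bit value."""
    data = bytes([i]) + half.to_bytes(2, "big")
    digest = hashlib.blake2b(data, key=key, digest_size=2).digest()
    return int.from_bytes(digest, "big")


def encode_id(internal_id: int, key: bytes) -> int:
    """Map a 32-bit internal Id to a random-looking 32-bit external Id."""
    if not 0 <= internal_id <= 0xFFFFFFFF:
        raise ValueError("Id must fit in 32 bits")
    left, right = internal_id >> 16, internal_id & MASK16
    for i in range(ROUNDS):
        left, right = right, left ^ _round(right, key, i)
    return (left << 16) | right


def decode_id(external_id: int, key: bytes) -> int:
    """Invert encode_id by running the rounds backwards."""
    left, right = external_id >> 16, external_id & MASK16
    for i in reversed(range(ROUNDS)):
        left, right = right ^ _round(left, key, i), left
    return (left << 16) | right


key = b"demo-key-change-me"  # hypothetical secret; keep the real one out of source control
for internal in (345, 346, 445):
    external = encode_id(internal, key)
    print(internal, "->", external, "->", decode_id(external, key))
```

Because the mapping is a permutation of the 32-bit range, two different internal Ids never produce the same external Id, so no extra uniqueness check is needed.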
Short Random Identifiers
In this approach, you generate a short random identifier and then check that it's unique. This is the approach used by URL shorteners like bitly. You can generate it in many ways: for example, by picking random characters from a base-62 alphabet (0-9, a-z, A-Z), or by hashing some input, encoding the hash in base 62, and keeping a substring.
Pros
- Short (5 base-62 characters give you about 916 million possible values, so roughly 1 billion).
- Human readable.
- Does not leak any business intelligence information.
Cons
- Since the Id is not guaranteed to be unique, you must check for collisions in the database and retry in case the Id is already there. This could be implemented easily using a unique constraint on the Id column in the database and a retry.
- This approach could break if your data is split across several tables (for example, if you archive old entries in a different table).
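The generate-then-retry loop described in this section could look roughly like the sketch below. It uses an in-memory SQLite database, and the `links` table, `random_id`, and `save_link` are names invented for the example; the primary key plays the role of the unique constraint.

```python
import secrets
import sqlite3
import string

ALPHABET = string.digits + string.ascii_letters  # 62 characters
ID_LENGTH = 5  # 62 ** 5 = 916,132,832 possible Ids

conn = sqlite3.connect(":memory:")  # stand-in for the real database
conn.execute("CREATE TABLE links (id TEXT PRIMARY KEY, url TEXT NOT NULL)")


def random_id(length: int = ID_LENGTH) -> str:
    """Pick `length` random characters from the base-62 alphabet."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def save_link(url: str, max_attempts: int = 5, make_id=random_id) -> str:
    """Insert the URL under a fresh random Id, retrying if the Id is taken."""
    for _ in range(max_attempts):
        candidate = make_id()
        try:
            with conn:  # commit on success, roll back on error
                conn.execute(
                    "INSERT INTO links (id, url) VALUES (?, ?)", (candidate, url)
                )
            return candidate
        except sqlite3.IntegrityError:
            continue  # the primary key rejected a duplicate: try another Id
    raise RuntimeError("could not find a free Id")


print(save_link("https://example.com/a-very-long-address"))
```

The uniqueness check only covers the table that carries the constraint, which is why archiving old rows into a separate table can let a previously used Id slip through.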
Natural Keys
You might work with an entity that already has a unique identity. For example, all books should be uniquely identified by an ISBN. These types of keys are also known as natural keys.
Pros
- The identity is well known in the problem domain.
Cons
- You must double-check that it really is a natural key and that it is unique. There are cases of two different people having the same Social Security Number.
- Changing the Id can be quite painful.
User Input
In this strategy, you are relying on the user to provide the Identity. The most common examples are blog posts. On my blog, for example, the URL is derived from the title and could be used as the unique Id of this blog post. If there are lots of blog posts created and the chance of collision is high, you could append a hash to make it unique (example: https://tomharrisonjr.com/uuid-or-guid-as-primary-keys-be-careful-7b2aa3dcb439).
Pros
- The identity assignment part is simpler: you just use the value from a user input as your identity.
- The identity can provide hints about the content of the entity.
Cons
- Identities should be stable, but what if users want to change them? What is the cost of that change, for example when a user wants to rename a blog post? Vaughn Vernon, in his book Implementing Domain-Driven Design, suggests workflow-based identity approval processes for low-throughput domains. This way you could minimize the chance of misspelling an Id.
- You need to check the uniqueness of the Identity in the database.
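Deriving an Id from a title and appending a hash can be sketched in a few lines of Python. The functions `slugify` and `slug_with_hash`, the `salt` parameter, and the 12-character hash length are assumptions for this illustration; this is not a description of how any particular blogging platform builds its URLs.

```python
import hashlib
import re


def slugify(title: str) -> str:
    """Lowercase the title and replace every run of non-alphanumerics with a hyphen."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return slug.strip("-")


def slug_with_hash(title: str, salt: str, hash_len: int = 12) -> str:
    """Append a short hash so that two posts with the same title get different Ids.

    `salt` is any value that differs between posts, such as an author name
    plus a creation timestamp (a hypothetical choice for this sketch).
    """
    digest = hashlib.sha256((salt + title).encode("utf-8")).hexdigest()
    return f"{slugify(title)}-{digest[:hash_len]}"


title = "UUID or GUID as Primary Keys? Be Careful!"
print(slugify(title))  # prints "uuid-or-guid-as-primary-keys-be-careful"
print(slug_with_hash(title, salt="author-42|2017-06-01T10:00:00Z"))
```

The hash suffix lowers the chance of a collision but does not remove it, so the uniqueness check in the database is still needed.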
Externally Owned Identity
If you're integrating with a third party, you could choose to reuse the Identity that they assign. The owner doesn't necessarily need to be external to your company, just external to the service.
Pros
- Easy to do, as it requires only an assignment.
Cons
- You need to ensure that the External Identity is stable. If it's not, then any change to it will impact your system too. For example, there are systems that regenerate their Ids during disaster recovery. This means that if you restore both the external system and your own, they will be out of sync.
Conclusion
So, which one should you use? It depends on the context, of course. I find myself using GUIDs most of the time. If it needs to be human readable, then I use short random strings.
What approach do you use most often and why? Have you used other Identity generation strategies that are not on this list?
Published at DZone with permission of Victor Chircu, DZone MVB. See the original article here.