Designing Human-Targeted Random IDs
Very usable human-targeted random IDs are short, only contain digits and ASCII letters, and are designed to prevent and detect typos.
NOTE: We don't deal here with technical IDs used as primary keys in relational databases. See my previous article here if you seek a great way to generate them.
Context
During one of my recent projects, I was asked to design a scheme of IDs highly usable by humans. The business requirement was mainly to create pseudo-random values that can't be inferred or guessed, to be used as secret tokens printed on official documents for future checks.
Later on, we had a similar requirement with lower security concerns: generating human-readable file numbers that can be printed on associated documents, read aloud over the phone, or typed when doing searches.
Another well-known example (in France at least) is the ID (aka "SNCF number") that the French railway company attaches to each train trip so that travelers can easily open the trip details from a smartphone without being fully authenticated.
Main Criteria
After having compared existing solutions and analyzed the business stakeholder's requirements, these criteria emerged:
- These IDs have to be short so that a human can easily type them, read them, or say them over the phone (no more than six to ten characters).
- They have to include mechanisms that prevent and detect typos.
- They don't have to be unique (and can't be, because their small size limits the number of possible values). However, the system has to prevent collisions either by coupling these IDs with some other values (like a person's last name) or by retrying with a new random value when the generated one already exists (the solution we use). Keep in mind that closed items may share the same ID: when searching by ID, for instance, make sure to take the status into account.
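The retry-on-collision approach mentioned in the last point can be sketched as follows. This is a minimal illustration, not the scheme used in the project: the names `allocate_id`, `random_candidate`, and `active_ids`, the uppercase-only alphabet, and the in-memory set are all assumptions. In a real system the check would more likely rely on a database lookup or unique constraint restricted to open items.

```python
import secrets
from typing import Callable

# Hypothetical alphabet for the sketch: uppercase letters without 'O'.
ALPHABET = "ABCDEFGHIJKLMNPQRSTUVWXYZ"


def random_candidate(length: int = 6) -> str:
    """Draw one candidate ID from the OS-backed secure random source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def allocate_id(
    active_ids: set[str],
    generate: Callable[[], str] = random_candidate,
    max_attempts: int = 10,
) -> str:
    """Return an ID not used by any active item, retrying on collision."""
    for _ in range(max_attempts):
        candidate = generate()
        if candidate not in active_ids:
            active_ids.add(candidate)  # reserve the value
            return candidate
    # Repeated collisions suggest the ID space is too small for the volume.
    raise RuntimeError("no free ID found; consider using longer IDs")
```

Bounding the number of attempts keeps a saturated ID space from turning into an endless loop and makes the problem visible instead.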
How To Make These Values Truly Usable?
- Keep the IDs short by using a larger alphabet than base-10 (decimal) digits: add lowercase and uppercase letters. Avoid other characters (punctuation marks, diacritics, ...) that are more difficult to read. Hence, in theory, we can generate numbers using up to 10 digits + 26 lowercase ASCII letters + 26 uppercase ASCII letters = base-62 numbers.
- Ease typing and reading as much as possible: the ID should consist of no more than four or five characters that are easily memorized as a whole, like aGty3. If it is longer, split it into groups using hyphens (and not underscores, which can be difficult to read when the ID is displayed as a hyperlink).
- Make sure that the whole value can be easily pasted using a single command, even when the form splits it across clearly separated text fields.
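As a rough illustration of the points above, the sketch below converts a number to a fixed-length base-62 string, groups it with hyphens for display, and normalizes what a user typed or pasted. The function names (`encode`, `group`, `ungroup`) and the ordering of the alphabet (digits, then lowercase, then uppercase) are assumptions made for the example.

```python
import string

# 10 digits + 26 lowercase + 26 uppercase ASCII letters = 62 symbols.
BASE62 = string.digits + string.ascii_lowercase + string.ascii_uppercase


def encode(number: int, alphabet: str = BASE62, length: int = 8) -> str:
    """Encode a non-negative integer as a fixed-length string in the given alphabet."""
    base = len(alphabet)
    if number < 0 or number >= base ** length:
        raise ValueError("number does not fit in the requested length")
    chars = []
    for _ in range(length):
        number, digit = divmod(number, base)
        chars.append(alphabet[digit])
    return "".join(reversed(chars))  # most significant symbol first


def group(raw_id: str, size: int = 4) -> str:
    """Split an ID into hyphen-separated groups for display."""
    return "-".join(raw_id[i:i + size] for i in range(0, len(raw_id), size))


def ungroup(typed: str) -> str:
    """Normalize user input: drop hyphens and any whitespace."""
    return "".join(typed.split()).replace("-", "")
```

For instance, `group("aTy25fTk")` gives `aTy2-5fTk`, and `ungroup(" aTy2-5fTk ")` gives back `aTy25fTk`, so a value pasted in one go can be accepted whatever separators it carries.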
How To Prevent And Detect Typos?
- Exclude confusing characters. Keep in mind that the similarity also depends on the font used: an 'l' can be easily distinguished from a '1' in a plain old monospaced font, but less so in a sans-serif one. We advise excluding the most problematic cases: 'O' and '0' (zero), 'Z' and '2', and 'l' and '1'. By dropping these six characters, we now deal with base-56 numbers.
- Reserve some bits as a CRC or checksum in order to detect most typos early on the frontend. Such systems have been used by banks for decades, on IBANs for instance (using the MOD97 algorithm). Users will thank you for notifying them early, and this client-side check avoids useless server-side queries and ugly error logs on the backend.
NOTE: Some light CRC solutions detect most, but not all, of the possible typos.
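To make the two points above concrete, here is a minimal sketch of a base-56 alphabet with one trailing check character. The weighted-sum scheme is an assumption chosen for illustration (it is neither MOD97 nor necessarily what the project used), and the names `check_char`, `add_check`, and `is_valid` are hypothetical. Because every weight is coprime with 56, any single mistyped character is caught; some swaps of two adjacent characters are not, which matches the note above.

```python
import string

# Drop the six look-alike characters: 62 - 6 = 56 symbols remain.
CONFUSING = set("O0Z2l1")
ALPHABET = "".join(
    c for c in string.digits + string.ascii_letters if c not in CONFUSING
)

# Odd weights that are not multiples of 7 are coprime with 56 (= 2^3 * 7).
# This list supports payloads of up to 42 characters.
WEIGHTS = [w for w in range(3, 100, 2) if w % 7 != 0]


def check_char(payload: str) -> str:
    """Compute the check character as a weighted sum modulo 56."""
    total = sum(WEIGHTS[i] * ALPHABET.index(c) for i, c in enumerate(payload))
    return ALPHABET[total % len(ALPHABET)]


def add_check(payload: str) -> str:
    """Append the check character to a payload."""
    return payload + check_char(payload)


def is_valid(candidate: str) -> bool:
    """Return True if the ID uses only allowed characters and its check matches."""
    if len(candidate) < 2 or any(c not in ALPHABET for c in candidate):
        return False
    return check_char(candidate[:-1]) == candidate[-1]
```

The same validation logic can be ported to the frontend so that a rejected character or a failed check is reported before any request is sent.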
What About The Security?
- If these human-readable IDs are used in serious matters dealing with money, security, or official documents, make sure to use a cryptographically secure pseudorandom number generator (CSPRNG) to generate the numbers that you will then convert to your base-56 number. For instance, on a Linux server, read from the kernel's random source (/dev/urandom or /dev/random, which behave the same on recent kernels once the entropy pool is initialized) rather than from a general-purpose generator such as a language's default random function. This makes the values unpredictable and avoids the collisions that a poorly seeded generator can produce (the fact of generating the same value twice in a short amount of time).
- The ID length should be proportional to the required difficulty to guess it.
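One possible way to follow this advice in Python is the standard `secrets` module, which draws from the operating system's secure random source. The sketch below generates a random number and converts it to the reduced alphabet; the function name `random_id` and the default length are assumptions for the example, and the length is the parameter to raise when IDs must be harder to guess.

```python
import secrets
import string

# Base-56 alphabet: digits and ASCII letters minus the look-alike characters.
ALPHABET = "".join(
    c for c in string.digits + string.ascii_letters if c not in "O0Z2l1"
)


def random_id(length: int = 8) -> str:
    """Generate an unpredictable ID of the given length."""
    base = len(ALPHABET)
    # Uniform random integer in [0, base ** length), from the OS CSPRNG.
    number = secrets.randbelow(base ** length)
    chars = []
    for _ in range(length):
        number, digit = divmod(number, base)
        chars.append(ALPHABET[digit])
    return "".join(reversed(chars))
```

Each extra character multiplies the number of possible values by 56, so moving from 8 to 12 characters makes a blind guess roughly ten million times less likely to succeed.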
Some Examples Please
Imagine you only want to avoid the '0'/'O' and '1'/'l' confusions and you want to generate IDs with a collision risk as low as 1 in 2.6 × 10¹⁷. You can generate values (using a CSPRNG) like:
aTy2-5fTk-rp9z
or
bUD5-64kP-hlA4
For less critical use cases, fewer characters may be enough:
aTy2-5fTk
or
64kP-hlA4
For short-lived and low-risk IDs, see what SNCF does for travel files (only six capital letters):
XSDTGE
Conclusion
Generating readable random IDs for humans is easy to achieve, but a number of requirements must be taken into account. The scheme has to vary according to the targeted usage, but keep in mind that changing an existing scheme is cumbersome and can require maintaining several ID schemes for a long time. I hope that this article helps you think about the not-so-obvious criteria so that you can design your IDs right on the first attempt. I would be glad to get feedback if I have forgotten important or obvious points.
Published at DZone with permission of Bertrand Florat. See the original article here.