Please Pick a Number

By Krzysztof Atlasik · Mar. 03, 23 · Analysis

On Randomness in Data

Picking a random number might seem like a no-brainer for us humans: we just close our eyes and say the first number that comes to mind. But is it that easy for computers? By design, they are predictable: given the same program and the same input data, a computer should always produce identical results.

Yet randomness is crucial in IT. For instance, random data is used to generate the keys behind certificates and to produce access tokens. If attackers could predict the generated values, the application’s security would be compromised.

So how is random data actually generated?

Pseudorandom Numbers

You might have heard of pseudorandom number generators (PRNGs). A PRNG is a deterministic algorithm that generates a sequence of numbers that look random.

A PRNG is initialized with a starting value called the seed, and a PRNG started with a given seed will always yield the same sequence of numbers. This is sometimes a very handy property, for instance, during unit testing: if input data generated by a PRNG causes a test to fail, we can reproduce the failure deterministically by passing the same seed.
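A minimal Python sketch of that reproducibility, assuming a hypothetical test helper `generate_test_inputs` (not from the article): a `random.Random` instance created with an explicit seed returns the same values on every run, so a failing input can be regenerated from the seed alone. Note that Python’s `random` module is a Mersenne Twister and is fine for tests but not for security.

```python
import random
from typing import List

def generate_test_inputs(seed: int, count: int = 5) -> List[int]:
    # Hypothetical helper: a dedicated, explicitly seeded generator.
    # Same seed -> same sequence, on every run.
    rng = random.Random(seed)
    return [rng.randint(0, 1000) for _ in range(count)]

first_run = generate_test_inputs(seed=42)
second_run = generate_test_inputs(seed=42)
print(first_run == second_run)
```

Logging the seed alongside a failing test is usually all that is needed to replay it later.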

Numbers produced by a PRNG need appropriate statistical properties, but in practice this is not always the case. Typical shortcomings of weak algorithms include unwanted correlation between the generated numbers or an uneven distribution of them.
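To make “unwanted correlation” concrete, here is a classic linear congruential generator (LCG), which the article itself does not discuss; the constants are the ones commonly quoted from the sample `rand()` implementation in the C standard. With an odd multiplier, an odd increment, and a power-of-two modulus, the lowest bit of successive outputs simply alternates.

```python
def lcg(seed: int, a: int = 1103515245, c: int = 12345, m: int = 2**32):
    """Linear congruential generator: x_{n+1} = (a * x_n + c) mod m."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x

gen = lcg(seed=2023)
values = [next(gen) for _ in range(10)]
# a and c are odd, m is even, so x_{n+1} mod 2 == (x_n + 1) mod 2.
low_bits = [v & 1 for v in values]
print(low_bits)  # [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
```

The full 32-bit values look scattered, yet one bit is perfectly predictable, which is why such generators should not be used when individual bits matter.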

For a PRNG to be suitable for cryptographic purposes, it needs to pass stringent statistical tests. The requirement for a cryptographically secure pseudorandom generator is that an attacker has only a negligible advantage in distinguishing the generator’s output from a truly random sequence.
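The “negligible advantage” requirement is usually formalized as follows (a standard textbook formulation, added here for illustration). Let G stretch a random k-bit seed s into ℓ(k) > k output bits, and let D be any efficient (polynomial-time) distinguisher that outputs 1 when it believes its input came from the generator:

```latex
\mathrm{Adv}_{G}(D) \;=\; \Big|\, \Pr_{s \leftarrow \{0,1\}^{k}}\big[\, D(G(s)) = 1 \,\big] \;-\; \Pr_{r \leftarrow \{0,1\}^{\ell(k)}}\big[\, D(r) = 1 \,\big] \,\Big|
```

G is considered cryptographically secure if, for every efficient D, this advantage is negligible in k, that is, smaller than 1/p(k) for every polynomial p once k is large enough.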

Still, values produced by such a generator are only safe if the adversary doesn’t know the seed. Predictable seeds can cause serious security problems. One example of such a vulnerability comes from the early days of the Internet. Back in 1994, the SSL encryption in the Netscape Navigator browser relied on numbers produced by a PRNG. The implementation seeded the algorithm from three sources: the process ID, the parent process ID, and the current time. These values turned out to be easily guessable. By figuring out the seed, an attacker could reproduce the PRNG’s output and decrypt the traffic.
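The following Python sketch models this kind of attack in simplified form. The `handshake` and `recover_key` functions are hypothetical: the victim seeds a non-cryptographic PRNG with the current time only (a simplification of the three sources described above), sends one value in the clear, and keeps the next one as a secret key. An attacker who sees the public nonce and knows roughly when the handshake happened just tries every candidate seed in a time window.

```python
import random
import time

def handshake(seed):
    # Hypothetical protocol using a NON-cryptographic PRNG (Mersenne Twister).
    rng = random.Random(seed)
    nonce = rng.getrandbits(64)   # sent in the clear
    key = rng.getrandbits(128)    # meant to stay secret
    return nonce, key

def recover_key(observed_nonce, approx_time, window=600):
    # Try every seed within +/- window seconds of the guessed time.
    for candidate in range(approx_time - window, approx_time + window + 1):
        nonce, key = handshake(candidate)
        if nonce == observed_nonce:
            return key
    return None

# Victim: seeds with the current time in whole seconds.
public_nonce, session_key = handshake(int(time.time()))

# Attacker: knows only the nonce and roughly when the connection happened.
stolen_key = recover_key(public_nonce, approx_time=int(time.time()))
print(stolen_key == session_key)
```

A ten-minute window is only about 1,200 candidates; mixing in process IDs enlarges the search space, but not nearly enough.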

So it seems that to safely initialize a random generator, we first need a random number. How can we get one?

Cryptographically-Safe Random Generators

Interestingly, human activity is a very efficient source of randomness. Mouse movements and keystrokes happen at irregular intervals, so their measured timings can serve as a source of random values. The Linux kernel collects the timing of such events (along with other sources, such as interrupts), turns the measurements into bytes, and mixes them into the so-called entropy pool. As a rule of thumb, the more entropy the pool has gathered, the more robust the random generator becomes.
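A toy Python model of the idea, for illustration only: it is not how the Linux kernel implements its pool, and it must not be used to generate real secrets. Each call to the hypothetical `add_event` method hashes a high-resolution timestamp into a running SHA-256 state, so the irregular timing of events is what contributes unpredictability.

```python
import hashlib
import time

class ToyEntropyPool:
    """Illustration only: mixes event timestamps into a SHA-256 state."""

    def __init__(self):
        self._state = hashlib.sha256()
        self.event_count = 0

    def add_event(self, label):
        # The exact nanosecond timestamp is the unpredictable part;
        # the label (e.g. "keypress") is predictable and adds no entropy.
        stamp = time.perf_counter_ns()
        self._state.update(label.encode("utf-8"))
        self._state.update(stamp.to_bytes(16, "little", signed=True))
        self.event_count += 1

    def extract(self, n=32):
        # Hash a copy so extracting does not disturb the running state.
        return self._state.copy().digest()[:n]

pool = ToyEntropyPool()
for key in "hello":
    pool.add_event("keypress:" + key)
print(pool.event_count, pool.extract(16).hex())
```

Events simulated in a tight loop like this carry far less entropy than real human input; the event counter is only a crude stand-in for the kernel’s entropy estimate.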

The random bytes can later be read from the special file /dev/random. Traditionally, if the pool hadn’t gathered sufficient entropy, a read from /dev/random would block. This could happen right after a restart, before the system had been able to fill the pool, or when values were read faster than they were produced and the pool was depleted. (Since Linux 5.6, /dev/random blocks only until the pool has been initialized after boot.)

Linux also offers a non-blocking alternative called /dev/urandom (unlimited random), which reuses the internal pool to produce more pseudorandom bits. This means that a read from /dev/urandom will not block, but the output may contain less entropy. In theory, it is less safe, but at the time of writing, no practical attack on it was known. In between sits the getrandom() system call (available since Linux 3.17): by default, it blocks only until there is enough entropy to initialize the seed value and then never blocks again, because the following bytes are generated by a PRNG algorithm. The /dev/arandom device sometimes mentioned in this context comes from BSD systems and is not available on Linux.
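In Python, the portable way to read from the operating system’s generator is `os.urandom()`. The sketch below also reads `/dev/urandom` directly and, where available (Linux, Python 3.6+), calls `os.getrandom()`, which wraps the `getrandom()` system call; the helper name `read_dev_urandom` is our own.

```python
import os

def read_dev_urandom(n: int) -> bytes:
    # Reads straight from the device file (Unix-like systems only).
    with open("/dev/urandom", "rb") as f:
        return f.read(n)

portable = os.urandom(16)  # portable wrapper around the OS generator
print(portable.hex())

if os.path.exists("/dev/urandom"):
    print(read_dev_urandom(16).hex())

if hasattr(os, "getrandom"):  # Linux only
    # flags=0 (the default): blocks only until the kernel pool is initialized.
    print(os.getrandom(16).hex())
```

In application code, `os.urandom()` is usually the right call: it picks the best available source on each platform.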

Random values can also come from hardware sources, so-called hardware random number generators (HRNGs), or true random number generators (TRNGs). It turns out that observing the physical events happening all around us is enough to get random samples of data. Whether it is electronic noise in a circuit or the jitter of a clock chip, if it’s hard to predict, it’s a viable source of randomness. The HRNG’s job is simply to measure such natural phenomena and fill its internal entropy pool. Intel processors, starting with Ivy Bridge, provide a built-in HRNG that samples thermal noise in the processor. Its values can be retrieved with the RDRAND instruction and, on processors since Broadwell, also with RDSEED.
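Whether a particular x86 machine exposes these instructions can be checked on Linux by looking for the `rdrand` and `rdseed` flags in `/proc/cpuinfo`. A small Python sketch with hypothetical helper names; on other operating systems or non-x86 CPUs it simply reports both as unavailable.

```python
from pathlib import Path

def parse_cpu_flags(cpuinfo_text):
    # On x86 Linux, the first "flags" line lists the CPU feature flags.
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()

def hardware_rng_support(cpuinfo_path="/proc/cpuinfo"):
    path = Path(cpuinfo_path)
    flags = parse_cpu_flags(path.read_text()) if path.exists() else set()
    return {"rdrand": "rdrand" in flags, "rdseed": "rdseed" in flags}

print(hardware_rng_support())
```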

Another remarkable example of using measurements of physical events to gather entropy comes from Cloudflare, which uses a set of lava lamps standing on shelves. A camera photographs the wall at scheduled intervals, and the digitized image is used as a source of random bytes. Since the movement of the fluid inside the lamps is impossible to predict, they are a very efficient source of randomness. Cloudflare’s employees call it a “wall of entropy.”

HRNGs are very useful in environments where there is little or no noise from user interaction, such as servers. They tend to be slower than PRNGs, so a value from the hardware generator is often used only to seed a pseudorandom algorithm, especially when a high volume of random bytes is required. To increase safety, the seed can be rotated periodically.
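The pattern of seeding a fast generator from a slow source and rotating the seed can be sketched as below. This is a toy design assumed for illustration (SHA-256 over a key and a counter), with `os.urandom` standing in for the hardware source; production code should rely on a vetted construction, such as the DRBGs in NIST SP 800-90A, or simply on the operating system’s generator.

```python
import hashlib
import os
from typing import Callable

class ReseedingGenerator:
    """Toy PRNG: SHA-256(key || counter), reseeded from an entropy source
    after a fixed number of outputs. Illustrative, not a vetted DRBG."""

    def __init__(self, entropy_source: Callable[[int], bytes] = os.urandom,
                 reseed_interval: int = 1000) -> None:
        self._entropy_source = entropy_source
        self._reseed_interval = reseed_interval
        self.reseed_count = 0
        self._reseed()

    def _reseed(self) -> None:
        # Pull a fresh 32-byte seed from the (slow) entropy source.
        self._key = self._entropy_source(32)
        self._counter = 0
        self.reseed_count += 1

    def next_block(self) -> bytes:
        # Rotate the seed once the interval has been used up.
        if self._counter >= self._reseed_interval:
            self._reseed()
        data = self._key + self._counter.to_bytes(8, "big")
        self._counter += 1
        return hashlib.sha256(data).digest()

gen = ReseedingGenerator(reseed_interval=3)
blocks = [gen.next_block() for _ in range(7)]
print(gen.reseed_count)  # 3: the initial seed plus two rotations
```

Injecting `entropy_source` as a parameter keeps the hardware dependency at the edge and makes the generator deterministic under test.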

Why Should We Care?

It might seem that ensuring the proper quality of random data doesn’t concern us unless we deal with cryptography directly, but that is not necessarily the case.

Using a secure random generator matters whenever you generate a value that must not be guessable, such as a password-reset token. Weak PRNGs or predictable seeds can introduce security vulnerabilities into your web app. For such applications, remember to use java.security.SecureRandom in Java, os.urandom() (or the secrets module) in Python, or similar functions in other languages.
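For the password-reset example, Python’s `secrets` module, which draws from the operating system’s generator, makes the safe choice the easy one. A minimal sketch with hypothetical function names; storing the token and expiring it are left out.

```python
import secrets

def new_password_reset_token() -> str:
    # 32 random bytes from the OS generator, URL-safe Base64 (43 characters).
    return secrets.token_urlsafe(32)

def token_matches(presented: str, stored: str) -> bool:
    # Constant-time comparison avoids leaking how many characters matched.
    return secrets.compare_digest(presented, stored)

token = new_password_reset_token()
print(token_matches(token, token))
```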

Take care!

Published at DZone with permission of Krzysztof Atlasik. See the original article here.

Opinions expressed by DZone contributors are their own.
