
Collision Based Hashing Algorithm Disclosure


In February 2017, a team of researchers from Google and CWI Amsterdam published the first practical SHA-1 collision. Even though NIST marked this hashing algorithm as deprecated in 2011, it is still widely used.

What Are Hash Collisions?

A hash collision happens when two different cleartext values produce the same hash value. Collisions can lead to a wide range of problems, but we won't cover them within this article.
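
To make the definition concrete, here is a minimal sketch in Python. Finding a collision for full SHA-1 takes enormous computing power, so this example assumes a toy hash (`tiny_hash`, a hypothetical helper that keeps only the first 16 bits of a SHA-1 digest), for which a collision turns up after a few hundred attempts. It only illustrates what "two different inputs, same hash" means. It is not how the real SHA-1 collision was found.

```python
import hashlib
from itertools import count

def tiny_hash(data: bytes) -> str:
    """Toy hash: only the first 2 bytes (16 bits) of the SHA-1 digest, as hex."""
    return hashlib.sha1(data).hexdigest()[:4]

def find_collision():
    """Hash 'msg-0', 'msg-1', ... until two different inputs share a tiny_hash value."""
    seen = {}  # tiny_hash value -> first input that produced it
    for i in count():
        candidate = f"msg-{i}".encode()
        digest = tiny_hash(candidate)
        if digest in seen:
            return seen[digest], candidate, digest
        seen[digest] = candidate

first, second, digest = find_collision()
# Two different cleartext values, one shared (toy) hash value.
print(first, second, digest)
```

With only 65,536 possible outputs, the loop is guaranteed to stop within 65,537 inputs. A full 160-bit SHA-1 digest makes the same brute-force search infeasible, which is why the 2017 collision needed a dedicated cryptanalytic attack.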

Instead, in this blog post, we will take a look at another side effect of collisions: a method that lets you detect whether a website uses weak hash functions, without having access to its source code.

To make it easy to remember, we refer to this method as Collision Based Hashing Algorithm Disclosure.

Example of a Hash Collision

The collision the Google engineers identified allows anybody to create two PDF files, with different content but with the same hash. Let's take a look at both of the cleartext values Google used:



Both strings are hex encoded. Once decoded and hashed, they produce the same SHA-1 sum:

f92d74e3874587aaf443d1db961d4e26dde13e9c

Introducing Collision Based Hashing Algorithm Disclosure

Hashing Algorithms in Web Applications

When you register with an online service, most websites hash your password and store the hash in the database. This is good practice, since it lets the web application store your password in a form that a potential attacker cannot read as plain text, should they gain access to the database. However, to be effective, a strong hashing algorithm has to be used. This means that algorithms like SHA-1 and MD5 are not suitable for that kind of application. Nonetheless, developers still often use them to hash passwords.

When you try to log in to the website, the password hash stored in the database is compared to the hash generated on the fly from the password you submit in the login form. Therefore, if the target web application uses the SHA-1 hashing algorithm and we supply our collision strings, the hash will be the same. This also means that we can log in using two different strings as passwords.

By using the same technique in a black box fashion, we can determine whether or not a web application uses a vulnerable hashing algorithm, as explained below.

How Does the Collision Based Hashing Algorithm Disclosure Work?

The Theory Behind the Attack

In theory, it is very simple. Create an account on the web application that you would like to test. As a password, use a string that produces the same SHA-1 hash as another, different string. Once the account is registered, try to log in again. This time, supply the other string that produces the same hash as the password. If you manage to log in, it means that the target web application uses the SHA-1 algorithm.

Example of Collision Based Hashing Algorithm Disclosure

Let's assume that when you register a new user on a web application, it uses the cleartext1 string that you supplied as password and hashes it. As seen below, the hashed password would result in the hash abcd (simplified), which is then stored in the database:

hash(cleartext1) == 'abcd'

Note: To keep things simple, we will ignore salts here. A salt is a random string that is concatenated with the password to make it more secure against certain types of attacks.

The web application stores the hash generated from that process in the database. When you try to log back in to the web application, the same hashing algorithm is applied to the password you supply in the login form. This hash is then compared to the one in the database, and if they match, you are logged in.

Now assume that you log in with cleartext2 as the password, and that the web application produces the same hash when it hashes it with the SHA-1 algorithm:

hash(cleartext2) == 'abcd'

When the web application compares the hashed password with the hash that was stored in the database, they match and you are logged in.

hash(password) == dbhash
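
The register and login flow described above can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the in-memory `database` dictionary, the `register` and `login` helpers, and above all `weak_hash`, a deliberately weak toy hash (the sum of all byte values) that stands in for a broken algorithm so that a colliding pair is easy to see. Like the simplified example, it ignores salts.

```python
def weak_hash(password: bytes) -> str:
    """Deliberately weak toy hash: sum of all byte values modulo 65536, as hex."""
    return format(sum(password) % 65536, "04x")

database = {}  # username -> stored password hash

def register(username: str, password: bytes) -> None:
    """Store only the hash of the password, never the cleartext."""
    database[username] = weak_hash(password)

def login(username: str, password: bytes) -> bool:
    """Hash the submitted password and compare it to the stored hash."""
    dbhash = database.get(username)
    return dbhash is not None and weak_hash(password) == dbhash

# The account is registered with one cleartext ...
register("alice", b"cleartext-12")
# ... and a different cleartext with the same (toy) hash logs in as well.
print(login("alice", b"cleartext-21"))
```

The two passwords contain the same bytes in a different order, so their byte sums are identical and the login check cannot tell them apart. The SHA-1 collision strings play the same role against an application that hashes passwords with unsalted SHA-1.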

However, this method has a few limitations. It won't work if:

  • Strict server-side password length restrictions are enforced in the registration and login forms, for example a maximum password length of 20 characters.
  • A whitelist of allowed characters is enforced.
  • A salt is prepended to the password. An appended salt does not break the test, because SHA-1 processes its input block by block, so the two colliding strings stay in collision when identical data follows them.
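
Why the position of the salt matters can be shown with a toy model. SHA-1 and MD5 follow the Merkle-Damgård construction: the input is processed block by block, and each block updates an internal state. The sketch below is not SHA-1. It assumes a made-up miniature hash (`toy_md_hash`, with a 4-byte block and a 16-bit state) built in the same style, so that a colliding pair of one-block messages can be found by brute force.

```python
import hashlib
from itertools import count

BLOCK = 4          # toy block size in bytes (SHA-1 uses 64)
IV = b"\x00\x00"   # toy 16-bit initial state (SHA-1 uses 160 bits)

def compress(state: bytes, block: bytes) -> bytes:
    """Toy compression function: 16 bits of SHA-1 over state + block."""
    return hashlib.sha1(state + block).digest()[:2]

def toy_md_hash(message: bytes) -> str:
    """Merkle-Damgard style hash: pad, then feed the message block by block."""
    # Padding: a 0x80 marker, zero bytes up to the block boundary, then one length block.
    padded = message + b"\x80"
    padded += b"\x00" * (-len(padded) % BLOCK)
    padded += len(message).to_bytes(BLOCK, "big")
    state = IV
    for i in range(0, len(padded), BLOCK):
        state = compress(state, padded[i:i + BLOCK])
    return state.hex()

def find_block_collision():
    """Find two different one-block messages that leave the same internal state."""
    seen = {}  # internal state -> first block that produced it
    for i in count():
        block = i.to_bytes(BLOCK, "big")
        state = compress(IV, block)
        if state in seen:
            return seen[state], block
        seen[state] = block

m1, m2 = find_block_collision()
salt = b"salt"
print(toy_md_hash(m1) == toy_md_hash(m2))                # the colliding pair itself
print(toy_md_hash(m1 + salt) == toy_md_hash(m2 + salt))  # salt appended
print(toy_md_hash(salt + m1) == toy_md_hash(salt + m2))  # salt prepended
```

After the first block, both messages leave the hash in the same internal state, so any identical data that follows (an appended salt, the padding) produces the same final hash. A prepended salt changes the state before the colliding blocks are processed, so in this toy model the collision survives only by chance (about 1 in 65,536).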

How to Check if a Web Application Uses an SHA-1 Hashing Algorithm

Below is a step-by-step explanation of how you can check if a web application uses the SHA-1 hashing algorithm.

  1. Set up an interception proxy, such as the one in Netsparker Desktop, and configure the web browser to proxy the requests through it.
  2. Register an account on the web application and use a recognizable password such as !!PASS!! so it is easy to find when you intercept the HTTP request.
  3. Edit the registration request in the interception proxy by replacing all occurrences of the !!PASS!! string with the first collision string (converted to URL encoding) from the above example.

NOTE: To URL encode the collision strings you have to place a % character in front of every encoded byte. You can use the below PHP code to do it:

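// $collisionstring holds the hex-encoded collision string; every two hex characters form one byte.
echo implode('', array_map(function ($byte) { return '%' . $byte; }, str_split($collisionstring, 2)));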
  4. Once you send the request to the web application, it will generate a hash and store it in the database. If the web application uses the SHA-1 algorithm, the hash will be f92d74e3874587aaf443d1db961d4e26dde13e9c.
  5. Now try to log in to the web application, using the !!PASS!! string as the password again.
  6. Intercept the login HTTP request and replace all occurrences of !!PASS!! with the URL encoded version of the second string.
  7. The web application will hash your supplied password and compare it to the stored value in the database. Once again, the hash should be f92d74e3874587aaf443d1db961d4e26dde13e9c.
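
The URL-encoding step from the note above can also be done in Python. This is a minimal sketch: `percent_encode_hex` is a hypothetical helper name, and the input used here is a short placeholder (the hex encoding of the text `%PDF-1.3`), not one of the collision strings.

```python
from urllib.parse import unquote_to_bytes

def percent_encode_hex(hex_string: str) -> str:
    """Turn a hex string such as '255044' into '%25%50%44' (one %XX per byte)."""
    hex_string = "".join(hex_string.split())  # drop spaces and line breaks
    bytes.fromhex(hex_string)  # raises ValueError if the input is not valid hex
    return "".join("%" + hex_string[i:i + 2] for i in range(0, len(hex_string), 2))

# Placeholder input: the hex encoding of the 8 bytes "%PDF-1.3".
encoded = percent_encode_hex("255044462d312e33")
print(encoded)  # %25%50%44%46%2d%31%2e%33

# Decoding the result gives back the original raw bytes.
print(unquote_to_bytes(encoded))  # b'%PDF-1.3'
```

Encoding every byte, including printable ones, keeps the value intact regardless of which characters the raw bytes happen to contain.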

If the web application uses the SHA-1 hashing algorithm, you will be logged in even though you supplied a different value.

If you cannot log in because the passwords do not match, the web application either uses a hashing algorithm other than SHA-1, or one of the limitations listed above (a length restriction, a character whitelist, or a prepended salt) prevented the test from working.
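
The decision logic of the test can be written down as a small function. This is a sketch under stated assumptions: `probe` and `make_app` are hypothetical names, the `register` and `login` callables stand in for the intercepted HTTP requests, and the demo uses a toy byte-sum hash with the pair `b"ab"` and `b"ba"` in place of the real collision strings. It also adds one sanity check: logging in with the first string itself must work before the result for the second string means anything.

```python
import hashlib
from typing import Callable

def probe(register: Callable[[str, bytes], bool],
          login: Callable[[str, bytes], bool],
          first: bytes, second: bytes,
          username: str = "probe-user") -> str:
    """Register with `first`, then try to log in with the colliding `second`."""
    if first == second:
        raise ValueError("the two probe strings must differ")
    if not register(username, first):
        return "inconclusive: registration rejected the first string"
    if not login(username, first):
        return "inconclusive: the first string itself does not log in"
    if login(username, second):
        return "collision accepted: the application hashes both strings to the same value"
    return "collision rejected: different algorithm, or a limitation applies"

def make_app(hash_function):
    """Hypothetical in-memory application that stores hash_function(password)."""
    store = {}
    def register(user: str, password: bytes) -> bool:
        store[user] = hash_function(password)
        return True
    def login(user: str, password: bytes) -> bool:
        return store.get(user) == hash_function(password)
    return register, login

def weak(password: bytes) -> str:
    """Toy stand-in for a broken hash: the sum of all byte values."""
    return format(sum(password) % 65536, "04x")

def strong(password: bytes) -> str:
    """Stand-in for an algorithm the probe strings do not collide under."""
    return hashlib.sha256(password).hexdigest()

print(probe(*make_app(weak), b"ab", b"ba"))
print(probe(*make_app(strong), b"ab", b"ba"))
```

Against a real target, the two callables would send the registration and login requests through the interception proxy, and `first` and `second` would be the two collision strings.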

Does This Collision Based Hashing Algorithm Disclosure Work for SHA-1 Algorithm Only?

This method also works with other hashing algorithms that have known collisions, for example MD5. The prerequisites don't differ much. However, the length restriction is less of a concern, as the known MD5 collisions are shorter: they are just 64 bytes long. This might still be too long for some server-side filtering, though.
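
Before using a pair of strings in a test, it is worth confirming under which algorithm they actually collide. The following sketch uses Python's standard `hashlib` module. `colliding_algorithms` is a hypothetical helper name, and the two inputs in the example are ordinary, non-colliding placeholders (`abc` and `abd` in hex).

```python
import hashlib

def colliding_algorithms(hex_a: str, hex_b: str,
                         algorithms=("md5", "sha1", "sha256")):
    """Return the algorithms under which two different hex-encoded inputs collide."""
    a, b = bytes.fromhex(hex_a), bytes.fromhex(hex_b)
    if a == b:
        raise ValueError("inputs are identical, so equal hashes prove nothing")
    return [name for name in algorithms
            if hashlib.new(name, a).digest() == hashlib.new(name, b).digest()]

# Placeholder inputs: "abc" and "abd" do not collide under any of these algorithms.
print(colliding_algorithms("616263", "616264"))  # []
```

For a genuine MD5 collision pair, the returned list would contain `"md5"` only, which tells you that a successful login with that pair points to MD5 rather than SHA-1.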

Here is a known MD5 collision which you can use for testing:



Both strings will result in the following hash:


Who Needs to Know about the Collision Based Hashing Algorithm Disclosure?

As a developer of a website, you already know which hashing algorithm you use and do not need this test to see whether your algorithm is secure. Just knowing which hashing algorithm is used also won't aid an attacker during an attack.

However, there are two scenarios where this is especially useful: during a black box penetration test, where it is not possible to get a look at the source code, and as an additional step to check the authenticity of a database dump.

If a leaked database uses unsalted SHA-1 hashes and this method confirms that indeed SHA-1 is the hashing algorithm used by the website, it can be a very small indicator that the dump might be credible.

The URL Encoded Strings

For easy copying, here are the URL encoded strings for the above check:


String 1


String 2



String 1


String 2



Published at DZone with permission of
