Generating MD5 and SHA1 checksums for a file

MD5 (128-bit) and SHA1 (160-bit) are cryptographic hash functions. They do not encrypt information; each one produces a fixed-size hash (digest) from the bytes it is given, and the hash cannot be reversed to recover the input.
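To make the digest sizes concrete, here is a minimal sketch that hashes a short string with both algorithms. The class and method names (DigestSketch, Md5Hex, Sha1Hex) are made up for illustration, and the sketch uses the MD5.Create() and SHA1.Create() factory methods rather than the provider classes used later in the article.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class DigestSketch
{
    // Hex-encodes a digest in lowercase, without separators.
    public static string ToHex(byte[] digest)
    {
        return BitConverter.ToString(digest).Replace("-", "").ToLowerInvariant();
    }

    // MD5 always yields 16 bytes (128 bits), i.e. 32 hex characters.
    public static string Md5Hex(string text)
    {
        using (var md5 = MD5.Create())
        {
            return ToHex(md5.ComputeHash(Encoding.UTF8.GetBytes(text)));
        }
    }

    // SHA1 always yields 20 bytes (160 bits), i.e. 40 hex characters.
    public static string Sha1Hex(string text)
    {
        using (var sha1 = SHA1.Create())
        {
            return ToHex(sha1.ComputeHash(Encoding.UTF8.GetBytes(text)));
        }
    }
}
```

For the input "abc", Md5Hex returns 900150983cd24fb0d6963f7d28e17f72 and Sha1Hex returns a9993e364706816aba3e25717850c26c9cd0d89d, regardless of how long the input is.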

MD5 is not a secure hashing algorithm, since it is vulnerable to collision attacks: it has been demonstrated that two different files can be crafted to have the same MD5 hash. It is nevertheless still widely used to check file integrity. SHA1 is the stronger of the two, although its design is partly based on MD5's, and a practical SHA1 collision was published in 2017. For detecting accidental corruption, such as a damaged download, both MD5 and SHA1 are fast and will most likely stay in use for a while longer. Where deliberate tampering is a concern, a SHA-2 function such as SHA-256 is the safer choice.

In .NET, the System.Security.Cryptography namespace offers access to classes that facilitate both MD5 and SHA1 hash generation. In this article, I am going to create a console application that will generate both hashes for a file that is passed as a command line parameter.

Start by creating a console application and adding using directives for the following namespaces:

•    System.Security.Cryptography
•    System.IO
•    System.Threading

System.IO is used for file access, since the bytes to hash are read from the passed file. System.Threading is used to work with ThreadPool, to which I will delegate the hashing to keep the main thread free. For a console application, this probably doesn't play a major role unless you want to do something else while the application is processing the data. As a good practice, though, I would recommend moving resource-consuming operations to separate threads.

I started by creating two string properties inside the class that will store the generated MD5 and SHA1 hashes:

static string MD5Hash { get; set; }
static string SHA1Hash { get; set; }

Since the file name is passed as a parameter, it is good practice to check that it points to an existing file. The application can be launched without any parameters or with an invalid path, so inside the Main method a simple if statement checks for both cases:

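// args.Length avoids the System.Linq dependency that args.Count() would need.
if ((args.Length != 0) && (File.Exists(args[0])))
{
    // The hashes are generated here (see the full Main method later on).
}
else
{
    Console.WriteLine("The file cannot be found.");
    Environment.Exit(0);
}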

 

Two separate methods generate the hashes: ComputeMD5Hash and ComputeSHA1Hash. Each of them is launched in a separate thread and opens the file on its own, so the file has to be opened in a way that still lets the other thread read it. I will start with ComputeMD5Hash:

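public static void ComputeMD5Hash(object filePath)
{
    // FileShare.Read lets the SHA1 worker read the same file at the same time.
    using (var stream = new FileStream((string)filePath, FileMode.Open, FileAccess.Read, FileShare.Read))
    {
        using (var md5gen = new MD5CryptoServiceProvider())
        {
            // Hash the stream contents; the result is exposed through the Hash property.
            md5gen.ComputeHash(stream);
            // Hex-encode the digest: drop the "-" separators and lowercase the result.
            Program.MD5Hash = BitConverter.ToString(md5gen.Hash).Replace("-", "").ToLower();
        }
    }
}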

You can see here that the file path is passed as an object, which can be a bit confusing. The method will be invoked via ThreadPool later on, and ThreadPool.QueueUserWorkItem takes a WaitCallback delegate whose single parameter is an object carrying the state (the data passed to the method). Therefore, the method accepts an object parameter and casts it to a string.
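As a small illustration of how state travels through ThreadPool, the sketch below queues a work item with a custom state object and casts it back inside the callback. The HashRequest class and the Worker and RunOnce methods are hypothetical names introduced only for this example; the waiting loop mirrors the polling approach used in this article.

```csharp
using System;
using System.Threading;

public static class StateSketch
{
    // Hypothetical state object: carries the input and receives the result.
    public sealed class HashRequest
    {
        public string Path;
        public volatile string Result;
    }

    // Matches the WaitCallback signature: void (object state).
    public static void Worker(object state)
    {
        var request = (HashRequest)state;   // cast the state back to its real type
        request.Result = "processed " + request.Path;
    }

    public static string RunOnce(string path)
    {
        var request = new HashRequest { Path = path };
        ThreadPool.QueueUserWorkItem(new WaitCallback(Worker), request);

        // Poll until the worker has stored its result.
        while (request.Result == null)
            Thread.Sleep(10);

        return request.Result;
    }
}
```

Passing an object that also holds the result is one way to avoid static properties; whether that is worth it for a small console tool is a matter of taste.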

When declaring a new instance of FileStream, I am using FileShare.Read. This means that while this FileStream has the file open, other FileStream instances (or other processes) can still open it for reading.

The hash generation is done by MD5CryptoServiceProvider, specifically by calling the ComputeHash method and passing it the FileStream (a byte array is accepted as well). The Hash property exposes the resulting hash as a byte array. To convert it to a string, BitConverter.ToString is used.

By default, the hash comes with separators (-) and in my code I am replacing them with empty strings. You can leave those, but generally separators are deleted from the final result.
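The effect of the separator handling is easy to see on a three-byte array. This is a minimal sketch; HexSketch and its two methods are names chosen for illustration.

```csharp
using System;

public static class HexSketch
{
    // BitConverter.ToString yields uppercase hex pairs joined by hyphens.
    public static string WithSeparators(byte[] bytes)
    {
        return BitConverter.ToString(bytes);
    }

    // The compact lowercase form usually seen in published checksums.
    public static string Compact(byte[] bytes)
    {
        return BitConverter.ToString(bytes).Replace("-", "").ToLowerInvariant();
    }
}
```

For the bytes 0x0F, 0xA0, 0xFF, WithSeparators returns "0F-A0-FF" and Compact returns "0fa0ff".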

An important thing to be mentioned

Initially, I thought about using File.ReadAllBytes() to get the byte contents of the specified file. However, this method has its limitations: it loads the entire file into memory and cannot work with files larger than 2 GB. If you pass the program an ISO image bigger than that, for example, an exception will be thrown. FileStream is the appropriate solution here.
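To show why a stream scales where a byte array does not, the following sketch hashes a stream by hand, one buffer at a time, using the TransformBlock and TransformFinalBlock methods of HashAlgorithm. It is only an illustration of the idea, not a description of how ComputeHash is implemented; ChunkedHashSketch, Md5OfStream, and the bufferSize parameter are assumptions made for this example.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class ChunkedHashSketch
{
    // Hashes a stream piece by piece; only bufferSize bytes are held at once.
    public static string Md5OfStream(Stream input, int bufferSize)
    {
        using (var md5 = MD5.Create())
        {
            var buffer = new byte[bufferSize];
            int read;

            // Feed each chunk into the running hash state.
            while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
            {
                md5.TransformBlock(buffer, 0, read, null, 0);
            }

            // Finish the computation with an empty final block.
            md5.TransformFinalBlock(buffer, 0, 0);

            return BitConverter.ToString(md5.Hash).Replace("-", "").ToLowerInvariant();
        }
    }
}
```

Calling Md5OfStream on a FileStream with, say, a 64 KB buffer keeps memory use constant no matter how large the file is.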

The method that generates the SHA1 hash is similar to the one above, but it uses SHA1CryptoServiceProvider to generate the hash.

public static void ComputeSHA1Hash(object filePath)
{
    using (var stream = new FileStream((string)filePath, FileMode.Open, FileAccess.Read))
    {
        using (var sha1gen = new SHA1CryptoServiceProvider())
        {
            // Hash the stream contents; the result is exposed through the Hash property.
            sha1gen.ComputeHash(stream);
            // Hex-encode the digest: drop the "-" separators and lowercase the result.
            Program.SHA1Hash = BitConverter.ToString(sha1gen.Hash).Replace("-", "").ToLower();
        }
    }
}

Notice that I am not specifying the FileShare type here. The FileStream constructor overload without a FileShare parameter defaults to FileShare.Read, so this stream also allows other readers, and the two methods can open the file in either order. That matters because the thread pool does not guarantee which work item starts first. Specifying FileShare.Read explicitly in both methods would make the intent clearer.
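A quick way to check how two read-only streams interact is to open both on the same file at once. The sketch below does that with the same two constructor calls used in the methods above; ShareSketch and CanOpenTwoReaders are hypothetical names for this example.

```csharp
using System.IO;

public static class ShareSketch
{
    // Returns true if two read-only streams can be open on the file at once.
    public static bool CanOpenTwoReaders(string path)
    {
        try
        {
            // Same constructor calls as in ComputeMD5Hash and ComputeSHA1Hash.
            using (var first = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read))
            using (var second = new FileStream(path, FileMode.Open, FileAccess.Read))
            {
                return first.CanRead && second.CanRead;
            }
        }
        catch (IOException)
        {
            // A sharing violation surfaces as an IOException.
            return false;
        }
    }
}
```

If either stream were opened with FileShare.None instead, the second open would be expected to fail with a sharing violation.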

The actual Main method looks like this:

static void Main(string[] args)
{
    // args.Length avoids the System.Linq dependency that args.Count() would need.
    if ((args.Length != 0) && (File.Exists(args[0])))
    {
        string path = args[0];

        // Queue both hash computations; the path is passed as the state object.
        ThreadPool.QueueUserWorkItem(new WaitCallback(ComputeMD5Hash), path);
        ThreadPool.QueueUserWorkItem(new WaitCallback(ComputeSHA1Hash), path);

        // Poll once per second until both workers have stored their results.
        while ((MD5Hash == null) || (SHA1Hash == null))
            Thread.Sleep(1000);

        Console.WriteLine("MD5 Hash: " + MD5Hash);
        Console.WriteLine("SHA1 Hash: " + SHA1Hash);

        Console.ReadKey();
    }
    else
    {
        Console.WriteLine("The file cannot be found.");
        Environment.Exit(0);
    }
}

Here, I am passing the first command line parameter (with index zero) as the path. Then, I am putting two work items in the ThreadPool queue – the MD5 generation method and after that the SHA1 generation method, passing the path as the state. Those will run simultaneously in separate threads, as long as there are available threads.

While the properties don’t have a value, the application does nothing. You can modify this part to make it do something useful (if you want to, of course), but in my case it will just sleep.
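If polling once per second feels wasteful, one alternative is to let the runtime signal completion. The sketch below swaps in Task.Run and Task.WaitAll from System.Threading.Tasks in place of ThreadPool.QueueUserWorkItem and the sleep loop; it is a different technique from the one used in this article, and TaskSketch, HashFile, and HashBoth are names invented for the example.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Threading.Tasks;

public static class TaskSketch
{
    // Hashes a file with whichever algorithm the factory creates.
    static string HashFile(string path, Func<HashAlgorithm> create)
    {
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read))
        using (var algorithm = create())
        {
            byte[] digest = algorithm.ComputeHash(stream);
            return BitConverter.ToString(digest).Replace("-", "").ToLowerInvariant();
        }
    }

    // Returns { md5, sha1 } once both background tasks have finished.
    public static string[] HashBoth(string path)
    {
        Task<string> md5 = Task.Run(() => HashFile(path, () => MD5.Create()));
        Task<string> sha1 = Task.Run(() => HashFile(path, () => SHA1.Create()));

        // Blocks until both tasks complete; no polling interval is involved.
        Task.WaitAll(md5, sha1);

        return new[] { md5.Result, sha1.Result };
    }
}
```

With this shape the results come back as return values, so the static MD5Hash and SHA1Hash properties would no longer be needed, and an exception in either worker is rethrown by WaitAll instead of being lost.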

As simple as that. Now you can easily check the integrity of the files you download over the Internet or get from another source, in case you know the original checksum. Launch the application from the command line and pass it any file on your local disk. Once processed (depending on the size it might take a while), you will see the hashes on the screen.
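Comparing the result with a published checksum by eye is error-prone, so a small helper can do it instead. This is a sketch under the assumption that the published value is a hex string that may differ in case or contain stray spaces or hyphens; VerifySketch and Matches are hypothetical names.

```csharp
using System;

public static class VerifySketch
{
    // Compares a computed hex digest with a published one, ignoring case,
    // surrounding whitespace, and any "-" or space separators.
    public static bool Matches(string computed, string published)
    {
        if (computed == null || published == null)
            return false;

        string normalized = published.Trim().Replace("-", "").Replace(" ", "");
        return string.Equals(computed, normalized, StringComparison.OrdinalIgnoreCase);
    }
}
```

For example, Matches(MD5Hash, "90-01-50-98-3C-D2-4F-B0-D6-96-3F-7D-28-E1-7F-72") returns true when MD5Hash holds 900150983cd24fb0d6963f7d28e17f72.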
