Over a million developers have joined DZone.

Algorithm of the Week: Damn Cool Secure Permutations with Block Ciphers

· Big Data Zone

Hortonworks DataFlow is an integrated platform that makes data ingestion fast, easy, and secure. Download the white paper now.  Brought to you in partnership with Hortonworks

To start, I'm assuming you know what a permutation is - basically a shuffling of a sequence of items in a particular order. A permutation of the range 1-10, for example, is {5,2,1,6,8,4,3,9,7,10}. A secure permutation is one in which an attacker, given any subset of the permutation, cannot determine the order of any other elements. A simple example of this would be to take a cryptographically secure pseudo-random number generator, seed it with a secret key, and use it to shuffle your sequence. 

What if you want to generate a really, really big permutation - one so big precomputing and storing it isn't practical or desirable? Furthermore, what if you want it to be a secure permutation? There's a really neat trick we can pull with block ciphers that allows us to generate a secure permutation over any range of numbers without first having to precompute it. 

block cipher, for anyone that isn't familiar with them, is a common cryptographic primitive. It takes blocks of ciphertext of some fixed lengths - 64 or 128 bits is common - and encrypts it. Given the same key and the same block of plaintext, it will always generate the same block of ciphertext. Messages larger than a single block are encrypted using one of a number of modes of operation, allowing messages much larger than a single block to be encrypted and decrypted securely. When using a block cipher for encryption, choice of the mode of operation is critical. For the purposes of generating a secure permutation, however, we're only going to be encrypting a single block at a time, so we don't have to worry about modes of operation. 

If you look at how a block cipher operates - taking any block of a given length (think of blocks as very large numbers here) and converting it uniquely to another block, such that it can be converted back again, a block cipher is already a secure permutation. If we progressively encrypt larger numbers (1, 2, 3, and so on), we get out a random seeming sequence of output numbers that is guaranteed not to repeat as long as we don't repeat our input. It's easy to prove this to yourself: If it repeated, then you would have two input numbers with a single output number, and it would thus be impossible to decrypt uniquely. So the same properties that a block cipher requires are the properties that make it useful to us. 

All very well, you say, but what if I want a permutation over a range that isn't a power of two? This is where the clever trick comes in. Take a block cipher that's got a block length slightly larger than you want. Use it as described above, encrypting progressively higher numbers in a sequence to generate elements in the permutation. Whenever the output of the encryption is outside the range you want for your permutation, just encrypt it again. Repeat until you get a number within the range you want. Again, we're guaranteed uniqueness by the block cipher, and we're also guaranteed (by exhaustion) that we will eventually get a number within the desired range. 

Obviously, there are some factors that need to be taken into consideration before pursuing this route. You want to select a block cipher that is not much larger than the range you wish to generate a permutation over - preferably the next power of two. The ratio of the cipher's range to the permutation's range defines the average amount of work you will have to perform, so if the cipher has a range four times that of your permutation, you'll need to do an average of four encryptions for each value. Since most block ciphers are 64, 128, or more bits, this can be problematic. For this purpose, I've found the TEA cipher to be particularly adaptable. It is easy to create variants that are 32, 64, 128 or more bits long, and from there, the bitshifts in the main loop are easily adjusted to produce a cipher with a length that's any power of 4, without needing to shorten the key to the point where it's easily brute-forced. 

It's also worth noting that although this technique is aimed at generating very large secure permutations, it is equally useful for a permutation that doesn't need to be secure or secret - your secret key simply becomes your random seed for the permutation. There are many situations in which this can be useful - what you essentially have is a mapping function from index to permutation value, so you can calculate the value of any subset of the permutation that you wish. 

Finally, bear in mind that due to the factorial explosion of the number of possible permutations, the keyspace of your cipher is almost certainly going to be much smaller than the number of possible permutations. For most purposes this probably does not matter, since the number of possible permutations is too large to enumerate anyway, but if your key is sufficiently short, it allows the possibility of an attacker doing an exhaustive search of your keyspace to find the permutation that generates the subsequence of the permutation he has access to. 

Update: Yossi Oren points to this excellent paper in the comments. It covers exactly what I describe here (only much more comprehensively, of course).

Hortonworks Sandbox is a personal, portable Apache Hadoop® environment that comes with dozens of interactive Hadoop and it's ecosystem tutorials and the most exciting developments from the latest HDP distribution, brought to you in partnership with Hortonworks.


Published at DZone with permission of Nick Johnson .

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}