Permutation Distance
Join the DZone community and get the full member experience.
Join For FreeIf you have two sequences, and one is a permutation of the other, how do you measure how far apart the two sequences are?
This post was motivated by the previous post on anagram statistics. For a certain dictionary, the code used in that post found that the longest anagram pairs were the following:
certifications, rectifications
impressiveness, permissiveness
teaspoonsful, teaspoonfuls
Anagrams are more amusing when the two words are more dissimilar, at least in my opinion.
There are several ways to measure (dis)similarity. The words in the first two pairs above are dissimilar in meaning, but the words in the last pair are synonyms [1]. The pairs of words above are not that dissimilar as character strings.
It would be hard to quantify how dissimilar two words are in meaning, but easier to quantify how dissimilar they are as sequences of characters. We’ll look at two approaches: Hamming distance and transposition distance.
You may also like: Mathematical Functions and Converting Data Types in Groovy.
Hamming distance
The Hamming distance between two words is the number of positions in which they differ. We can see the Hamming distances between the words above by viewing them in a fixedwidth font.
certifications
rectifications
impressiveness
permissiveness
teaspoonsful
teaspoonfuls
The Hamming distances above are 2, 5, and 4.
The anagrams with a maximum Hamming distance in my dictionary are “imperfections” and “perfectionism.” (There’s something poetic about these words being anagrams.) The Hamming distance is 13 because these 13letter words differ in every position.
imperfections
perfectionism
Transposition Distance
Another way to measure the distance between two permutations is how many elements would have to be swapped to turn one list into the other. This is called the transposition distance. For example, the transposition distance between “certifications” and “rectifications” is 1 because you only need to swap the first and third characters to turn one into the other.
It’s not so obvious what the transposition distance is between “impressiveness” and “permissiveness.” We can find an upper bound on the distance by experimentation. The two words only differ in the first five letters, so the distance between “impressiveness” and “permissiveness” is no larger than the distance between “impre” and “permi.” The latter pair of words are no more than 4 transpositions apart, as the following sequence shows:
impre
pmire
peirm
perim
permi
But is there a shorter sequence of transpositions? Would it help if we used the rest of the letters “ssiveness” as the working space?
Computing transposition distance is hard, as in NPhard. This is unfortunate since transposition has more practical applications than just word games. It is used, for example, in genetic research. It may be practical to compute short sequences, but, in general, one must rely on bounds and approximations.
Note that transposition distance allows swapping any two elements. If we only allowed swapping consecutive elements, the problem is much easier, but the results are not the same. When restricted to consecutive swaps, the distance between “certifications” and “rectifications” is 3 rather than 1. We can swap the “c” and the “r” to turn “cer” into “rec,” so we have to do something like
cer
ecr
erc
rec
I don’t know a name for the distance where you allow rotations. The words “teaspoonsful” and “teaspoonfuls” differ by one rotation, turning “sful” into “fuls.” While this can be done in one rotation, it would take several swaps.
Further Reading
 Levenshtein distance.
 Translating Robert Burns.
 MachineX: Cosine Similarity for ItemBased Collaborative Filtering.
 Introduction to Classification Algorithms.
[1] Although “teaspoonfuls” is far more common, I remember a schoolmarm teaching us that this is incorrect.
And I still recoil at “brother in laws” and say “brothers in law” instead; the majority is on my side on this one.
Published at DZone with permission of John Cook, DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.
Trending

A Complete Guide to AWS File Handling and How It Is Revolutionizing Cloud Storage

Observability Architecture: Financial Payments Introduction

RBAC With API Gateway and Open Policy Agent (OPA)

How to LINQ Between Java and SQL With JPAStreamer
Comments