A Quick Measure of Sortedness

How do you measure the "sortedness" of a list? There are several ways. In the literature this measure is called the "distance to monotonicity" or the "measure of disorder", depending on whom you read. It is still an active area of research in the streaming setting, where items are presented to the algorithm one at a time. In this article, I consider the simpler case where you can look at all of the items at once.

The Kendall distance between two lists is the number of swaps of adjacent elements it would take to turn one list into the other. So, for [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] and [10, 1, 2, 3, 4, 5, 6, 7, 8, 9], it would take nine swaps, because the 10 has to move past each of the other nine elements.
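
As a rough illustration, the Kendall distance from a list to its sorted order can be obtained by counting inversions, that is, pairs of elements that appear in the wrong relative order. The function name `kendall_distance_to_sorted` is made up for this sketch, and the quadratic loop is chosen for clarity rather than speed; it also assumes the second list is simply the sorted version of the first.

```python
def kendall_distance_to_sorted(items):
    """Count inversions: pairs (i, j) with i < j but items[i] > items[j].

    Each adjacent swap removes exactly one inversion, so this count is the
    number of adjacent swaps needed to sort the list.
    Simple O(n^2) version, written for readability.
    """
    inversions = 0
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] > items[j]:
                inversions += 1
    return inversions


print(kendall_distance_to_sorted([10, 1, 2, 3, 4, 5, 6, 7, 8, 9]))  # 9
```

The 10 is out of order with respect to each of the nine elements after it, which gives the nine swaps mentioned above.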

Edit distance is another method. In the second list above, we could take the 10 and move it after the 9 in one operation. The edit distance is the length of the list minus the length of its longest increasing subsequence, so the longer that subsequence, the smaller the distance. In the list [1, 2, 3, 5, 4, 6, 7, 9, 8], a longest increasing subsequence is [1, 2, 3, 5, 6, 7, 9], of length seven. The list has nine elements, so it is two moves away from being a sorted list. The longest increasing subsequence can be calculated in O(n log n) time. A drawback of this method is its coarse granularity. For a list of ten elements, the measure can only take the ten integer values 0 through 9.
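
A minimal sketch of the O(n log n) calculation, using the standard "smallest tail" technique with binary search from Python's `bisect` module. The function names `lis_length` and `moves_to_sort` are illustrative, and the sketch assumes the elements are distinct and that one "edit" means removing an element and reinserting it elsewhere.

```python
from bisect import bisect_left


def lis_length(items):
    """Length of the longest strictly increasing subsequence, in O(n log n)."""
    # tails[k] holds the smallest possible last element of an increasing
    # subsequence of length k + 1 seen so far; tails is always sorted.
    tails = []
    for x in items:
        pos = bisect_left(tails, x)
        if pos == len(tails):
            tails.append(x)   # x extends the longest subsequence found so far
        else:
            tails[pos] = x    # x gives a smaller tail for length pos + 1
    return len(tails)


def moves_to_sort(items):
    """Elements outside a longest increasing subsequence must each be moved once."""
    return len(items) - lis_length(items)


example = [1, 2, 3, 5, 4, 6, 7, 9, 8]
print(lis_length(example), moves_to_sort(example))  # 7 2
```

For the nine-element example the subsequence has length seven, so two elements (the 4 and the 8) need to be moved.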

Here, I propose another measure for sortedness. The procedure is to sum, over all elements, the absolute difference between the position of each element in the sorted list, x, and where it ends up in the unsorted list, f(x). We divide by the square of the length of the list and multiply by two, because the sum can never exceed half the square of the length, so this gives us a nice number between 0 and 1. Subtracting from 1 makes it range from 0, for a maximally displaced list, to 1, for completely sorted.
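
One way to write the procedure as a formula, assuming zero-based positions and using n for the length of the list (a symbol introduced here, not in the description above):

```latex
\mathrm{score}(f) \;=\; 1 - \frac{2}{n^{2}} \sum_{x=0}^{n-1} \bigl| x - f(x) \bigr|
```

For example, the list [1, 0, 2] has n = 3 and displacements 1, 1 and 0, so its score is 1 - (2/9)(2) = 5/9.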

A simple genetic algorithm in Python for sorting a list using the above fitness function is presented below.

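import random

def procreate(A):
    # Return a mutated copy of A with two randomly chosen positions swapped.
    A = A[:]
    first = random.randint(0, len(A) - 1)
    second = random.randint(0, len(A) - 1)
    A[first], A[second] = A[second], A[first]
    return A

def score(A):
    # Assumes A is a permutation of 0..len(A)-1, so that each element
    # is equal to its own position in the sorted list.
    diff = 0.
    for index, element in enumerate(A):
        diff += abs(index - element)

    return 1.0 - 2.0 * diff / len(A) ** 2

def genetic(root, procreateFn, scoreFn, generations=1000, children=6):
    # Start from the root's own score so that only real improvements are kept.
    maxScore = scoreFn(root)
    for i in range(generations):
        print("Generation {0}: {1} {2}".format(i, maxScore, root))
        if maxScore >= 1.0:
            break  # fully sorted, nothing left to improve
        maxChild = None
        for j in range(children):
            # Use the functions passed in, not the global ones.
            child = procreateFn(root)
            childScore = scoreFn(child)
            print("    child score {0:.2f}: {1}".format(childScore, child))
            if maxScore < childScore:
                maxChild = child
                maxScore = childScore
        if maxChild is not None:
            root = maxChild
    return root

# Start from a shuffled list; an already sorted list leaves nothing to do.
A = list(range(10))
random.shuffle(A)
genetic(A, procreate, score)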

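Note that under this metric, the completely reversed list is not special. It scores exactly 0 only when the list has an even number of elements, as with the ten-element list used here. With an odd number of elements n it scores 1/n^2, and other permutations, such as a rotation by half the length, also reach the minimum score.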
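
As a quick numerical check of the extremes of this metric, the sketch below re-implements the scoring formula from the listing above (so that it runs on its own) and evaluates it on reversed lists of even and odd length. The rotated list is a hypothetical extra case chosen for illustration.

```python
def score(A):
    """Same formula as the listing above: 1 - 2 * sum(|index - element|) / n^2.

    Assumes A is a permutation of 0..len(A)-1.
    """
    diff = sum(abs(index - element) for index, element in enumerate(A))
    return 1.0 - 2.0 * diff / len(A) ** 2


even_reversed = list(range(10))[::-1]     # [9, 8, ..., 0]
odd_reversed = list(range(9))[::-1]       # [8, 7, ..., 0]
rotated = [5, 6, 7, 8, 9, 0, 1, 2, 3, 4]  # every element is five places from home

print(score(even_reversed))  # 0.0
print(score(odd_reversed))   # about 0.0123, i.e. 1/81
print(score(rotated))        # 0.0
```

With ten elements both the reversed and the rotated list come out at 0.0, while the nine-element reversed list comes out slightly above 0.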

Spearman's coefficient, mentioned in the comments, might be what you are looking for.
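
Assuming the coefficient meant here is Spearman's rank correlation coefficient, a minimal sketch for a list that is a permutation of 0..n-1 (so there are no ties) could look like this. The function name `spearman_rho` is made up for the example. Unlike the score above, it squares the displacements and ranges from -1 to 1.

```python
def spearman_rho(A):
    """Spearman's rank correlation between positions and values.

    Assumes A is a permutation of 0..n-1 (no ties) with n >= 2, which lets
    us use the closed form 1 - 6 * sum(d^2) / (n * (n^2 - 1)).
    """
    n = len(A)
    d_squared = sum((index - element) ** 2 for index, element in enumerate(A))
    return 1.0 - 6.0 * d_squared / (n * (n * n - 1))


print(spearman_rho(list(range(10))))        # 1.0
print(spearman_rho(list(range(10))[::-1]))  # -1.0
```

Under this coefficient the reversed list sits at the opposite extreme from the sorted list for every length.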

Published at DZone with permission of Steve Hanov, DZone MVB. See the original article here.

