Over a million developers have joined DZone.

A Quick Measure of Sortedness

DZone's Guide to

A Quick Measure of Sortedness

· Java Zone
Free Resource

Learn how our document data model can map directly to how you program your app, and native database features like secondary indexes, geospatial and text search give you full access to your data. Brought to you in partnership with MongoDB.

How do you measure the "sortedness" of a list? There are several ways. In the literature this measure is called the "distance to monotonicity" or the "measure of disorder" depending on who you read. It is still an active area of research when items are presented to the algorithm one at a time. In this article, I consider the simpler case where you can look at all of the items at once.

The Kendall distance between two lists is the number of swaps it would take to turn one list into another. So, for [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] and [10, 1, 2, 3, 4, 5, 6, 7, 8, 9], it would take nine swaps.

Edit distance is another method. We could take the 10, and move it after the 9, in one operation. The edit distance is inversely related to the longest increasing subsequence. In the list [1, 2, 3, 5, 4, 6, 7, 9, 8], the longest increasing subsequence is [1, 2, 3, 5, 6, 7, 9], of length seven, and it is three away from being a sorted list. The longest increasing subsequence can be calculated in O(nlogn) time. A drawback of this method is its large granularity. For a list of ten elements, the measure can only take the distinct values 0 through 9.

Here, I propose another measure for sortedness. The procedure is to sum the difference between the position of each element in the sorted list, x, and where it ends up in the unsorted list, f(x). We divide by the square of the length of the list and multiply by two, because this gives us a nice number between 0 and 1. Subtracting from 1 makes it range from 0, for completely unsorted, to 1, for completely sorted.

A simple genetic algorithm in python for sorting a list using the above fitness function is presented below.

import random

def procreate(A):
    A = A[:]
    first = random.randint(0, len(A) - 1)
    second = random.randint(0, len(A) - 1)
    A[first], A[second] = A[second], A[first]
    return A

def score(A):
    diff = 0.
    for index, element in enumerate(A):
        diff += abs(index - element)

    return 1.0 - diff / len(A) ** 2 * 2

def genetic(root, procreateFn, scoreFn, generations = 1000, children=6):
    maxScore = 0.
    for i in range(generations):
        print("Generation {0}: {1} {2}".format(i, maxScore, root))
        maxChild = None
        for j in range(children):
            child = procreate(root)
            score = scoreFn(child)
            print("    child score {0:.2f}: {1}".format(score, child))
            if maxScore < score:
                maxChild = child
                maxScore = score
        if maxChild:
            root = maxChild
    return root

A = [a for a in range(10)]
genetic(A, procreate, score)

Note that under this metric, the completely reversed list does not have a score of 0.

The Spearman's coefficient, mentioned in the comments, might be what you are looking for.

Discover when your data grows or your application performance demands increase, MongoDB Atlas allows you to scale out your deployment with an automated sharding process that ensures zero application downtime. Brought to you in partnership with MongoDB.


Published at DZone with permission of Steve Hanov, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.


Dev Resources & Solutions Straight to Your Inbox

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.


{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}