Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Semantically Ordering Versioned File Names in Java

DZone's Guide to

Semantically Ordering Versioned File Names in Java

Want to sort your file names by version rather than by numeral? Let's see what it takes to semantically order your files with a bit of Java.

· Java Zone ·
Free Resource

Get the Edge with a Professional Java IDE. 30-day free trial.

In most cases, natural sorting by sorting lexicographically is useful as a default in Java. This includes sorting file names, which are sorted lexicographically as well.

However, when we have version numbers in our files (such as a set of SQL migration scripts), then we prefer the files to be sorted in a more intuitive ordering, where the version numbers contained in the string become “semantic”. In the following example, we have a set of versions, once sorted “naturally”, and once “semantically”:

Natural sorting

  • version-1
  • version-10
  • version-10.1
  • version-2
  • version-21


Semantic sorting

  • version-1
  • version-2
  • version-10
  • version-10.1
  • version-21


Semantic ordering, Windows style

The Windows Explorer does this as well, although there’s a slight difference as the “.” character is used to separate filename from ending, so now, we’re comparing a version sub-number (1) with a file ending (sql)…

The JDK doesn’t seem to have a built-in Comparator that implements this ordering, but we can easily roll our own. The idea is simple. We want to split a file name into several chunks, where a chunk is either a string (sorted lexicographically), or an integer number (sorted numerically). We split that file name using a regular expression:

Pattern.compile("(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)");


This expression matches the boundary between string and number, without actually capturing anything, so we can use it for split() operations. The idea was inspired by this stack exchange answer. Here’s the logic of the comparator annotated with comments:

public final class FilenameComparator
implements Comparator<String> {

    private static final Pattern NUMBERS = 
        Pattern.compile("(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)");

    @Override
    public final int compare(String o1, String o2) {

        // Optional "NULLS LAST" semantics:
        if (o1 == null || o2 == null)
            return o1 == null ? o2 == null ? 0 : -1 : 1;

        // Splitting both input strings by the above patterns
        String[] split1 = NUMBERS.split(o1);
        String[] split2 = NUMBERS.split(o2);
        int length = Math.min(split1.length, split2.length);

        // Looping over the individual segments
        for (int i = 0; i < length; i++) {
            char c1 = split1[i].charAt(0);
            char c2 = split2[i].charAt(0);
            int cmp = 0;

            // If both segments start with a digit, sort them
            // numerically using BigInteger to stay safe
            if (c1 >= '0' && c1 <= '9' && c2 >= 0 && c2 <= '9')
                cmp = new BigInteger(split1[i]).compareTo(
                      new BigInteger(split2[i]));

            // If we haven't sorted numerically before, or if
            // numeric sorting yielded equality (e.g 007 and 7)
            // then sort lexicographically
            if (cmp == 0)
                cmp = split1[i].compareTo(split2[i]);

            // Abort once some prefix has unequal ordering
            if (cmp != 0)
                return cmp;
        }

        // If we reach this, then both strings have equally
        // ordered prefixes, but maybe one string is longer than
        // the other (i.e. has more segments)
        return split1.length - split2.length;
    }
}


That’s it. Here’s an example on how to use this:

// Random order
List<String> list = asList(
    "version-10", 
    "version-2", 
    "version-21", 
    "version-1", 
    "version-10.1"
);

// Turn versions into files
List<File> l2 = list
    .stream()
    .map(s -> "C:\\temp\\" + s + ".sql")
    .map(File::new)
    .collect(Collectors.toList());

System.out.println("Natural sorting");
l2.stream()
  .sorted()
  .forEach(System.out::println);

System.out.println();
System.out.println("Semantic sorting");
l2.stream()
  .sorted(Comparator.comparing(
      File::getName, 
      new FilenameComparator()))
  .forEach(System.out::println);


The output is:

Natural sorting
C:\temp\version-1.sql
C:\temp\version-10.1.sql
C:\temp\version-10.sql
C:\temp\version-2.sql
C:\temp\version-21.sql

Semantic sorting
C:\temp\version-1.sql
C:\temp\version-2.sql
C:\temp\version-10.1.sql
C:\temp\version-10.sql
C:\temp\version-21.sql


Again, the algorithm is rather simple as it doesn’t distinguish between file endings and “segments”, so (1) is compared with (sql), which might not be the desired behaviour. This can be easily fixed by recognising actual file endings and excluding them from the comparison logic – at the price of not being able to sort files without file endings… The comparator would then look like this:

public final class FilenameComparator
implements Comparator<String> {

    private static final Pattern NUMBERS = 
        Pattern.compile("(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)");
    private static final Pattern FILE_ENDING =
        Pattern.compile("(?<=.*)(?=\\..*)");

    @Override
    public final int compare(String o1, String o2) {
        if (o1 == null || o2 == null)
            return o1 == null ? o2 == null ? 0 : -1 : 1;

        String[] name1 = FILE_ENDING.split(o1);
        String[] name2 = FILE_ENDING.split(o2);

        String[] split1 = NUMBERS.split(name1[0]);
        String[] split2 = NUMBERS.split(name2[0]);
        int length = Math.min(split1.length, split2.length);

        // Looping over the individual segments
        for (int i = 0; i < length; i++) {
            char c1 = split1[i].charAt(0);
            char c2 = split2[i].charAt(0);
            int cmp = 0;

            if (c1 >= '0' && c1 <= '9' && c2 >= 0 && c2 <= '9')
                cmp = new BigInteger(split1[i]).compareTo(
                      new BigInteger(split2[i]));

            if (cmp == 0)
                cmp = split1[i].compareTo(split2[i]);

            if (cmp != 0)
                return cmp;
        }

        int cmp = split1.length - split2.length;
        if (cmp != 0)
            return cmp;

        cmp = name1.length - name2.length;
        if (cmp != 0)
            return cmp;

        return name1[1].compareTo(name2[1]);
    }
}


The output is now:

C:\temp\version-1.sql
C:\temp\version-2.sql
C:\temp\version-10.sql
C:\temp\version-10.1.sql
C:\temp\version-21.sql


Discussion about a JDK implementation

Tagir Valeev from JetBrains was so kind to point out discussions about adding such an implementation to the JDK:


The discussion is here:


Get the Java IDE that understands code & makes developing enjoyable. Level up your code with IntelliJ IDEA. Download the free trial.

Topics:
java ,semantic ,order ,versioning ,file

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}