17. Juli 2014

by

In: Tech

Kommentare deaktiviert für Perl: Semantic Version Sorting via callback puts betas before releases (empty string after text)

Semantic Versioning

Semantic program versions are a great help in administration life: When done right, they help you identify if only bugs have been resolved (2.11.z) or features added (2.y.0) or the program has undergone big changes with chances that an upgrade needs a lot of admin intervention (x.0.0). For developers and early testers, additional suffixes identify alpha (early testing), beta (testing) and Release Candidate (pre-release polishing) versions which are not intended for production use.

The Problem

Sorting program versions is not exactly trivial. You cannot sort them as strings, otherwise you would put 2.2.2 after 2.12.1. You cannot sort them as numbers either, because they contain multiple dots and possibly alpha characters.

Solutions:

a) Fixed-Length Score string

A traditional method expands the version string to a fixed-format score integer or string which can be safely compared and sorted by basic sorting algorithms. This is based on the assumption that there are not more than a fixed number of values per field, i.e. not more than 99 patch versions before the next minor, not more than 99 minor versions before the next major. This is safe for most mainstream projects but corner cases may exist. It also doesn’t play well with development releases.

This solution would first split the version string into its components, then merge them to a string and then run an alphabetic sort on it:

my @versions;
foreach my $version_string (@version_strings) {
  my ($major, $minor, $patch) = split /\./, $version_string;
  push(@versions, sprintf(%02d%02d%02d, $major, $minor, $patch));
}
my @sorted_versions = sort {$a cmp $b} @versions;

This is a simplified example. You need to get back to the original string format, either by a function for reverse conversion or by storing both together in an array of hashref. I also left out handling for alpha, beta, RC versions.

b) Using a sort callback

The sort() routine allows you to specify a callback which decides for any two values $a and $b if they are equal or which is greater. This works by any criteria you can imagine and is also usable for objects and hashes.

This is what we do: We split the version string into a hashref of its components (and a key for the original format, for convenience) and then we define a callback.

sub release_to_version_hash {
  my $release = shift;
  my ($package, $major, $minor, $patch, $dev) = $release =~ /\/(\w+)-(\d+)\.(\d+)\.(\d+)(\w*)/;
  return { 
    major => $major,
    minor => $minor,
    patch => $patch,
    dev => $dev,
    pkg => $package,
    url => $release,
    string => sprintf("%d.%d.%d%s", $major, $minor, $patch, $dev)
  };
}

We use a regular expression to retrieve the version fields.
Note: This code is taken from an open source project of mine.
The release strings have this format:
http://pear.horde.org/get/Horde_ActiveSync-2.16.11.tgz
http://pear.horde.org/get/Horde_ActiveSync-2.4.0RC2.tgz
http://pear.horde.org/get/Horde_ActiveSync-2.3.1.tgz
If your source string is just a version, your regular expression would look like this:

/(\d+)\.(\d+)\.(\d+)(\w*)/

Now let’s get down to business. The actual sorting routine is not very exciting. It gets an Array Reference of Hash References, the hashes being in the format produced above. It returns an array reference (note the [] brackets). The sort routine gets the list (ref) of releases, dereferenced to an array and invokes the compare_versions routine many times, comparing any two values $a and $b inside the releases list which is the „bigger“ one.

sub sort_releases {
  my $releases = shift;
  return [sort compare_versions @$releases];
}

Now let’s look at that callback.

## A compare function callback for version hashes, suitable for sort
## returns -1, 0 or 1
sub compare_versions {
  return
    $a->{major} <=> $b->{major} or
    $a->{minor} <=> $b->{minor} or
    $a->{patch} <=> $b->{patch} or
    ## handle development releases
    ( !$a->{dev} && $b->{dev} ? 1 : 0 ) or
    ( !$b->{dev} && $a->{dev} ? -1 : 0 ) or
    lc($a->{dev}) cmp lc($b->{dev})
}

Now what happens here? This routine returns -1, 0 or 1 depending on which versions is „bigger“. In practice, you should never get 0 because you should never have two identic version strings. After the major versions are compared as a number via the spaceship operator (<=>), perl evaluates the result.
If version $a (major) is smaller (-1) or bigger (1) than version $b (major), then this evaluates to a true (non-zero) value. This fulfills the „or“ condition, so perl immediately stops evaluating and returns the value. If the major versions are equal, the first expression evaluates to false (0) and the next part of the comparison is evaluated. This is sufficient for sorting major, minor, patch version in that order.

If we also want to handle development releases, we also need the last three lines.
The last line sorts development releases alphabetically, sorting all alphas of the same major.minor.patch before all betas. Normally, perl would sort all empty strings before all non-empty strings, which is bad. It would sort a production version 2.0.0 before the alpha releases.

To fix this, we use the upper two lines:
If a string is empty, it evaluates to false.
So if the first string is false (empty, production) and
the second string is true (any characters, development), the first version is the preferable version. In this case the expression evaluates to 1 and returns. Otherwise it evaluates to false (0) and the next or clause is evaluated. This one checks for the opposite case where version b is the better version and returns -1. If both versions contain a dev string, we skip to alphabetic comparison.

This results in a list where the best version is the last hash entry. Getting the best version is as easy as

$version = pop @versions;
## or
$version = $versions[-1];

Bonus: A versatile filter for all sorts of tasks

This filter is best applied before sorting but it also works afterwards. It can filter out all development releases or return only releases within a specific major version etc:

sub filter_releases {
  my $releases = shift;
  my $filter = shift || 
    { stable => 1, 
      rc => 0, 
      beta => 0, 
      alpha => 0, 
      major => 0, 
      minor => 0, 
      patch => 0, 
      pkg => '' };
  my @legit;
  foreach my $pkg (@$releases) {
    ## filters
    next if ($filter->{pkg} && $pkg->{pkg} ne $filter->{pkg});
    next if ($pkg->{dev} =~ /RC/ && $filter->{'rc'} == 0);
    next if ($pkg->{dev} =~ /alpha/ && $filter->{'rc'} == 0);
    next if ($pkg->{dev} =~ /beta/ && $filter->{'rc'} == 0);
    next if ($filter->{major} && $filter->{major} != $pkg->{major});
    next if ($filter->{minor} && $filter->{minor} != $pkg->{minor});
    next if ($filter->{patch} && $filter->{minor} != $pkg->{patch});
    push @legit, $pkg;
  }
  return \@legit;
}

Tags: , , , , , , ,