Making Code Faster: The Obvious Costs

If you don’t understand what is going on, you won't get the best out of your code. Even if you do, a “minimum change of the code” approach won't do as much for performance as it could.

In my previous post, I presented a small code sample and asked how we could improve its performance. Note that this code sample has been quite maliciously designed to be:

  • Very small.

  • Clear in what it is doing.

  • The most obvious way to do it.

  • Highly inefficient.

  • Misleading people into nonoptimal optimization paths.

In other words, if you don’t understand what is going on, you won’t be able to get the best out of it. Even if you do, you are likely to take a “minimum change of the code” approach that won’t do as much for performance as it could.

Let's take a look at the code again:

// code

var summary =
    from line in File.ReadAllLines(args[0])
    let record = new Record(line)
    group record by record.Id
    into g
    select new
    {
        Id = g.Key,
        Duration = TimeSpan.FromTicks(g.Sum(r => r.Duration.Ticks))
    };

using (var output = File.CreateText("summary.txt"))
{
    foreach (var entry in summary)
    {
        output.WriteLine($"{entry.Id:D10} {entry.Duration:c}");
    }
}

// data class

  public class Record
  {
      public DateTime Start => DateTime.Parse(_line.Split(' ')[0]);

      public DateTime End => DateTime.Parse(_line.Split(' ')[1]);
      public long Id => long.Parse(_line.Split(' ')[2]);

      public TimeSpan Duration => End - Start;

      private readonly string _line;

      public Record(string line)
      {
          _line = line;
      }
  }

The most obvious optimization is that we are calling _line.Split() multiple times inside of the Record class. Let us fix that:

public class Record
{
    public DateTime Start;
    public DateTime End;
    public long Id;

    public TimeSpan Duration => End - Start;

    public Record(string line)
    {
        var parts = line.Split(' ');
        Start = DateTime.Parse(parts[0]);
        End = DateTime.Parse(parts[1]);
        Id = long.Parse(parts[2]);
    }
}

This trivial change reduces the runtime by about five seconds and saves us 4.2 GB of allocations. The peak working set increased by about 100 MB, which I assume is because the Record class moved from having a single 8-byte field to having three 8-byte fields.
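To make the source of those saved allocations concrete, here is a rough sketch that compares the two versions of the class on a single line. The names LazyRecord, EagerRecord, and AllocationProbe are hypothetical, the sample line is an assumed "start end id" format (the real file may use a different timestamp layout), and GC.GetAllocatedBytesForCurrentThread() requires a reasonably recent .NET runtime. It is a way to observe the effect, not a proper benchmark.

```csharp
using System;
using System.Globalization;

// Mirrors the original Record: every property access splits the line again,
// allocating a fresh string[] and three substrings each time.
public sealed class LazyRecord
{
    private readonly string _line;
    public LazyRecord(string line) { _line = line; }

    public DateTime Start => DateTime.Parse(_line.Split(' ')[0], CultureInfo.InvariantCulture);
    public DateTime End => DateTime.Parse(_line.Split(' ')[1], CultureInfo.InvariantCulture);
    public long Id => long.Parse(_line.Split(' ')[2], CultureInfo.InvariantCulture);
    public TimeSpan Duration => End - Start;
}

// Mirrors the revised Record: one Split in the constructor, parsed values kept in fields.
public sealed class EagerRecord
{
    public readonly DateTime Start;
    public readonly DateTime End;
    public readonly long Id;
    public TimeSpan Duration => End - Start;

    public EagerRecord(string line)
    {
        var parts = line.Split(' ');
        Start = DateTime.Parse(parts[0], CultureInfo.InvariantCulture);
        End = DateTime.Parse(parts[1], CultureInfo.InvariantCulture);
        Id = long.Parse(parts[2], CultureInfo.InvariantCulture);
    }
}

public static class AllocationProbe
{
    // Assumed sample line; the timestamp format of the real input file is not shown in the post.
    public const string SampleLine = "2015-01-01T16:44:31 2015-01-01T19:09:27 00043064";

    // Bytes allocated on this thread while reading Id and Duration from LazyRecord instances.
    public static long LazyBytes(int iterations)
    {
        long checksum = 0;
        long before = GC.GetAllocatedBytesForCurrentThread();
        for (int i = 0; i < iterations; i++)
        {
            var r = new LazyRecord(SampleLine);
            checksum += r.Id + r.Duration.Ticks; // the query reads Id once and Duration once
        }
        long allocated = GC.GetAllocatedBytesForCurrentThread() - before;
        if (checksum == 0) throw new InvalidOperationException("unexpected checksum");
        return allocated;
    }

    // Same measurement for EagerRecord instances.
    public static long EagerBytes(int iterations)
    {
        long checksum = 0;
        long before = GC.GetAllocatedBytesForCurrentThread();
        for (int i = 0; i < iterations; i++)
        {
            var r = new EagerRecord(SampleLine);
            checksum += r.Id + r.Duration.Ticks;
        }
        long allocated = GC.GetAllocatedBytesForCurrentThread() - before;
        if (checksum == 0) throw new InvalidOperationException("unexpected checksum");
        return allocated;
    }
}
```

Calling AllocationProbe.LazyBytes(n) and AllocationProbe.EagerBytes(n) with the same n should show the lazy version allocating noticeably more, since the query touches Id, Start, and End once per record and each access triggers its own Split.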

The next change is also pretty trivial. Let's remove the File.ReadAllLines() in favor of calling File.ReadLines(). This, surprisingly enough, has had very little impact on performance.

However, the allocations dropped by 100 MB and the working set dropped to 280 MB, very close to the size of the file itself.

This is because we no longer have to read the file into an array and hold on to this array for the duration of the program. Instead, we can collect the garbage from the lines very efficiently.
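The difference between the two calls can be sketched as follows. LineReadingSketch and its helper methods are hypothetical names for illustration only; the point is that File.ReadAllLines() returns a fully populated string[], while File.ReadLines() returns an IEnumerable<string> that produces one line at a time as it is enumerated.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class LineReadingSketch
{
    // Hypothetical helper: writes a temporary file with the requested number of lines.
    public static string WriteSampleFile(int lineCount)
    {
        string path = Path.GetTempFileName();
        File.WriteAllLines(path, Enumerable.Range(0, lineCount).Select(i => $"line {i}"));
        return path;
    }

    // Eager: the whole file is materialized as an array before we look at any line,
    // and the array keeps every line alive for as long as it is referenced.
    public static int CountEager(string path)
    {
        string[] all = File.ReadAllLines(path);
        return all.Length;
    }

    // Lazy: lines are produced one at a time; once the loop moves on,
    // the previous line is unreferenced and can be collected.
    public static int CountLazy(string path)
    {
        int count = 0;
        foreach (string line in File.ReadLines(path))
        {
            count++;
        }
        return count;
    }
}
```

Both methods see the same lines; what changes is how many of them have to be alive at the same time.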

This concludes the obvious stuff, and we managed to gain a whole five seconds of performance improvement here. However, we can do better, and it is sort of obvious, so I’ll put it in this post.

As written, this code is single-threaded. While we are reading from a file, we are still pretty much CPU-bound, so why not use all the cores that we have?

var summary =
    from line in File.ReadLines(args[0]).AsParallel()
    let record = new Record(line)
    group record by record.Id
    into g
    select new
    {
        Id = g.Key,
        Duration = TimeSpan.FromTicks(g.Sum(r => r.Duration.Ticks))
    };

As you can see, all we had to do was add AsParallel(), and the TPL will take care of the rest for us.

This gives us a runtime of nine seconds. Allocations are a bit higher (3.45 GB, up from 3.3 GB), but the peak working set exceeded 1.1 GB, which makes a lot of sense.

We are now standing at one-third of the initial runtime, which is excellent; but can we do more? We’ll cover that in the next post.
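When making a change like this, it is worth checking that the parallel query still produces the same totals as the sequential one. Below is a minimal sketch of such a check on in-memory data. ParallelSummarySketch and its methods are hypothetical names, and the input lines are assumed to be in "start end id" form. Because a parallel query does not guarantee the order of its groups, the results are collected into dictionaries keyed by Id before comparing.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class ParallelSummarySketch
{
    // Parses one "start end id" line into its id and duration in ticks.
    private static (long Id, long Ticks) Parse(string line)
    {
        var parts = line.Split(' ');
        var start = DateTime.Parse(parts[0], CultureInfo.InvariantCulture);
        var end = DateTime.Parse(parts[1], CultureInfo.InvariantCulture);
        long id = long.Parse(parts[2], CultureInfo.InvariantCulture);
        return (id, (end - start).Ticks);
    }

    // Single-threaded version of the summary query.
    public static Dictionary<long, TimeSpan> SummarizeSequential(IEnumerable<string> lines)
    {
        return lines
            .Select(line => Parse(line))
            .GroupBy(r => r.Id)
            .ToDictionary(g => g.Key, g => TimeSpan.FromTicks(g.Sum(r => r.Ticks)));
    }

    // Same query with AsParallel(); parsing and grouping are spread across cores.
    public static Dictionary<long, TimeSpan> SummarizeParallel(IEnumerable<string> lines)
    {
        return lines
            .AsParallel()
            .Select(line => Parse(line))
            .GroupBy(r => r.Id)
            .ToDictionary(g => g.Key, g => TimeSpan.FromTicks(g.Sum(r => r.Ticks)));
    }
}
```

Comparing the two dictionaries key by key on a small sample is a cheap way to confirm that only the execution strategy changed, not the results.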

Published at DZone with permission of Oren Eini, DZone MVB. See the original article here.
