
Making Code Faster: Start From Scratch


The author continues refactoring and improves performance yet again.


After introducing the problem and doing some very obvious things, we have managed to get to nine seconds instead of 30. That is pretty awesome, but we can do better.

Let's see what would happen if we were to write it from scratch, sans Linq.

var stats = new Dictionary<long, long>();

foreach (var line in File.ReadLines(args[0]))
{
    var parts = line.Split(' ');
    var duration = (DateTime.Parse(parts[1]) - DateTime.Parse(parts[0])).Ticks;
    var id = long.Parse(parts[2]);

    long existingDuration;
    stats.TryGetValue(id, out existingDuration);
    stats[id] = existingDuration + duration;
}

using (var output = File.CreateText("summary.txt"))
{
    foreach (var entry in stats)
    {
        output.WriteLine($"{entry.Key:D10} {TimeSpan.FromTicks(entry.Value):c}");
    }
}
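As a reminder of the input: each line is expected to hold three space-separated fields, what reads as a start timestamp, an end timestamp, and a numeric id, along the lines of 2016-11-24T10:00:01 2016-11-24T10:02:17 0000012345 (an invented sample line; the real format comes from the earlier post in this series where the problem was introduced).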

The code is still pretty small and idiomatic, but dropping Linq gave us some interesting numbers: 10.4 seconds to run (so comparable to the parallel Linq version), but we also allocated 2.9 GB (down from 3.49 GB), and our peak working set never exceeded 30 MB.
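As an aside, if you want to collect the same three metrics (wall-clock time, total allocations, peak working set) yourself, they can be read from inside the process. Here is a minimal sketch, assuming .NET Core 3.0 or later for GC.GetTotalAllocatedBytes (Stopwatch and Process live in System.Diagnostics); the numbers in this post may well have come from a profiler instead:

var sw = Stopwatch.StartNew();

// ... run one of the aggregation versions from this post ...

sw.Stop();

var allocated = GC.GetTotalAllocatedBytes(precise: true);
var peakWorkingSet = Process.GetCurrentProcess().PeakWorkingSet64;

Console.WriteLine($"Elapsed:          {sw.Elapsed}");
Console.WriteLine($"Allocated:        {allocated / (1024.0 * 1024 * 1024):F2} GB");
Console.WriteLine($"Peak working set: {peakWorkingSet / (1024.0 * 1024):F2} MB");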

Taking the next step and parallelizing this approach:

var stats = new ConcurrentDictionary<long, long>();
Parallel.ForEach(File.ReadLines(args[0]), line =>
{
    var parts = line.Split(' ');
    var duration = (DateTime.Parse(parts[1]) - DateTime.Parse(parts[0])).Ticks;
    var id = long.Parse(parts[2]);
    stats.AddOrUpdate(id, duration, (_, existingDuration) => existingDuration + duration);
});

We now have eight seconds, 3.49 GB of allocations, and a peak working set of 50 MB. That is good, but we can do better.

Let's change what we store in the dictionary:

var stats = new Dictionary<string, FastRecord>();

foreach (var line in File.ReadLines(args[0]))
{
    var parts = line.Split(' ');
    var duration = (DateTime.Parse(parts[1]) - DateTime.Parse(parts[0])).Ticks;

    FastRecord value;
    var idAsStr = parts[2];
    if (stats.TryGetValue(idAsStr, out value) == false)
    {
        stats[idAsStr] = value = new FastRecord
        {
            Id = long.Parse(idAsStr),
        };
    }
    value.DurationInTicks += duration;
}


using (var output = File.CreateText("summary.txt"))
{
    foreach (var entry in stats)
    {
        output.WriteLine($"{entry.Value.Id:D10} {TimeSpan.FromTicks(entry.Value.DurationInTicks):c}");
    }
}



// record class
 public class FastRecord
 {
     public long Id;
     public long DurationInTicks;
 }

Now, instead of a dictionary of long to long, we're using a dedicated class, keyed by the string representation of the number. Most of the time, that saves us the need to parse the long, and it also reduces the number of dictionary operations we need to perform per line.
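To make the "fewer dictionary operations" point concrete, here is the per-line work of both versions pulled out into small helpers (an illustrative sketch reusing the FastRecord class above, not code from the original measurements):

// Dictionary<long, long>: every line pays for two hash lookups, a read and a write.
static void AddWithLongKey(Dictionary<long, long> stats, long id, long duration)
{
    long existingDuration;
    stats.TryGetValue(id, out existingDuration); // lookup #1
    stats[id] = existingDuration + duration;     // lookup #2
}

// Dictionary<string, FastRecord>: once the key exists, a single lookup,
// and the accumulation is a plain field write on the record we got back.
static void AddWithRecord(Dictionary<string, FastRecord> stats, string idAsStr, long duration)
{
    FastRecord value;
    if (stats.TryGetValue(idAsStr, out value) == false) // usually the only lookup
    {
        stats[idAsStr] = value = new FastRecord { Id = long.Parse(idAsStr) };
    }
    value.DurationInTicks += duration; // no second dictionary access
}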

This dropped the runtime to 10.2 seconds (compared to 10.4 seconds for the previous single-threaded implementation). That is good, but this is just the first stage; what I really want to do is save on all those expensive dictionary calls when running in parallel.

Here is the parallel version:

var stats = new ConcurrentDictionary<string, FastRecord>();

Parallel.ForEach(File.ReadLines(args[0]), line =>
{
    var parts = line.Split(' ');
    var duration = (DateTime.Parse(parts[1]) - DateTime.Parse(parts[0])).Ticks;

    var idAsStr = parts[2];
    var value = stats.GetOrAdd(idAsStr, s => new FastRecord
    {
        Id = long.Parse(s)
    });

    Interlocked.Add(ref value.DurationInTicks, duration);
});

And that one runs at 4.1 seconds, allocates 3 GB, and has a peak working set of 48 MB. Much of the gain comes from touching the ConcurrentDictionary only once per line: GetOrAdd is a single lookup, and the accumulation is an Interlocked.Add on the record's field, rather than AddOrUpdate's read, compute, and try-update loop against the dictionary itself.

We are now close to eight times faster than the initial version, but we can probably still do better. I'll go over that in my next post.


