
I Like My Performance Unsafe

Sometimes, you get to the point where you need to go native and use a bit of unsafe code. In this article, Ayende Rahien goes through the process of doing that.



After introducing the problem and doing some very obvious things (then some pretty non-obvious things, and even writing our own I/O routines), we ended up with an implementation that is 17 times faster than the original one.


And yet, we can still do better. At this point, we need to go native and use a bit of unsafe code. We’ll start by implementing a naïve native record parser, like so:

public static unsafe class NativeRecord
{
    public static void Parse(byte* buffer, out long id, out long duration)
    {
        duration = (ParseTime(buffer + 20) - ParseTime(buffer)).Ticks;
        id = ParseInt(buffer + 40, 8);
    }

    private static DateTime ParseTime(byte* buffer)
    {
        var year = ParseInt(buffer, 4);
        var month = ParseInt(buffer + 5, 2);
        var day = ParseInt(buffer + 8, 2);
        var hour = ParseInt(buffer + 11, 2);
        var min = ParseInt(buffer + 14, 2);
        var sec = ParseInt(buffer + 17, 2);
        return new DateTime(year, month, day, hour, min, sec);
    }

    private static int ParseInt(byte* buffer, int size)
    {
        var val = 0;
        for (int i = 0; i < size; i++)
        {
            val *= 10;
            val += buffer[i] - '0';
        }
        return val;
    }
}
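To make the fixed-width layout concrete, here is a small standalone sketch. The offsets (start time at byte 0, end time at byte 20, an 8-digit id at byte 40, 50 bytes per record) come straight from the parser above; the exact separator characters in the sample record are my assumption, chosen to fit those offsets. ParseInt is copied verbatim from the class above:

```csharp
using System;
using System.Text;

public static class LayoutDemo
{
    // Copied from NativeRecord above: decodes `size` ASCII digits into an int.
    public static unsafe int ParseInt(byte* buffer, int size)
    {
        var val = 0;
        for (int i = 0; i < size; i++)
        {
            val *= 10;
            val += buffer[i] - '0';
        }
        return val;
    }

    public static unsafe void Main()
    {
        // Hypothetical 50-byte record: start time, end time, 8-digit id, CRLF.
        var record = "2016-07-01 09:00:00 2016-07-01 09:05:30 00012345\r\n";
        var bytes = Encoding.ASCII.GetBytes(record);
        fixed (byte* buffer = bytes)
        {
            Console.WriteLine(ParseInt(buffer, 4));      // 2016 (year of the start time)
            Console.WriteLine(ParseInt(buffer + 40, 8)); // 12345 (the record id)
        }
    }
}
```

Compile with the /unsafe flag. Note that there is no validation at all here; that is part of where the speed comes from.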

This is pretty much the same as before, but now we are dealing with pointers. How do we use this?

var stats = new Dictionary<long, FastRecord>();
using (var mmf = MemoryMappedFile.CreateFromFile(args[0]))
using (var accessor = mmf.CreateViewAccessor())
{
    byte* buffer = null;
    accessor.SafeMemoryMappedViewHandle.AcquirePointer(ref buffer);
    try
    {
        var end = buffer + new FileInfo(args[0]).Length;
        while (buffer < end) // records are exactly 50 bytes each
        {
            long id;
            long duration;
            NativeRecord.Parse(buffer, out id, out duration);
            buffer += 50;
            FastRecord value;
            if (stats.TryGetValue(id, out value) == false)
            {
                stats[id] = value = new FastRecord
                {
                    Id = id
                };
            }
            value.DurationInTicks += duration;
        }
    }
    finally
    {
        // AcquirePointer bumps the handle's ref count; always release it
        accessor.SafeMemoryMappedViewHandle.ReleasePointer();
    }
}

We memory map the file, and then we go over it, doing no allocations at all throughout.
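FastRecord itself isn't shown in this excerpt; judging by the fields the loop touches, it is presumably a simple mutable class along these lines (a class rather than a struct, so the instance already stored in the dictionary is updated in place by `value.DurationInTicks += duration`):

```csharp
// Presumed shape of FastRecord, inferred from its usage above.
public class FastRecord
{
    public long Id;
    public long DurationInTicks;
}
```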

This gives us one second to process the file, 126 MB allocated (probably in the dictionary) and a peak working set of 320 MB.

We are now 30 times faster than the initial implementation, and I wondered if we could do even more. We can, by going parallel, which gives us the following code:

// state

public unsafe class ThreadState
{
    public Dictionary<long, FastRecord> Records;
    public byte* Start;
    public byte* End;
}


// parallel work

Dictionary<long, FastRecord> allStats;

using (var mmf = MemoryMappedFile.CreateFromFile(args[0]))
using (var accessor = mmf.CreateViewAccessor())
{
    byte* buffer = null;
    accessor.SafeMemoryMappedViewHandle.AcquirePointer(ref buffer);
    try
    {
        var len = new FileInfo(args[0]).Length;
        var entries = len / 50;

        int count = 4;
        var threadStates = new ThreadState[count];
        for (int i = 0; i < count; i++)
        {
            threadStates[i] = new ThreadState
            {
                Records = new Dictionary<long, FastRecord>(),
                Start = buffer + i * (entries / count) * 50,
                End = buffer + (i + 1) * (entries / count) * 50
            };
        }

        // the last thread also takes any remainder records
        threadStates[threadStates.Length - 1].End = buffer + len;

        Parallel.ForEach(threadStates, state =>
        {
            while (state.Start < state.End)
            {
                long id;
                long duration;
                NativeRecord.Parse(state.Start, out id, out duration);
                state.Start += 50;
                FastRecord value;
                if (state.Records.TryGetValue(id, out value) == false)
                {
                    state.Records[id] = value = new FastRecord
                    {
                        Id = id
                    };
                }
                value.DurationInTicks += duration;
            }
        });

        allStats = threadStates[0].Records;
        for (int i = 1; i < count; i++)
        {
            foreach (var record in threadStates[i].Records)
            {
                FastRecord value;
                if (allStats.TryGetValue(record.Key, out value))
                    value.DurationInTicks += record.Value.DurationInTicks;
                else
                    allStats.Add(record.Key, record.Value);
            }
        }
    }
    finally
    {
        // AcquirePointer bumps the handle's ref count; always release it
        accessor.SafeMemoryMappedViewHandle.ReleasePointer();
    }
}

This is pretty ugly, but basically we are using four threads, giving each of them its own range of the file as well as its own dedicated records dictionary. After they are done, we merge the per-thread records into a single dictionary, and that is it.
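The range arithmetic deserves a sanity check: each thread gets entries/count whole 50-byte records, and the last thread's End is bumped to cover the remainder when entries doesn't divide evenly by the thread count. Here is a standalone sketch of just that arithmetic, with a made-up file length:

```csharp
using System;

class PartitionDemo
{
    static void Main()
    {
        long len = 50 * 1001;    // hypothetical file: 1001 records of 50 bytes each
        long entries = len / 50; // 1001
        int count = 4;           // entries / count == 250, with one record left over

        var starts = new long[count];
        var ends = new long[count];
        for (int i = 0; i < count; i++)
        {
            starts[i] = i * (entries / count) * 50;
            ends[i] = (i + 1) * (entries / count) * 50;
        }
        ends[count - 1] = len; // last thread absorbs the remainder record

        // Ranges are contiguous, cover the whole file, and stay 50-byte aligned
        for (int i = 1; i < count; i++)
            Console.WriteLine(starts[i] == ends[i - 1]);        // True
        Console.WriteLine(ends[count - 1] - starts[count - 1]); // 12550, i.e. 251 records
    }
}
```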

Using this approach, we can get down to 663 ms run time, 184 MB of allocations and 364 MB peak working set.

So, we are now about 45(!) times faster than the original version. We are almost done, but in my next post, I'm going to pull out the profiler and see if we can squeeze anything else out of it.


Topics:
performance, unsafe code, native development

Published at DZone with permission of Ayende Rahien, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

