I Like My Performance Unsafe

Sometimes, you get to the point where you need to go native and use a bit of unsafe code. In this article, Ayende Rahien goes through the process of doing that.

By Oren Eini · Nov. 25, 16 · Opinion


After introducing the problem, doing some very obvious things, then some pretty non-obvious things, and even writing our own I/O routines, we ended up with an implementation that is 17 times faster than the original one.


And yet, we can still do better. At this point, we need to go native and use a bit of unsafe code. We’ll start by implementing a naïve native record parser, like so:

public static unsafe class NativeRecord
{
    public static void Parse(byte* buffer, out long id, out long duration)
    {
        duration = (ParseTime(buffer + 20) - ParseTime(buffer)).Ticks;
        id = ParseInt(buffer + 40, 8);
    }

    private static DateTime ParseTime(byte* buffer)
    {
        var year = ParseInt(buffer, 4);
        var month = ParseInt(buffer + 5, 2);
        var day = ParseInt(buffer + 8, 2);
        var hour = ParseInt(buffer + 11, 2);
        var min = ParseInt(buffer + 14, 2);
        var sec = ParseInt(buffer + 17, 2);
        return new DateTime(year, month, day, hour, min, sec);
    }

    private static int ParseInt(byte* buffer, int size)
    {
        var val = 0;
        for (int i = 0; i < size; i++)
        {
            val *= 10;
            val += buffer[i] - '0';
        }
        return val;
    }
}
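As a quick sanity check of the digit arithmetic, the same fold (val = val * 10 + b - '0') can be exercised on a managed byte array. The sample record text below is my own invention; the series only fixes the offsets (two timestamps at 0 and 20, an 8-digit id at 40, 50 bytes per record):

```csharp
using System;
using System.Text;

class ParseIntDemo
{
    // Managed mirror of the pointer-based ParseInt above.
    static int ParseInt(byte[] buffer, int offset, int size)
    {
        var val = 0;
        for (int i = 0; i < size; i++)
        {
            val *= 10;                       // shift previous digits left
            val += buffer[offset + i] - '0'; // fold in the next ASCII digit
        }
        return val;
    }

    static void Main()
    {
        // A hypothetical 50-byte record matching the offsets used above.
        var record = Encoding.ASCII.GetBytes(
            "2016-11-25 10:00:00,2016-11-25 10:07:30,00012345\r\n");
        Console.WriteLine(ParseInt(record, 0, 4));  // year of the first timestamp
        Console.WriteLine(ParseInt(record, 40, 8)); // the record id
    }
}
```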

This is pretty much the same as before, but now we are dealing with pointers. How do we use this?

var stats = new Dictionary<long, FastRecord>();
using (var mmf = MemoryMappedFile.CreateFromFile(args[0]))
using (var accessor = mmf.CreateViewAccessor())
{
    byte* buffer = null;
    accessor.SafeMemoryMappedViewHandle.AcquirePointer(ref buffer);

    var end = buffer + new FileInfo(args[0]).Length;
    while (buffer != end)
    {
        long id;
        long duration;
        NativeRecord.Parse(buffer, out id, out duration);
        buffer += 50;
        FastRecord value;
        if (stats.TryGetValue(id, out value) == false)
        {
            stats[id] = value = new FastRecord
            {
                Id = id
            };
        }
        value.DurationInTicks += duration;
    }
}

We memory map the file, and then we go over it, doing no allocations at all throughout.

This gives us one second to process the file, 126 MB allocated (probably in the dictionary) and a peak working set of 320 MB.
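One thing worth calling out: AcquirePointer increments the handle's reference count, so code that lives longer than a benchmark would normally balance it with ReleasePointer in a finally block. A minimal sketch, using a throwaway temp file to stand in for the real data:

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;

class AcquireReleaseSketch
{
    static unsafe void Main()
    {
        var path = Path.GetTempFileName();
        File.WriteAllBytes(path, new byte[] { (byte)'A', (byte)'B' });

        using (var mmf = MemoryMappedFile.CreateFromFile(path))
        using (var accessor = mmf.CreateViewAccessor())
        {
            byte* buffer = null;
            try
            {
                // Hands back a raw pointer into the mapped view.
                accessor.SafeMemoryMappedViewHandle.AcquirePointer(ref buffer);
                Console.WriteLine((char)buffer[0]);
            }
            finally
            {
                // Balance the acquire so the handle can be released cleanly.
                if (buffer != null)
                    accessor.SafeMemoryMappedViewHandle.ReleasePointer();
            }
        }
        File.Delete(path);
    }
}
```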

We are now 30 times faster than the initial implementation, and I wonder if we can do more. We can, by going parallel, which gives us the following code:

// state

public unsafe class ThreadState
{
    public Dictionary<long, FastRecord> Records;
    public byte* Start;
    public byte* End;
}


// parallel work

Dictionary<long, FastRecord> allStats;

using (var mmf = MemoryMappedFile.CreateFromFile(args[0]))
using (var accessor = mmf.CreateViewAccessor())
{
    byte* buffer = null;
    accessor.SafeMemoryMappedViewHandle.AcquirePointer(ref buffer);
    var len = new FileInfo(args[0]).Length;
    var entries = len / 50;

    int count = 4;
    var threadStates = new ThreadState[count];
    for (int i = 0; i < count; i++)
    {
        threadStates[i] = new ThreadState
        {
            Records = new Dictionary<long, FastRecord>(),
            Start = buffer + i * (entries / count) * 50,
            End = buffer + (i+1) * (entries / count) * 50
        };
    }

    threadStates[threadStates.Length - 1].End = buffer + len;

    Parallel.ForEach(threadStates, state =>
    {
        while (state.Start != state.End)
        {
            long id;
            long duration;
            NativeRecord.Parse(state.Start, out id, out duration);
            state.Start += 50;
            FastRecord value;
            if (state.Records.TryGetValue(id, out value) == false)
            {
                state.Records[id] = value = new FastRecord
                {
                    Id = id
                };
            }
            value.DurationInTicks += duration;
        }
    });
    allStats = threadStates[0].Records;
    for (int i = 1; i < count; i++)
    {
        foreach (var record in threadStates[i].Records)
        {
            FastRecord value;
            if (allStats.TryGetValue(record.Key, out value))
                value.DurationInTicks += record.Value.DurationInTicks;
            else
                allStats.Add(record.Key, record.Value);
        }
    }
}

This is pretty ugly, but basically we are using four threads, giving each of them its own range of the file and its own dedicated records dictionary. After we are done, we merge the per-thread records into a single dictionary, and that is it.

Using this approach, we can get down to 663 ms run time, 184 MB of allocations and 364 MB peak working set.

So, we are now about 45(!) times faster than the original version. We are almost done, but in my next post, I'm going to pull out the profiler and see if we can squeeze anything else out of it.


Published at DZone with permission of Oren Eini, DZone MVB. See the original article here.

