
Making Code Faster: Going Down the I/O Chute

Ayende Rahien continues his series on making code faster, this time by going after the I/O and parsing costs.

By Oren Eini · Nov. 20, 16 · Performance Zone · Opinion

After introducing the problem and doing some very obvious things, and then some pretty non-obvious things, we have managed to cut the runtime to one-eighth of that of the original implementation.

We can do better still. So far, we have relied heavily on the File.ReadLines method, which handles quite a lot of the parsing complexity for us. However, it still allocates a string per line, and our parsing then splits those strings again, which means even more allocations.

We can take advantage of our knowledge of the file to do better. The code size blows up, but it is mostly very simple. We create a dedicated record reader class that will read each line of the file with a minimum of allocations.

using System;
using System.IO;

public class RecordReader : IDisposable
{
    public long Duration;
    public long Id;
    private readonly StreamReader _streamReader;

    private const int SizeOfDate = 19;// 2015-01-01T16:44:31
    private const int SizeOfSpace = 1;
    private const int SizeOfId = 8; // 00043064
    private const int SizeOfNewLine = 2; // \r\n
    private const int SizeOfRecord = SizeOfDate + SizeOfSpace + SizeOfDate + SizeOfSpace + SizeOfId + SizeOfNewLine;

    private readonly char[] _buffer = new char[SizeOfRecord];

    public RecordReader(string file)
    {
        _streamReader = new StreamReader(file);
    }

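    public bool MoveNext()
    {
        // Fill the buffer with exactly one fixed-size record.
        int sizeRemaining = _buffer.Length;
        int index = 0;
        while (sizeRemaining > 0)
        {
            var read = _streamReader.ReadBlock(_buffer, index, sizeRemaining);
            // Zero means end of stream; a trailing partial record is dropped.
            if (read == 0)
                return false;
            index += read;
            sizeRemaining -= read;
        }

        // Offsets: start time at 0, end time at 20, id at 40.
        Duration = (ParseTime(20) - ParseTime(0)).Ticks;
        Id = ParseInt(40, 8);

        return true;
    }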

    private DateTime ParseTime(int pos)
    {
        var year = ParseInt(pos, 4);
        var month = ParseInt(pos + 5, 2);
        var day = ParseInt(pos + 8, 2);
        var hour = ParseInt(pos + 11, 2);
        var min = ParseInt(pos + 14, 2);
        var sec = ParseInt(pos + 17, 2);
        return new DateTime(year, month, day, hour, min, sec);
    }

    private int ParseInt(int pos, int size)
    {
        var val = 0;
        for (int i = pos; i < pos + size; i++)
        {
            val *= 10;
            val += _buffer[i] - '0';
        }
        return val;
    }

    public void Dispose()
    {
        _streamReader.Dispose();
    }
}

There is a nontrivial amount of stuff going on here. We start by noting that the size of the data, in characters, is fixed, so we can compute the size of a record very easily: 19 + 1 + 19 + 1 + 8 + 2. Each record is exactly 50 characters long, which is also 50 bytes, since every character in the file is ASCII.

The key part here is that we allocate a single buffer, which holds the characters of the current line. Then we write our own date and integer parsing routines, which are trivial, specific to our case and, most importantly, don't require us to allocate additional strings.
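To make the offsets concrete, here is a small worked sketch of the same parsing applied to one record. The class name RecordLayoutDemo and the sample line are hypothetical: the first timestamp and the id are taken from the comments in RecordReader, while the second timestamp is made up for illustration.

```csharp
using System;

public static class RecordLayoutDemo
{
    // Hypothetical sample record in the fixed 50-character layout:
    // [0..18] start time, [19] space, [20..38] end time, [39] space,
    // [40..47] id, [48..49] \r\n
    public const string SampleLine =
        "2015-01-01T16:44:31 2015-01-01T19:09:21 00043064\r\n";

    // Same digit-by-digit approach as RecordReader.ParseInt, but the buffer
    // is passed in so the method can be used on its own.
    public static int ParseInt(char[] buffer, int pos, int size)
    {
        var val = 0;
        for (int i = pos; i < pos + size; i++)
            val = val * 10 + (buffer[i] - '0');
        return val;
    }

    // Reads yyyy-MM-ddTHH:mm:ss starting at pos, skipping the separators.
    public static DateTime ParseTime(char[] buffer, int pos)
    {
        return new DateTime(
            ParseInt(buffer, pos, 4),
            ParseInt(buffer, pos + 5, 2),
            ParseInt(buffer, pos + 8, 2),
            ParseInt(buffer, pos + 11, 2),
            ParseInt(buffer, pos + 14, 2),
            ParseInt(buffer, pos + 17, 2));
    }

    // Extracts the duration (in ticks) and the id from one record.
    public static void Parse(char[] buffer, out long duration, out long id)
    {
        duration = (ParseTime(buffer, 20) - ParseTime(buffer, 0)).Ticks;
        id = ParseInt(buffer, 40, 8);
    }
}
```

For the sample line, the buffer is 50 characters long, the duration is 2 hours, 24 minutes and 50 seconds (86,900,000,000 ticks), and the id is 43064. The only allocation in the demo is the conversion of the sample string to a char array; the reader above avoids even that by reusing its buffer.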

We use the reader like this:

  var stats = new Dictionary<long, FastRecord>();
  using (var reader = new RecordReader(args[0]))
  {
      while (reader.MoveNext())
      {
          FastRecord value;
          if (stats.TryGetValue(reader.Id, out value) == false)
          {
              stats[reader.Id] = value = new FastRecord
              {
                  Id = reader.Id
              };
          }
          value.DurationInTicks += reader.Duration;
      }
  }

So, we are back to single-threaded mode. Running this code gives us a runtime of 1.7 seconds, 126 MB allocated and a peak working set of 35 MB.

We are now about 2.5 times faster than the previous parallel version, and over 17 times faster than the original version.

Making this code parallel is fairly trivial now: divide the file into sections and run a record reader on each section. However, is there really much point at this stage?
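For readers who want to try it anyway, the sectioning idea could be sketched roughly as follows. This is not the author's code: the class ParallelSections and its methods are hypothetical, it reads raw bytes instead of going through StreamReader, and it assumes the file is pure ASCII with no byte order mark, so that every record occupies exactly 50 bytes. It also sums ticks per id into a plain dictionary rather than using FastRecord.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

public static class ParallelSections
{
    private const int SizeOfRecord = 50; // assumes one byte per character

    // Sums duration ticks per id, reading the file in record-aligned sections.
    public static Dictionary<long, long> Sum(string file, int sections)
    {
        if (sections < 1) throw new ArgumentOutOfRangeException(nameof(sections));
        long totalRecords = new FileInfo(file).Length / SizeOfRecord;
        long perSection = (totalRecords + sections - 1) / sections;

        // Each section gets its own dictionary, so no locking is needed.
        var partials = new Dictionary<long, long>[sections];
        Parallel.For(0, sections, s =>
        {
            long first = Math.Min(s * perSection, totalRecords);
            long count = Math.Min(perSection, totalRecords - first);
            partials[s] = SumSection(file, first, count);
        });

        // Merge the per-section results on a single thread.
        var merged = new Dictionary<long, long>();
        foreach (var partial in partials)
        {
            foreach (var pair in partial)
            {
                long current;
                merged.TryGetValue(pair.Key, out current);
                merged[pair.Key] = current + pair.Value;
            }
        }
        return merged;
    }

    private static Dictionary<long, long> SumSection(string file, long firstRecord, long count)
    {
        var result = new Dictionary<long, long>();
        var buffer = new byte[SizeOfRecord];
        using (var stream = new FileStream(file, FileMode.Open, FileAccess.Read, FileShare.Read))
        {
            // Fixed-size records let us jump straight to a record boundary.
            stream.Seek(firstRecord * SizeOfRecord, SeekOrigin.Begin);
            for (long i = 0; i < count; i++)
            {
                if (!ReadFully(stream, buffer))
                    break;
                long duration = (ParseTime(buffer, 20) - ParseTime(buffer, 0)).Ticks;
                long id = ParseInt(buffer, 40, 8);
                long current;
                result.TryGetValue(id, out current);
                result[id] = current + duration;
            }
        }
        return result;
    }

    // Stream.Read may return fewer bytes than requested, so loop until full.
    private static bool ReadFully(Stream stream, byte[] buffer)
    {
        int index = 0;
        while (index < buffer.Length)
        {
            int read = stream.Read(buffer, index, buffer.Length - index);
            if (read == 0)
                return false;
            index += read;
        }
        return true;
    }

    private static DateTime ParseTime(byte[] buffer, int pos)
    {
        return new DateTime(
            ParseInt(buffer, pos, 4), ParseInt(buffer, pos + 5, 2), ParseInt(buffer, pos + 8, 2),
            ParseInt(buffer, pos + 11, 2), ParseInt(buffer, pos + 14, 2), ParseInt(buffer, pos + 17, 2));
    }

    private static int ParseInt(byte[] buffer, int pos, int size)
    {
        var val = 0;
        for (int i = pos; i < pos + size; i++)
            val = val * 10 + (buffer[i] - '0');
        return val;
    }
}
```

A call such as ParallelSections.Sum(args[0], Environment.ProcessorCount) would use one section per core. Whether that beats the single-threaded reader depends on how much of the remaining time is disk I/O rather than parsing, which is exactly the question raised above.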
