Making Code Faster: Going Down the I/O Chute

Ayende Rahien continues his series on making code faster, this time focusing on input and output.

By Oren Eini · Nov. 20, 2016 · Opinion


After introducing the problem, doing some very obvious things, and then some pretty non-obvious ones, we have managed to cut the runtime down to one-eighth of the original implementation's.

We can do better still. So far, we have relied heavily on the File.ReadLines method, which handles quite a lot of the parsing complexity for us. However, it still allocates a string per line, and our parsing then split those strings again, meaning even more allocations.
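For contrast, the allocation-heavy approach from the earlier posts looks roughly like the following (a hedged sketch; the exact code in the earlier posts differs, and the accumulation step is elided):

```csharp
// Sketch of the baseline: one string allocated per line, plus an array and
// three substrings per Split call, plus culture-aware DateTime.Parse.
foreach (var line in File.ReadLines(args[0]))
{
    var parts = line.Split(' ');           // allocates an array + substrings
    var start = DateTime.Parse(parts[0]);  // general-purpose, relatively slow
    var end = DateTime.Parse(parts[1]);
    var id = long.Parse(parts[2]);
    // ... accumulate (end - start).Ticks per id ...
}
```

Every one of those per-line allocations is work for the GC, which is exactly what the dedicated reader below avoids.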

We can take advantage of our knowledge of the file to do better. The code size blows up, but it is mostly very simple. We create a dedicated record reader class that will read each line of the file with a minimum of allocations.

public class RecordReader : IDisposable
{
    public long Duration;
    public long Id;
    private readonly StreamReader _streamReader;

    private const int SizeOfDate = 19;// 2015-01-01T16:44:31
    private const int SizeOfSpace = 1;
    private const int SizeOfId = 8; // 00043064
    private const int SizeOfNewLine = 2; // \r\n
    private const int SizeOfRecord = SizeOfDate + SizeOfSpace + SizeOfDate + SizeOfSpace + SizeOfId + SizeOfNewLine;

    private readonly char[] _buffer = new char[SizeOfRecord];

    public RecordReader(string file)
    {
        _streamReader = new StreamReader(file);
    }

    public bool MoveNext()
    {
        int sizeRemaining = _buffer.Length;
        int index = 0;
        while (sizeRemaining > 0)
        {
            var read = _streamReader.ReadBlock(_buffer, index, sizeRemaining);
            if (read == 0)
                return false;
            index += read;
            sizeRemaining -= read;
        }

        Duration = (ParseTime(20) - ParseTime(0)).Ticks;
        Id = ParseInt(40, 8);

        return true;
    }

    private DateTime ParseTime(int pos)
    {
        var year = ParseInt(pos, 4);
        var month = ParseInt(pos + 5, 2);
        var day = ParseInt(pos + 8, 2);
        var hour = ParseInt(pos + 11, 2);
        var min = ParseInt(pos + 14, 2);
        var sec = ParseInt(pos + 17, 2);
        return new DateTime(year, month, day, hour, min, sec);
    }

    private int ParseInt(int pos, int size)
    {
        var val = 0;
        for (int i = pos; i < pos + size; i++)
        {
            val *= 10;
            val += _buffer[i] - '0';
        }
        return val;
    }

    public void Dispose()
    {
        _streamReader.Dispose();
    }
}

There is a nontrivial amount of stuff going on here. We start by noting that the size in characters of the data is fixed, so we can compute the size of a record very easily: each record is exactly 50 bytes long (19 + 1 + 19 + 1 + 8 + 2).

The key part here is that we allocate a single buffer, which holds the line's characters. We then wrote our own date and integer parsing routines that are trivial, specific to our case, and, most importantly, don't require us to allocate any additional strings.

Using this code is done with:

var stats = new Dictionary<long, FastRecord>();
using (var reader = new RecordReader(args[0]))
{
    while (reader.MoveNext())
    {
        FastRecord value;
        if (stats.TryGetValue(reader.Id, out value) == false)
        {
            stats[reader.Id] = value = new FastRecord
            {
                Id = reader.Id
            };
        }
        value.DurationInTicks += reader.Duration;
    }
}

So, we are back to single-threaded mode. Running this code gives us a runtime of 1.7 seconds, 126 MB allocated, and a peak working set of 35 MB.

We are now about 2.5 times faster than the previous parallel version, and over 17 times faster than the original version.

Making this code parallel is fairly trivial now; divide the file into sections and have a record reader on each section. However, is there really much point at this stage?
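Because records are a fixed 50 bytes, splitting the file is just arithmetic: any offset that is a multiple of 50 is a record boundary. A hedged sketch of what that parallel version might look like (names and structure here are illustrative, not from the original series):

```csharp
const int SizeOfRecord = 50;
var recordCount = new FileInfo(args[0]).Length / SizeOfRecord;
var chunks = Environment.ProcessorCount;

Parallel.For(0, chunks, chunk =>
{
    var first = recordCount * chunk / chunks;
    var last = recordCount * (chunk + 1) / chunks;
    using (var stream = File.OpenRead(args[0]))
    {
        // Fixed-size records mean this seek always lands on a record boundary.
        stream.Position = first * SizeOfRecord;
        // Wrap the stream in a RecordReader-like reader, process (last - first)
        // records, and accumulate into a per-chunk dictionary that is merged
        // into the final result at the end.
    }
});
```

Each chunk needs its own stream and its own dictionary so that no locking is required on the hot path; the per-chunk dictionaries are merged once at the end.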

