With Malice Aforethought, We Can Try Even Better

By Oren Eini · Sep. 11, 13

Continuing with the theme from the last post: how can we improve the speed at which we write to disk? In particular, I am currently focused on my worst-case scenario:

    fill rnd buff 10,000 tx            :    161,812 ms      6,180 ops / sec

This is 10,000 transactions all running one after another, and taking far too long to complete. We made some improvements and got it all the way to 6,340 ops per second, but I think you’ll agree that even with that optimization, the result is still bad. We spent more time on micro-optimizations and got all the way up to 8,078 ops per second.

That is the point where I decided that I would really like to look at the raw numbers that I can get from this system. So I wrote the following code:

 var key = Guid.NewGuid().ToByteArray();
 var buffer = new byte[100];
 var random = new Random();
 random.NextBytes(buffer);

 using (var fs = new FileStream("test.bin", FileMode.Truncate, FileAccess.ReadWrite))
 {
     fs.SetLength(1024 * 1024 * 768);

     var sp = Stopwatch.StartNew();

     for (int i = 0; i < 10 * 1000; i++)
     {
         for (int j = 0; j < 100; j++)
         {
             fs.Write(key, 0, 16);
             fs.Write(buffer, 0, 100);
         }
         fs.Flush(true);
     }

     Console.WriteLine("{0:#,#} ms for {1:#,#} ops / sec", sp.ElapsedMilliseconds, (1000 * 1000) / sp.Elapsed.TotalSeconds);
 }

This code mimics the absolute best scenario we could hope for: zero cost for managing the data, and purely sequential writes. Note that we call Flush(true) after each batch to simulate 10,000 transactions. This code gives me 147,201 ops per second.
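
For readers without a .NET environment handy, the same pattern (batched sequential writes with a forced flush to disk after each "transaction") can be sketched in Python. This is a rough analogue, not the original code: the function name, file location, and batch sizes here are illustrative, and `os.fsync` plays the role of `Flush(true)`.

```python
import os
import tempfile
import time


def sequential_fsync_benchmark(tx_count=100, writes_per_tx=100, payload=b"x" * 116):
    """Write tx_count 'transactions' of writes_per_tx records each,
    forcing the data to disk after every transaction (os.fsync ~ Flush(true))."""
    path = os.path.join(tempfile.mkdtemp(), "test.bin")
    with open(path, "wb") as f:
        start = time.perf_counter()
        for _ in range(tx_count):
            for _ in range(writes_per_tx):
                f.write(payload)           # 16-byte key + 100-byte value in the original
            f.flush()                      # drain Python's userspace buffer
            os.fsync(f.fileno())           # ask the OS to persist, like Flush(true)
        elapsed = time.perf_counter() - start
    return path, tx_count * writes_per_tx, elapsed
```

The per-transaction fsync is the expensive part; dropping it (or batching more writes per fsync) is exactly the trade-off the measurements in this post explore.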

This is interesting, mostly because I thought the reason our random writes with 10,000 transactions were slow was the calls to Flush(), but it appears that this is actually working very well. I then tested this with random writes, by adding the following lines just before the inner write loop:

 var next = random.Next(0, 1024 * 1024 * 512);
 fs.Position = next - next % 4096;
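
The `next - next % 4096` expression rounds the random offset down to the containing 4 KB page boundary, so every write starts page-aligned. A minimal sketch of that arithmetic (the helper name is my own):

```python
def align_down(offset: int, page_size: int = 4096) -> int:
    """Round offset down to the containing page boundary: next - next % 4096."""
    return offset - offset % page_size


# Every result is a multiple of the page size:
assert align_down(0) == 0
assert align_down(4095) == 0
assert align_down(4096) == 4096
assert align_down(10_000) == 8192
```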

I then decided to try it with memory mapped files, and I wrote:

 using (var fs = new FileStream("test.bin", FileMode.Truncate, FileAccess.ReadWrite))
 {
     fs.SetLength(1024 * 1024 * 768);

     var memoryMappedFile = MemoryMappedFile.CreateFromFile(fs,
         "test", fs.Length, MemoryMappedFileAccess.ReadWrite,
         null, HandleInheritability.None, true);
     var memoryMappedViewAccessor = memoryMappedFile.CreateViewAccessor();

     byte* p = null;
     memoryMappedViewAccessor.SafeMemoryMappedViewHandle.AcquirePointer(ref p);

     var sp = Stopwatch.StartNew();

     for (int i = 0; i < 10 * 1000; i++)
     {
         var next = random.Next(0, 1024 * 1024 * 512);
         byte* basePtr = p + next;
         using (var ums = new UnmanagedMemoryStream(basePtr, 12 * 1024, 12 * 1024, FileAccess.ReadWrite))
         {
             for (int j = 0; j < 100; j++)
             {
                 ums.Write(key, 0, 16);
                 ums.Write(buffer, 0, 100);
             }
         }
     }
     Console.WriteLine("{0:#,#} ms for {1:#,#} ops / sec", sp.ElapsedMilliseconds, (1000 * 1000) / sp.Elapsed.TotalSeconds);
 }

You’ll note that I am not doing any flushing here. That is intentional for now. With this code, I am getting 5 million+ ops per second, but since I am not flushing, this is really just a test of how fast I can write to memory.

Adding a single flush at the end cost us 1.8 seconds for a 768 MB file. And what about doing the right thing? Adding the following line right after the inner write loop means that we are actually flushing the mapped view:

FlushViewOfFile(basePtr, new IntPtr(12 * 1024));

Note that this does not flush to disk; we still need to do that separately. But for now, let's try it anyway. This single line took the code from 5 million+ ops down to 170,988 ops per second, and that does NOT include actual flushing to disk. When we do that, too, we get a truly ridiculous number: 20,547 ops per second. And that explains quite a lot, I think.
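
The same "write to memory is nearly free, flushing the view is what hurts" effect can be sketched in Python, whose `mmap.flush()` maps to `FlushViewOfFile` on Windows and `msync` on POSIX. This is a rough analogue with illustrative sizes, not the author's benchmark:

```python
import mmap
import os
import random
import tempfile
import time


def mapped_random_writes(flush_each=True, length=1 << 20, iterations=50):
    """Write 116-byte records at page-aligned random offsets through a memory
    map, optionally flushing the touched region after each write."""
    path = os.path.join(tempfile.mkdtemp(), "mapped.bin")
    with open(path, "wb") as f:
        f.truncate(length)                    # like fs.SetLength in the C# code
    record = os.urandom(116)                  # 16-byte key + 100-byte value
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), length) as mm:
        start = time.perf_counter()
        for _ in range(iterations):
            off = random.randrange(0, length - 12 * 1024)
            off -= off % 4096                 # page-align, as in the C# code
            mm[off:off + 116] = record        # write through the mapping
            if flush_each:
                mm.flush(off, 12 * 1024)      # msync / FlushViewOfFile the region
        return time.perf_counter() - start
```

Comparing `mapped_random_writes(flush_each=False)` against `flush_each=True` shows the same shape as the numbers above: the unflushed version only measures memory bandwidth, while the flushed version pays for the kernel round-trip on every write.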

For reference, here is the full code:

 using System;
 using System.Diagnostics;
 using System.IO;
 using System.IO.MemoryMappedFiles;
 using System.Runtime.InteropServices;

 unsafe class Program
 {
     [DllImport("kernel32.dll", SetLastError = true)]
     [return: MarshalAs(UnmanagedType.Bool)]
     extern static bool FlushViewOfFile(byte* lpBaseAddress, IntPtr dwNumberOfBytesToFlush);

     static void Main(string[] args)
     {
         var key = Guid.NewGuid().ToByteArray();
         var buffer = new byte[100];
         var random = new Random();
         random.NextBytes(buffer);

         using (var fs = new FileStream("test.bin", FileMode.Truncate, FileAccess.ReadWrite))
         {
             fs.SetLength(1024 * 1024 * 768);

             var memoryMappedFile = MemoryMappedFile.CreateFromFile(fs,
                 "test", fs.Length, MemoryMappedFileAccess.ReadWrite,
                 null, HandleInheritability.None, true);
             var memoryMappedViewAccessor = memoryMappedFile.CreateViewAccessor();

             byte* p = null;
             memoryMappedViewAccessor.SafeMemoryMappedViewHandle.AcquirePointer(ref p);

             var sp = Stopwatch.StartNew();

             for (int i = 0; i < 10 * 1000; i++)
             {
                 var next = random.Next(0, 1024 * 1024 * 512);
                 byte* basePtr = p + next;
                 using (var ums = new UnmanagedMemoryStream(basePtr, 12 * 1024, 12 * 1024, FileAccess.ReadWrite))
                 {
                     for (int j = 0; j < 100; j++)
                     {
                         ums.Write(key, 0, 16);
                         ums.Write(buffer, 0, 100);
                     }
                 }
                 FlushViewOfFile(basePtr, new IntPtr(12 * 1024));
                 fs.Flush(true);
             }
             Console.WriteLine("{0:#,#} ms for {1:#,#} ops / sec", sp.ElapsedMilliseconds, (1000 * 1000) / sp.Elapsed.TotalSeconds);
         }
     }
 }

This is about as efficient as you can get when writing to disk through memory-mapped files in a transactional manner. And it is pretty much the absolute best-case scenario: we know exactly what we wrote and where we wrote it, and we always write a single entry of a fixed size. In Voron’s case, we might write to multiple pages in the same transaction (in fact, we are pretty much guaranteed to do just that).

This means that I need to think about other ways of doing that.




Published at DZone with permission of Oren Eini, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
