Fast Transaction Log: Windows

Ayende Rahien presents his analysis of journal writing techniques on Linux versus Windows, including the impacts on system performance.

In my previous post, I tested journal writing techniques on Linux. In this post, I want to do the same for Windows and see what impact the various options have on system performance.

Windows has slightly different options than Linux. In particular, on Windows the various flags and promises are very clear, and it is quite easy to figure out what you are supposed to do.

We have tested the following scenarios:

  • Doing buffered writes (pretty useless for any journal file, which needs to be reliable, but a good baseline metric).
  • Doing buffered writes and calling FlushFileBuffers after each transaction (a pretty common way to handle committing to disk in databases, and the equivalent of calling fsync on Linux).
  • Using the FILE_FLAG_WRITE_THROUGH flag and asking the kernel to make sure that after every write, everything will be flushed to disk. Note that the disk may or may not buffer.
  • Using the FILE_FLAG_NO_BUFFERING flag to bypass the kernel’s caching and go directly to the disk. This has special memory alignment considerations.
  • Using the FILE_FLAG_WRITE_THROUGH | FILE_FLAG_NO_BUFFERING flags together to ensure that we don’t do any caching and actually force the disk to do its work. On Windows, this is guaranteed to ask the disk to flush to the persisted medium (but the disk can ignore this request).
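The "special memory alignment considerations" for FILE_FLAG_NO_BUFFERING boil down to this: write sizes and file offsets need to be multiples of the volume sector size, and the buffer address needs to be sector aligned. Here is a minimal sketch of that arithmetic in portable C. The helper names and the 4096 byte sector size are assumptions made for illustration, not part of the benchmark; real code should query the actual sector size of the volume.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed sector size for illustration only; query the volume in real code. */
#define ASSUMED_SECTOR_SIZE 4096u

/* Hypothetical helper: round a byte count up to the next multiple of
   `sector`, which must be a power of two. */
size_t round_up_to_sector(size_t size, size_t sector)
{
    assert(sector != 0 && (sector & (sector - 1)) == 0);
    return (size + sector - 1) & ~(sector - 1);
}

/* Hypothetical helper: return 1 if `ptr` starts on a `sector` boundary,
   0 otherwise. `sector` must be a power of two. */
int is_sector_aligned(const void *ptr, size_t sector)
{
    assert(sector != 0 && (sector & (sector - 1)) == 0);
    return ((uintptr_t)ptr & (sector - 1)) == 0;
}
```

With the 16 KB chunks used in this post, round_up_to_sector(16 * 1024, ASSUMED_SECTOR_SIZE) returns the size unchanged, and memory returned by VirtualAlloc is page aligned, which is why the benchmark code does not need any extra alignment work.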

Here is the code:

#include "stdafx.h"
#include <Windows.h>
#include <stdio.h>
#include <stdlib.h>

#define FILE_SIZE  (1024 * 1024 * 1024L)
#define CHUNK_SIZE (16 * 1024)

/* Writes FILE_SIZE bytes to the file in CHUNK_SIZE pieces and prints the elapsed time. */
void simulate_journal_writes(HANDLE file, const char* desc, int use_fsync)
{
    LARGE_INTEGER freq;
    int pos;
    LARGE_INTEGER start, stop;
    BYTE* buffer;
    DWORD written;
    HCRYPTPROV hProvider;

    if (file == INVALID_HANDLE_VALUE)
    {
        printf("Invalid file %lu", GetLastError());
        exit(1);
    }

    /* VirtualAlloc returns page-aligned memory, which FILE_FLAG_NO_BUFFERING requires. */
    buffer = (BYTE*)VirtualAlloc(NULL, CHUNK_SIZE, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
    if (buffer == NULL)
    {
        printf("Invalid memory allocation %lu", GetLastError());
        exit(1);
    }

    /* Fill the buffer with random data so the writes are not trivially compressible. */
    if (!CryptAcquireContext(&hProvider, NULL, NULL, PROV_RSA_FULL, CRYPT_VERIFYCONTEXT | CRYPT_SILENT))
    {
        printf("Invalid crypto context %lu", GetLastError());
        exit(1);
    }

    if (!CryptGenRandom(hProvider, CHUNK_SIZE, buffer))
    {
        printf("Can't generate random data %lu", GetLastError());
        exit(1);
    }
    if (!CryptReleaseContext(hProvider, 0))
    {
        printf("Can't release %lu", GetLastError());
        exit(1);
    }

    QueryPerformanceFrequency(&freq);

    QueryPerformanceCounter(&start);

    for (pos = 0; pos < FILE_SIZE; pos += CHUNK_SIZE)
    {
        /* lpNumberOfBytesWritten must not be NULL when lpOverlapped is NULL. */
        if (!WriteFile(file, buffer, CHUNK_SIZE, &written, NULL))
        {
            printf("\ncan't write %lu", GetLastError());
            exit(2);
        }
        if (use_fsync)
        {
            if (!FlushFileBuffers(file))
            {
                printf("\ncan't fsync %lu", GetLastError());
                exit(2);
            }
        }
    }

    QueryPerformanceCounter(&stop);

    printf("\nRunning %s took %f ms\n", desc,
        ((double)(stop.QuadPart - start.QuadPart) * 1000.0 / (double)freq.QuadPart)
    );

    VirtualFree(buffer, 0, MEM_RELEASE);
}

int main()
{
    LPCWSTR filename = L"journal.log";
    HANDLE file;

    DeleteFile(filename);

    /* Preallocate the journal file so the timed runs do not include file growth. */
    file = CreateFile(filename, GENERIC_ALL, FILE_SHARE_READ, NULL, OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);

    if (file == INVALID_HANDLE_VALUE)
    {
        printf("Error creating file %lu", GetLastError());
        return 1;
    }

    if (SetFilePointer(file, FILE_SIZE, NULL, FILE_BEGIN) == INVALID_SET_FILE_POINTER)
    {
        printf("Error setting file pointer %lu", GetLastError());
        return 1;
    }

    if (!SetEndOfFile(file))
    {
        printf("Error setting end of file %lu", GetLastError());
        return 1;
    }

    if (!CloseHandle(file))
    {
        printf("Error closing %lu", GetLastError());
        return 1;
    }

    printf("\nStarting\n");

    file = CreateFile(filename, GENERIC_ALL, FILE_SHARE_READ, NULL, OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    simulate_journal_writes(file, " buffered ", 0);
    CloseHandle(file);

    file = CreateFile(filename, GENERIC_ALL, FILE_SHARE_READ, NULL, OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    simulate_journal_writes(file, " buffered + fsync", 1);
    CloseHandle(file);

    file = CreateFile(filename, GENERIC_ALL, FILE_SHARE_READ, NULL, OPEN_ALWAYS, FILE_FLAG_WRITE_THROUGH, NULL);
    simulate_journal_writes(file, " write_through ", 0);
    CloseHandle(file);

    file = CreateFile(filename, GENERIC_ALL, FILE_SHARE_READ, NULL, OPEN_ALWAYS, FILE_FLAG_NO_BUFFERING, NULL);
    simulate_journal_writes(file, " no_buffering ", 0);
    CloseHandle(file);

    file = CreateFile(filename, GENERIC_ALL, FILE_SHARE_READ, NULL, OPEN_ALWAYS, FILE_FLAG_NO_BUFFERING | FILE_FLAG_WRITE_THROUGH, NULL);
    simulate_journal_writes(file, " write_through | no_buffering ", 0);
    CloseHandle(file);

    return 0;
}

We tested this on an AWS machine (i2.2xlarge: 61 GB, eight cores, 2x 800 GB SSD drives, 1 GB/sec EBS), which was running Microsoft Windows Server 2012 R2 RTM 64-bit. The code was compiled for 64 bits with the default release configuration.

What we are doing is writing a 1 GB journal file in 16 KB transactions, which simulates 65,536 separate commits to the disk. That is a lot of work that needs to be done.
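The arithmetic behind the commit count, and behind the "write cost" column in the tables that follow, can be sketched in a few lines of C. The function names here are hypothetical, and I am assuming the write cost is simply the total time divided by the number of commits.

```c
#include <assert.h>

/* Hypothetical helper: number of commits needed to fill the journal
   when each commit writes one chunk. */
long long commit_count(long long file_size, long long chunk_size)
{
    assert(chunk_size > 0);
    return file_size / chunk_size;
}

/* Hypothetical helper: average cost of a single write in milliseconds,
   assuming it is the total elapsed time divided by the commit count. */
double write_cost_ms(double total_ms, long long commits)
{
    assert(commits > 0);
    return total_ms / (double)commits;
}
```

For a 1 GB file and 16 KB chunks, commit_count(1024LL * 1024 * 1024, 16 * 1024) gives 65,536 commits, and a run that takes 396 ms in total works out to roughly 0.006 ms per write.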

First, I ran it on the system drive, to see how it behaves:

Method                                              Time (ms)    Write cost (ms)
Buffered                                            396          0.006
Buffered + FlushFileBuffers                         121,403      1.8
FILE_FLAG_WRITE_THROUGH                             58,376       0.89
FILE_FLAG_NO_BUFFERING                              56,162       0.85
FILE_FLAG_WRITE_THROUGH | FILE_FLAG_NO_BUFFERING    55,432       0.84

Remember, this is running on the system disk, not on the SSD drive. Here are the numbers for the SSD drive, which are much more interesting for us.

Method                                              Time (ms)    Write cost (ms)
Buffered                                            410          0.006
Buffered + FlushFileBuffers                         21,077       0.321
FILE_FLAG_WRITE_THROUGH                             10,029       0.153
FILE_FLAG_NO_BUFFERING                              8,491        0.129
FILE_FLAG_WRITE_THROUGH | FILE_FLAG_NO_BUFFERING    8,378        0.127

And those numbers are very significant. Unlike the system disk, where we basically get whatever spare cycles we have in both Linux and Windows, the SSD disk provides really good performance. But even on an identical machine, running nearly identical code, there are significant performance differences between the two operating systems:

Options                                                                  Windows (ms)    Linux (ms)    Difference
Buffered                                                                 0.006           0.03          80 percent win
Buffered + fsync() / FlushFileBuffers()                                  0.32            0.35          9 percent win
O_DSYNC / FILE_FLAG_WRITE_THROUGH                                        0.153           0.29          48 percent win
O_DIRECT / FILE_FLAG_NO_BUFFERING                                        0.129           0.14          8 percent win
O_DIRECT | O_DSYNC / FILE_FLAG_WRITE_THROUGH | FILE_FLAG_NO_BUFFERING    0.127           0.31          60 percent win
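
The Difference column is consistent with measuring how much less time a write takes on Windows, relative to the Linux time. That formula is my assumption about how the figures were derived, and the function name below is hypothetical; the table's values appear to be rounded, so small deviations are expected.

```c
#include <assert.h>

/* Hypothetical helper: percentage by which the Windows write cost is lower
   than the Linux write cost, relative to the Linux cost.
   Assumed formula: (linux - windows) / linux * 100. */
double percent_win(double windows_ms, double linux_ms)
{
    assert(linux_ms > 0.0);
    return (linux_ms - windows_ms) / linux_ms * 100.0;
}
```

For the buffered case, percent_win(0.006, 0.03) comes out at about 80 percent, and for the combined flags, percent_win(0.127, 0.31) comes out at about 59 percent.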

In pretty much all cases, Windows has been able to outperform Linux in this specific scenario, in many cases by a significant margin. In particular, in the scenario that I really care about, we see a 60 percent performance advantage for Windows.

One of the reasons for this blog post, with its detailed code and scenario, is to get external verification of these numbers. I’d love to know if I missed something that would make Linux speed comparable to Windows, because right now this is pretty miserable.

I do have a hunch about those numbers, though. SQL Server is a major business for Microsoft, so its team has a lot of pull in the company. And SQL Server uses FILE_FLAG_WRITE_THROUGH | FILE_FLAG_NO_BUFFERING internally to handle its transaction log. Like quite a few other Win32 APIs (WriteFileGather, for example), it looks tailor-made for database journaling. I’m guessing that this code path has been gone over multiple times over the years to optimize SQL Server by smoothing out anything in the way.

As a result, if you know what you are doing, you can get some really impressive numbers on Windows in this scenario. Oh, and just to quiet the nitpickers:

[Image not preserved]

Topics:
performance, c programming language, metrics, transaction log

Published at DZone with permission of Oren Eini, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
