I am often asked why memory-mapped files can be more efficient than plain read/write I/O calls, and whether shared memory is slower than private memory. These seemingly unrelated mechanisms share a common implementation in the Windows kernel, known as section objects or file mapping objects. The same implementation backs both memory pages shared across multiple processes (by name) and file regions mapped to memory pages (even within a single process).
If you’re interested in a thorough discussion of how section objects work, I must refer you to Windows Internals, 6th Edition. But if you’re only here for the quick answers and myth-busting, read on; I promise that in just over 650 words you’ll have what you came for.
On Memory-Mapped Files
When you map a file to memory, you instruct the Windows memory manager to create a data structure that maps a region of virtual memory pages in your process's address space to a region of a file on disk. More specifically, the memory manager marks the virtual memory region as invalid from the processor's perspective, but sets aside some bookkeeping information that describes the mapping (in data structures known as section control areas, segments, and subsections).
Next, when your application accesses one of the virtual addresses mapped to the file, the processor raises a page fault. The memory manager handles the fault by performing an on-demand read from the file into a newly allocated physical memory page, and then maps the virtual page to that physical page. As a result, your application can now access that page of virtual memory as usual, and it will see the contents of the file on disk.
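To make the mapping steps concrete, here is a minimal sketch in Python, whose standard `mmap` module wraps the file mapping APIs on Windows. The function name `first_and_last_byte` and the scratch file are made up for illustration. The point is that once the view exists, file contents are reached by ordinary indexing rather than by read calls.

```python
import mmap
import os
import tempfile

def first_and_last_byte(path):
    """Map a whole file read-only and touch two bytes through plain indexing."""
    with open(path, "rb") as f:
        # A length of 0 maps the entire file.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            # Each index below is an ordinary memory access; the OS brings
            # the underlying page in on demand if it is not yet resident.
            return view[0], view[len(view) - 1]

# Hypothetical scratch file, created only for this demonstration.
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"A" + b"\0" * 8190 + b"Z")  # 8192 bytes, so more than one 4 KiB page

result = first_and_last_byte(demo_path)
os.remove(demo_path)
```

Indexing a Python `mmap` object returns an integer byte value, so `result` holds the byte codes of the first and last bytes of the file.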
Why is this any more efficient or useful than just calling a read API, such as the Win32 ReadFile or the .NET FileStream.Read? There are numerous reasons:
After a page has already been fetched from disk, subsequent accesses will not require any intervention on the OS's behalf. You simply read and write memory. On the other hand, when you use file manipulation APIs, every call incurs a system call, which is typically orders of magnitude more expensive than a plain memory access.
Accesses to a memory mapped file will not require additional buffers to be allocated and freed by your program. The system manages the physical memory for the mapped file, and you don’t have to allocate a read or write buffer and copy data an extra time.
When working with other libraries designed to manipulate memory addresses directly, you can easily (and with no additional overhead) provide the base address for a memory-mapped file to the library functions. For example, to copy data from one region of the file to another, you can simply use memcpy instead of a loop that performs read and write operations and manipulates seek pointers. Similarly, the Windows image loader simply maps DLLs into memory so that when a function in your DLL invokes another function which isn’t resident, it is transparently loaded from disk as necessary. (This is a simplified picture that ignores relocations, but you get the idea.)
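The "copy one region of the file to another" case from the last point can be sketched as follows. This is an illustration under assumptions, not a reference implementation: the helper name `copy_region_mapped` and the scratch file are hypothetical, and Python's `mmap.move` stands in for the `memcpy` call you would make in C against the mapped base address.

```python
import mmap
import os
import tempfile

def copy_region_mapped(path, src, dst, count):
    """Copy `count` bytes from offset `src` to offset `dst` inside one file."""
    with open(path, "r+b") as f:
        # ACCESS_WRITE gives a read/write view whose changes reach the file.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_WRITE) as view:
            # One in-memory move; no seek pointers, no intermediate buffer.
            view.move(dst, src, count)
            view.flush()  # ask the OS to write the modified pages back

# Hypothetical scratch file: an 8-byte header followed by 8 filler bytes.
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"HEADER--" + b"." * 8)

copy_region_mapped(demo_path, src=0, dst=8, count=8)

with open(demo_path, "rb") as f:
    copied = f.read()
os.remove(demo_path)
```

The equivalent with read/write calls would need a loop that seeks to the source, reads into a buffer, seeks to the destination, and writes the buffer back.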
This is not to say that memory-mapped files are without disadvantages. For example, asynchronous I/O doesn’t play well with memory-mapped files (when you read or write a memory location, you don’t have the luxury of specifying a continuation or doing anything else while the page is fetched). But there are a great many cases when memory-mapped files can be extremely effective, and these are often overlooked by Windows developers.
On Shared Memory
Section objects are also used to share memory between processes. When one process names a section object, another process may open a handle to it by using the same name (given the appropriate access rights). Both processes can then map regions of that section object to their respective virtual address spaces. After both processes have done so, each process will have a region of virtual memory pages mapped to the shared region of physical memory pages.
Note that the section object might be based on a file on disk, in which case you have a memory-mapped file shared across processes: this is what the OS image loader does with DLLs. Alternatively, the section object may not be based on a file at all, in which case it is purely a shared memory region backed by the page file, if one is present.
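Here is a hedged sketch of the named, non-file-backed case using Python's `multiprocessing.shared_memory` module, which, as far as I know, is built on a named file mapping when running on Windows. The helper names `write_greeting` and `read_greeting` are made up for illustration, and for brevity both sides run in the same process; in real use you would pass the block's name to a second process, which would attach in the same way.

```python
from multiprocessing import shared_memory

def write_greeting(size=4096):
    """Create a named shared block and write into it (the creating side)."""
    block = shared_memory.SharedMemory(create=True, size=size)
    block.buf[:5] = b"hello"
    return block

def read_greeting(name):
    """Attach to an existing block by name, as a second process would."""
    peer = shared_memory.SharedMemory(name=name)
    try:
        # Copy the bytes out so no view into the block outlives close().
        return bytes(peer.buf[:5])
    finally:
        peer.close()

owner = write_greeting()
seen = read_greeting(owner.name)  # the name is the only thing the peer needs

owner.close()
owner.unlink()  # releases the block on POSIX; has no effect on Windows
```

Both handles refer to the same underlying memory, so the reader sees what the creator wrote without any copy through a pipe, socket, or file.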
Neither your application nor the CPU has to care whether the memory pages are shared with another process or not. In fact, from the CPU’s perspective, accessing a memory page that just happens to be mapped in some other process is exactly the same as accessing a private page. The end result is that shared memory is just as fast as private memory; the hardware simply doesn’t care if it is or isn’t shared.