Alongside all the exciting advances in Windows 8 and Metro apps, the .NET CLR is marching on. The next version of the CLR will feature several “internals” improvements, mostly in the performance area. Read on to learn about changes to the garbage collector, the JIT compiler, and the native image generator (NGEN) in the next CLR.
Background mode for Server GC
Background GC is a neat feature introduced in CLR 4.0 for the Workstation GC flavor. It’s a little hard to explain without any background (pun intended), but the general gist is the following: while performing a full gen2 collection, the GC checks at well-defined checkpoints whether a gen0/gen1 GC has been requested*. If one has, the gen2 collection is paused and the lower-generation collection runs to completion before the gen2 collection resumes. This allows the code that triggered the lower-generation collection to resume execution without waiting for the entire gen2 collection to finish.
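The checkpointing idea can be modeled in a few lines. This is a toy, single-threaded Python sketch, not the CLR’s implementation: `ToyBackgroundGC`, the `checkpoint_every` parameter, and the `app_thread` callback standing in for application code are all hypothetical. The gen2 “marking” loop looks for pending gen0/gen1 requests only at its checkpoints, runs each one to completion, and then resumes where it left off.

```python
from collections import deque


class ToyBackgroundGC:
    """Hypothetical model of background GC checkpointing (not CLR code)."""

    def __init__(self, gen2_work_items, checkpoint_every):
        self.pending = deque(gen2_work_items)  # remaining gen2 marking work
        self.checkpoint_every = checkpoint_every
        self.ephemeral_requests = deque()  # gen0/gen1 GCs requested meanwhile
        self.log = []

    def request_ephemeral(self, gen):
        # Called by "application code" while the gen2 collection is in progress.
        self.ephemeral_requests.append(gen)

    def _run_pending_ephemeral(self):
        # Each low-generation GC runs to completion while gen2 is paused.
        while self.ephemeral_requests:
            self.log.append(f"gen{self.ephemeral_requests.popleft()}")

    def run_gen2(self, on_step=None):
        steps = 0
        while self.pending:
            self.log.append(f"gen2:{self.pending.popleft()}")
            steps += 1
            if on_step:
                on_step(self, steps)  # application threads keep running
            if steps % self.checkpoint_every == 0:
                self._run_pending_ephemeral()  # well-defined checkpoint
        self.log.append("gen2:done")
        self._run_pending_ephemeral()  # anything requested after the last checkpoint
        return self.log


def app_thread(collector, step):
    if step == 1:
        collector.request_ephemeral(0)  # e.g. the gen0 allocation budget ran out


collector = ToyBackgroundGC(["a", "b", "c", "d"], checkpoint_every=2)
print(collector.run_gen2(on_step=app_thread))
# prints ['gen2:a', 'gen2:b', 'gen0', 'gen2:c', 'gen2:d', 'gen2:done']
```

The gen0 request made after the first unit of gen2 work is serviced at the next checkpoint, well before the gen2 collection finishes.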
The new feature in the next CLR is that background GC is now supported for Server GC as well. (Recall that under Server GC, there is a separate GC heap and a dedicated GC thread for each processor.) There are additional GC changes too, including a work-stealing mode in Server GC that lets GC threads steal parts of the object graph for marking when the work is not distributed evenly among them.
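The post doesn’t describe how the work-stealing mode is implemented, so the sketch below uses the classic work-stealing deque pattern instead, simulated deterministically on one thread in Python. Each simulated GC thread pops from its own end of its deque, and a thread that runs out of work steals the oldest entry from the busiest thread. `mark_with_work_stealing` and its parameters are hypothetical.

```python
from collections import deque


def mark_with_work_stealing(graph, roots_per_worker):
    """Toy, single-threaded simulation of parallel marking with work stealing.

    graph: dict mapping each object to the objects it references.
    roots_per_worker: one list of roots per simulated GC thread.
    """
    workers = len(roots_per_worker)
    queues = [deque(roots) for roots in roots_per_worker]
    marked = set()
    marked_by = [0] * workers  # objects marked by each thread
    steals = 0
    while any(queues):
        for w in range(workers):  # one "time slice" per thread, round-robin
            if not queues[w]:
                # Out of work: steal the oldest entry from the busiest queue.
                victim = max(range(workers), key=lambda v: len(queues[v]))
                if not queues[victim]:
                    continue
                queues[w].append(queues[victim].popleft())
                steals += 1
            obj = queues[w].pop()  # owners work LIFO from their own end
            if obj in marked:
                continue
            marked.add(obj)
            marked_by[w] += 1
            queues[w].extend(graph.get(obj, []))
    return marked, marked_by, steals


# All the work hangs off thread 0's root; thread 1 starts with nothing.
graph = {"root": ["a", "b", "c", "d"]}
marked, marked_by, steals = mark_with_work_stealing(graph, [["root"], []])
print(sorted(marked), marked_by, steals)  # prints ['a', 'b', 'c', 'd', 'root'] [3, 2] 2
```

Without the stealing branch, thread 0 would mark all five objects while thread 1 sat idle; with it, the marking work is split three to two.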
Managed Profile Guided Optimization
If you have done any C++ work since Visual Studio 2003, you probably know about Profile Guided Optimization (PGO). In the C++ compiler, PGO is a special multi-step optimization mode: you exercise the application through a set of scenarios with data collection enabled, and then recompile it with the collected data so that the compiler can optimize based on runtime behavior.
The same idea now applies to managed code that is NGEN-ed. A new command-line tool, mpgo.exe (which I expect to see integrated into Visual Studio, like its C++ counterpart), instruments your .exe and embeds into it runtime data collected while it runs; ngen.exe subsequently uses that data to generate an optimized native image.
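The instrument-then-train step can be sketched in Python, though only as an analogy for what mpgo.exe does to a managed assembly. Here a hypothetical decorator counts calls during a training scenario and serializes the counts as the profile data a later optimization step would consume. `ToyInstrumenter` and the JSON format are assumptions, not MPGO’s actual data format.

```python
import functools
import json
from collections import Counter


class ToyInstrumenter:
    """Hypothetical stand-in for PGO instrumentation: counts calls per function."""

    def __init__(self):
        self.counts = Counter()

    def instrument(self, fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            self.counts[fn.__qualname__] += 1  # the "probe" added by instrumentation
            return fn(*args, **kwargs)
        return wrapper

    def profile_json(self):
        # Training data that an optimizing step (ngen.exe, in MPGO's case) would read.
        return json.dumps(dict(sorted(self.counts.items())))


inst = ToyInstrumenter()


@inst.instrument
def parse(line):
    return line.split(",")


@inst.instrument
def rarely_used():
    return None


# Training scenario: exercise the application the way real users would.
for line in ["a,b", "c,d", "e,f"]:
    parse(line)

print(inst.profile_json())  # prints {"parse": 3}
```

Functions the scenario never exercises, like `rarely_used`, simply don’t appear in the profile, which is what lets the optimizer treat them as cold.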
You might wonder which optimizations MPGO enables. Only two had been disclosed at the time, and both work at page granularity: placing hot code together on as few pages as possible reduces the number of pages loaded from disk, and placing data that is likely to be overwritten at runtime together on as few pages as possible reduces the number of copy-on-write pages.
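Why layout matters can be shown with a little arithmetic. The Python sketch below is a toy model, not MPGO’s algorithm: it assumes 4 KB pages, contiguous placement, and made-up method sizes, and it counts how many pages contain at least one hot method before and after moving the hot methods to the front. `pages_touched` and `hot_first_layout` are hypothetical helpers.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def pages_touched(layout, sizes, hot):
    """Count distinct pages holding at least one item from `hot`,
    with items placed back to back in `layout` order."""
    touched = set()
    offset = 0
    for name in layout:
        if name in hot:
            first = offset // PAGE_SIZE
            last = (offset + sizes[name] - 1) // PAGE_SIZE
            touched.update(range(first, last + 1))
        offset += sizes[name]
    return len(touched)


def hot_first_layout(order, hot):
    """Stable partition: hot items first, then cold ones, original order kept."""
    return [n for n in order if n in hot] + [n for n in order if n not in hot]


# Eight made-up 2 KB methods; every other one was hot in a training run.
sizes = {f"m{i}": 2048 for i in range(8)}
order = list(sizes)
hot = {"m0", "m2", "m4", "m6"}

before = pages_touched(order, sizes, hot)
after = pages_touched(hot_first_layout(order, hot), sizes, hot)
print(before, after)  # prints 4 2
```

Passing the set of data items that get written at runtime instead of `hot` counts the pages that would become copy-on-write, so the same grouping trick covers the second optimization as well.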
Automatic NGEN
Integrating NGEN into your application’s installer can be a challenge; auto-NGEN addresses this by running NGEN automatically when runtime-generated usage information suggests it is worthwhile. When your .NET 4.5 application runs, the runtime generates assembly usage logs. An automatic maintenance task, which runs in the background when the system is idle, examines these logs and generates native images for frequently used assemblies. (This maintenance task may also evict native images for assemblies that have not been used for a long time.)
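The usage-log-driven policy can be illustrated with a toy Python function. The post does not say how the maintenance task decides what counts as “frequently used” or “not used for a long time”, so the `min_launches` and `evict_after` thresholds below, along with `UsageRecord` and `plan_maintenance`, are invented for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class UsageRecord:
    assembly: str
    launches: int  # loads seen in the usage logs
    last_used: datetime


def plan_maintenance(records, native_images, now,
                     min_launches=3, evict_after=timedelta(days=90)):
    """Hypothetical idle-time policy: NGEN frequently used assemblies that
    lack a native image; evict native images that have gone unused."""
    to_ngen = sorted(r.assembly for r in records
                     if r.launches >= min_launches
                     and r.assembly not in native_images)
    to_evict = sorted(r.assembly for r in records
                      if r.assembly in native_images
                      and now - r.last_used > evict_after)
    return to_ngen, to_evict


now = datetime(2012, 1, 1)
records = [
    UsageRecord("App.Core", launches=10, last_used=now),
    UsageRecord("App.Rarely", launches=1, last_used=now - timedelta(days=5)),
    UsageRecord("Old.Lib", launches=5, last_used=now - timedelta(days=200)),
]
print(plan_maintenance(records, native_images={"Old.Lib"}, now=now))
# prints (['App.Core'], ['Old.Lib'])
```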
Multi-core background JIT
Developers can now opt in to an optimized JIT mode that uses otherwise idle processor time to speculatively perform multi-threaded JIT compilation of frequently used methods. The first time the application is launched, the runtime collects data on which methods are used; on subsequent launches, this data guides the background JIT engine.
This feature requires adding a couple of lines of code to your application: a call to ProfileOptimization.SetProfileRoot(directory) followed by a call to ProfileOptimization.StartProfile(filename) (the ProfileOptimization class lives in the System.Runtime namespace). However, the feature is enabled by default for Silverlight 5 and ASP.NET applications.
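The record-then-replay behavior can be modeled in Python. This is not the .NET API: `ToyMultiCoreJit` is hypothetical, and its `set_profile_root` and `start_profile` steps only mirror the roles of ProfileOptimization.SetProfileRoot and StartProfile. The first launch records which methods were needed; the next launch loads that list and “compiles” it on a background thread, so the main thread finds the methods ready.

```python
import json
import os
import tempfile
import threading
from concurrent.futures import ThreadPoolExecutor


class ToyMultiCoreJit:
    """Hypothetical model of profile-guided background JIT (not the CLR API)."""

    def __init__(self):
        self.compiled = set()
        self.lock = threading.Lock()
        self.recorded = []  # methods used in this run, in first-use order
        self.foreground_compiles = 0
        self.profile_root = None
        self.profile_path = None
        self.pool = ThreadPoolExecutor(max_workers=1)
        self.background = None

    def set_profile_root(self, directory):  # role of SetProfileRoot
        self.profile_root = directory

    def start_profile(self, name):  # role of StartProfile
        self.profile_path = os.path.join(self.profile_root, name)
        if os.path.exists(self.profile_path):
            with open(self.profile_path) as f:
                methods = json.load(f)
            # Speculatively "JIT" last run's methods off the main thread.
            self.background = self.pool.submit(
                lambda: [self._compile(m) for m in methods])

    def _compile(self, method):
        with self.lock:  # returns True only for the caller that compiled it
            if method in self.compiled:
                return False
            self.compiled.add(method)  # stand-in for generating machine code
            return True

    def call(self, method):
        if method not in self.recorded:
            self.recorded.append(method)
        if self._compile(method):  # not ready yet: the main thread pays
            self.foreground_compiles += 1

    def wait_for_background(self):  # only to make this demo deterministic
        if self.background:
            self.background.result()

    def shutdown(self):
        self.pool.shutdown(wait=True)
        with open(self.profile_path, "w") as f:
            json.dump(self.recorded, f)  # the profile for the next launch


def launch(root, methods, wait=False):
    jit = ToyMultiCoreJit()
    jit.set_profile_root(root)
    jit.start_profile("startup.profile")
    if wait:
        jit.wait_for_background()
    for m in methods:
        jit.call(m)
    jit.shutdown()
    return jit.foreground_compiles


root = tempfile.mkdtemp()
startup = ["Main", "LoadConfig", "RenderUI"]
print(launch(root, startup))             # prints 3: first launch, no profile yet
print(launch(root, startup, wait=True))  # prints 0: all compiled in background
```

`wait_for_background` exists only so the demo’s output is deterministic; the point of the model is that the second launch’s main thread no longer pays for compiling the recorded methods.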
ReJIT
The last feature allows recompiling a method (running the JIT again) at runtime. It is most likely to be useful for profilers that replace a method’s IL code with an instrumented version at runtime to collect information without restarting the process. It is exposed through the ICorProfilerInfo4::RequestReJIT and ICorProfilerCallback4::GetReJITParameters methods.
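The essence of ReJIT, swapping a method’s code at runtime so that subsequent calls run the new version without a process restart, can be modeled with a dispatch table. This Python toy is an analogy only: real ReJIT works on IL and native code through the profiling API, whereas here `ToyMethodTable`, `request_rejit`, and `counting_instrumentation` are hypothetical names and the “instrumented version” is just a wrapper that counts calls.

```python
import functools


class ToyMethodTable:
    """Callers dispatch through this table, so replacing an entry changes
    what every subsequent call runs, with no restart."""

    def __init__(self):
        self.code = {}

    def define(self, name, fn):
        self.code[name] = fn

    def invoke(self, name, *args):
        return self.code[name](*args)

    def request_rejit(self, name, rewrite):
        # `rewrite` plays the profiler's role of supplying replacement code.
        self.code[name] = rewrite(self.code[name])


def counting_instrumentation(counts, name):
    """Returns a rewrite that wraps a method with a call counter."""
    def rewrite(original):
        @functools.wraps(original)
        def instrumented(*args):
            counts[name] = counts.get(name, 0) + 1
            return original(*args)
        return instrumented
    return rewrite


table = ToyMethodTable()
table.define("Add", lambda a, b: a + b)
counts = {}

table.invoke("Add", 1, 2)  # original code, not counted
table.request_rejit("Add", counting_instrumentation(counts, "Add"))
table.invoke("Add", 3, 4)
print(table.invoke("Add", 5, 6), counts)  # prints 11 {'Add': 2}
```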
* Why would a lower-generation GC be requested while a full collection is running? Recall that Workstation GC can be non-blocking (concurrent), enabling application threads to keep running, and allocating, during significant parts of the collection itself, including most of the GC mark phase.