Profiling an application with Visual Studio – Concurrency


This is the last article in the profiling series, but not the least important. With the massive growth of multicore machines and the development of multi-threaded applications, an important need appeared: the need to trace performance in order to create an effective multi-threaded environment, where threads are able to interact with the least impact on machine performance. Visual Studio offers concurrency profiling tools that allow developers to analyze data connected to thread interactions and activity.

This time I am going to use the same sample application I’ve used in the first article, so that there will be a simulation of a multi-threaded environment with the same method being called multiple times from various threads. This will generate a set of results that can be used to realistically see the impact of a threaded application on the target computer's performance, as well as visualize the application performance indicators by themselves.

Starting the process

Once you start the Performance Wizard (via the Analyze menu), select the Concurrency method. To make the data set as descriptive as possible, I selected both options: collecting resource contention data and visualizing the behavior of a multithreaded application, since there are multiple experimental threads involved in my sample application. This will give us a detailed set of indicators regarding the application activity, both CPU-wise and thread-wise.

IMPORTANT NOTE: Profiling your application using the Concurrency method requires elevated privileges for Visual Studio. If the profiling session is launched without elevated privileges, the following message will appear:

I am using the default profiling settings, so I am going to leave the selections unchanged during the wizard setup. Click Next until you reach the final dialog. Make sure that the session will be launched once the wizard exits and click Finish.

IMPORTANT NOTE: If you are running the session on an x64 machine, you might encounter this message if you have executive paging enabled.

You can disable it by clicking Yes (a restart will be required); however, this is not required for the sample session I am currently running.

Also, if you’ve had debugging symbols disabled, you might want to enable them by allowing debug symbols to be retrieved from Microsoft Symbol Servers:

On a side note, if you aren’t aware of what debug symbols are: they are additional data (typically stored in separate symbol files) that map the compiled code back to names from the source, such as methods and variables. With symbols loaded, whoever attaches a debugger or profiler to the process under test sees readable names in call stacks instead of raw addresses, which makes it much easier to see where a problem comes from.

Analyzing the results

Once the profiling session is complete, you will be presented with a completely different set of data compared to other methods. First of all, you will see that there are three additional views available: CPU Utilization, Threads and Cores.

Before going into a detailed review of what each of these represents, note that there is still some data available on the main report page (the Summary view). The Most Contended Resources section shows the resources with the most contentions.

CONTENTIONS: Occur when a thread tries to acquire a lock that is currently held by another thread (in this case, lock contentions). The fewer contentions there are in a process, the better.

Note that the number of contentions here is quite high and this isn’t the best example. To avoid this, I could introduce a lock inside my testing method, so that the final indicators would look more like this:

This is a much better-looking graph, since the number of contentions is reduced to a minimum. If you click on a handle, you are able to visually see the contentions separated into threads:

Horizontal lines represent threads (recognized by their name or ID) and each block on a line is a contention. If you click on a thread, you are able to view the contentions only for that specific unit, as well as see the call stack for a contention – this might help you in determining where the possible source might be and where a lock should be in place.
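
To make the idea of lock contention more concrete, here is a minimal sketch that is not part of the original sample application. The `ContentionDemo` class, the `Gate` lock object and the `counter` field are hypothetical names chosen for illustration. Several named threads repeatedly acquire the same lock, so profiling a program like this with the Concurrency method should report that lock as a contended resource.

```csharp
using System;
using System.Threading;

public static class ContentionDemo
{
    // Hypothetical shared state; not taken from the original sample application.
    private static readonly object Gate = new object();
    private static int counter;

    // Starts threadCount named threads that all increment the same counter
    // under the same lock, and returns the final counter value.
    public static int Run(int threadCount, int iterationsPerThread)
    {
        counter = 0;
        var threads = new Thread[threadCount];

        for (int i = 0; i < threadCount; i++)
        {
            threads[i] = new Thread(() =>
            {
                for (int j = 0; j < iterationsPerThread; j++)
                {
                    // Every worker competes for Gate here; a thread that finds
                    // the lock taken has to wait, which is a lock contention.
                    lock (Gate)
                    {
                        counter++;
                    }
                }
            });
            threads[i].Name = "Worker " + i;
            threads[i].Start();
        }

        // Wait for all workers to finish before reading the result.
        foreach (Thread thread in threads)
        {
            thread.Join();
        }

        return counter;
    }

    public static void Main()
    {
        Console.WriteLine(Run(4, 100000)); // prints 400000
    }
}
```

Because every increment happens inside the lock, the result is always the thread count multiplied by the iteration count. How many contentions the profiler actually records depends on the machine and on scheduling.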

The Most Contended Threads view allows you to analyze the same data as the one shown above, only in the context of separate threads instead of resources.

Since I am creating threads via ThreadPool, the thread names are automatically generated. This, however, changes if I create the thread for each method call manually (via Thread instances). So, to show an example, here is how I modified the thread creation process:

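Program p = new Program();

string[] states = { "TX", "KS", "CA", "VA", "WA" };

foreach (string state in states)
{
    // Create a dedicated, named thread for each state instead of using the ThreadPool.
    Thread thread = new Thread(new ParameterizedThreadStart(p.Get));
    thread.Name = state + " Thread";
    // Start the thread; passing the state as the argument is an assumption about what Get expects.
    thread.Start(state);
}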

Here is what the profiling view shows now:

Given that the threads are named, it is much easier to see which ones are contended the most. If you click on a thread, you see the contentions spread by resources (instead of threads, as in the previous view):

The optimal usage of these views would be for determining the sections in code that require optimization.

Some interesting statistics are also provided by the CPU Utilization view:

Here I can see the application’s impact on the CPU and track the distribution of the load across multiple cores (if those are present). Notice that the graph shows not only the application's performance, but also the performance of the rest of the system. This makes it easy to compare the application's load with the other system activity that affects the CPU.

Directly connected to this statistical indicator is the Cores view.

This view displays the thread activity distribution among multiple logical cores (if those are present on the analyzed system). Although the number of cross-core context switches is pretty high, this is an expected indicator since there is a multitude of threads. Compared to an application without additional threads, you can see that the distribution is a bit different:

Context switches generally have a negative impact on application performance: the more switches there are, the more time is spent performing the switches instead of doing real work. Therefore, if there is a very high number of context switches, you might want to redesign the threading architecture of your application.
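
One possible redesign, shown here only as a sketch and not as the approach used in the sample application, is to stop creating one thread per work item and instead cap the number of concurrent workers at the number of logical cores. The `BoundedWork` class and its `Process` method are hypothetical stand-ins for the real per-item work.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class BoundedWork
{
    // Hypothetical stand-in for the real per-item work.
    private static string Process(string state)
    {
        return state.ToLowerInvariant();
    }

    // Processes all items with at most one concurrent worker per logical core
    // and returns the results in sorted order.
    public static string[] Run(string[] states)
    {
        var results = new ConcurrentBag<string>();
        var options = new ParallelOptions
        {
            MaxDegreeOfParallelism = Environment.ProcessorCount
        };

        // The runtime schedules the items on pooled threads and never runs
        // more of them at once than the configured limit.
        Parallel.ForEach(states, options, state => results.Add(Process(state)));

        // ConcurrentBag does not preserve order, so sort for a stable result.
        string[] sorted = results.ToArray();
        Array.Sort(sorted, StringComparer.Ordinal);
        return sorted;
    }

    public static void Main()
    {
        string[] states = { "TX", "KS", "CA", "VA", "WA" };
        Console.WriteLine(string.Join(",", Run(states))); // prints ca,ks,tx,va,wa
    }
}
```

Whether this actually lowers the number of context switches depends on the workload, so it is worth profiling the application again after a change like this and comparing the Cores view.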

Last but not least, there is the Threads view that allows reviewing the thread performance and connected interactions.

Each row here displays the activity of one thread. The colors represent the type of thread activity, as determined by the blocking call; the same categories are outlined in the Visible Timeline Profile box, where you can see the percentage for each type. For my application, threads spend most of their time blocked in the synchronization and memory management states. Please note that this applies not only to application-generated threads, but also to threads managed directly by the CLR.

If you view the per thread summary, you will see that various threads have different operational sets:

From this graph, it is clear that thread 5148 is blocked in a memory management state (for example, paging) and spends quite a lot of its time in that state, while the other threads are mostly blocked on synchronization. Once you click on an operational type, you are able to review the blocking profile, which reveals the calls that blocked the thread for that specific operation type.

The current stack gives even more details about the given thread state:

I can clearly see that it was the WaitForMultipleObjects API function that put the thread into the synchronization state.

The unblocking stack, on the other hand, displays the opposite – the thread that unblocked the selected thread from the declared state and the associated calls.
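
The relationship between a blocked thread and the thread that unblocks it can be reproduced with a small sketch. This code is not taken from the sample application, and the `WaitDemo` class is a hypothetical name. It assumes that, on Windows, a managed wait on several handles ends up in a native wait such as WaitForMultipleObjects, which is what a profile like the one above suggests.

```csharp
using System;
using System.Threading;

public static class WaitDemo
{
    // Blocks the calling thread on two events and returns true if both
    // were signaled before the timeout expired.
    public static bool Run()
    {
        var first = new ManualResetEvent(false);
        var second = new ManualResetEvent(false);

        // This thread plays the role of the unblocking thread.
        var signaler = new Thread(() =>
        {
            Thread.Sleep(100);
            first.Set();
            second.Set();
        });
        signaler.Name = "Signaler";
        signaler.Start();

        // The calling thread is blocked on synchronization here until both
        // events are set or five seconds pass. Waiting on multiple handles
        // like this is not supported on STA threads.
        bool signaled = WaitHandle.WaitAll(new WaitHandle[] { first, second }, 5000);

        signaler.Join();
        return signaled;
    }

    public static void Main()
    {
        Console.WriteLine(Run()); // prints True
    }
}
```

In a profile of this program, the waiting thread should appear as blocked on synchronization, and its unblocking stack should point at the thread named Signaler.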

In the same Threads view, developers are able to keep track of all file operations managed by the application. This includes not only operations defined in the code, but also implied operations that are triggered by the CLR, like access to clr.dll or kernel32.dll.


