
Profiling an application with Visual Studio – Concurrency

By Denzel D. · Jul. 28, 2010

This is the last article in the profiling series, but not the least important. With the massive growth of multicore machines and the spread of multi-threaded applications, a new need has emerged: tracing performance in order to build an effective multi-threaded environment, one in which threads interact with the least possible impact on machine performance. Visual Studio offers concurrency profiling tools that let developers analyze data on thread interactions and activity.

This time I am going to use the same sample application from the first article, so that there is a simulation of a multi-threaded environment with the same method being called multiple times from various threads. This generates a set of results that realistically show the impact of a threaded application on the target computer's performance, as well as visualize the application's own performance indicators.
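As a rough sketch of what that simulation could look like (the actual sample code is in the first article; the body of Get, the counter, and the Run helper below are placeholders of my own), the harness simply queues the same method several times via ThreadPool:

```csharp
using System;
using System.Threading;

class Program
{
    static int processed; // how many simulated lookups completed

    // Stand-in for the sample's worker method from the first article;
    // the real one performs a lookup for the given state abbreviation.
    public void Get(object state)
    {
        Thread.Sleep(50); // simulate work
        Interlocked.Increment(ref processed);
    }

    // Queue one work item per state and block until all of them finish.
    public static int Run(string[] states)
    {
        processed = 0;
        var p = new Program();
        using (var done = new CountdownEvent(states.Length))
        {
            foreach (string state in states)
                ThreadPool.QueueUserWorkItem(s => { p.Get(s); done.Signal(); }, state);
            done.Wait(); // keep the process alive until every work item completes
        }
        return processed;
    }

    static void Main()
    {
        Console.WriteLine(Run(new[] { "TX", "KS", "CA", "VA", "WA" }) + " calls completed");
    }
}
```

Running the profiler against a harness like this produces one burst of short-lived ThreadPool activity per state, which is what the views below are analyzing.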

Starting the process

Once you start the Performance Wizard (via the Analyze menu), select the Concurrency method. To make the data set as descriptive as possible, I selected both options (collecting resource contention data and visualizing the behavior of a multithreaded application), since there are multiple experimental threads involved in my sample application. This will give us a detailed set of indicators regarding the application's activity, both CPU- and thread-wise.

IMPORTANT NOTE: Profiling your application using the Concurrency method requires elevated privileges for Visual Studio. If the profiling session is launched without elevated privileges, the following message will appear:

I am using the default profiling settings, so I am going to leave the selections unchanged during the wizard setup. Click Next until you reach the final dialog, make sure that the session will be launched once the wizard exits, and click Finish.

IMPORTANT NOTE: If you are running the session on an x64 machine, you might encounter this message if you have executive paging enabled.



You can disable it by clicking Yes (a restart will be required); however, this is not required for the sample session I am currently executing.

Also, if you’ve had debugging symbols disabled, you might want to enable them by allowing debug symbols to be retrieved from Microsoft Symbol Servers:



On a side note, if you aren't aware of what debug symbols are: they are metadata that allow a debugger to map the running code back to the structures that generated an error. This means that variable and method names are exposed to the developer (or user) who attaches the debugger to the process under test, so that the person debugging the code can see directly what the source of the error is.

Analyzing the results

Once the profiling session is complete, you will be presented with a completely different set of data compared to other methods. First of all, you will see that there are three additional views available: CPU Utilization, Threads and Cores.

Before going into a detailed review of what each of these represents, there is still some data available on the main report page (the Summary view). The Most Contended Resources section shows the resources with the most contentions.

CONTENTIONS: Occur when a thread tries to acquire a lock held by another thread or resource (in this case, lock contentions). The fewer contentions there are in a process, the better.
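To make the definition concrete, here is a small sketch of my own (all names are placeholders) that counts contentions by hand: a thread that finds the lock already held registers one contention before blocking on it, which is essentially the event the profiler is recording:

```csharp
using System;
using System.Threading;

class ContentionDemo
{
    static readonly object Gate = new object();
    static int contentions; // times a thread found the lock already held

    static void Worker()
    {
        for (int i = 0; i < 20; i++)
        {
            bool taken = Monitor.TryEnter(Gate);
            if (!taken)
            {
                Interlocked.Increment(ref contentions); // this is exactly a "contention"
                Monitor.Enter(Gate); // now block until the lock is free
            }
            try { Thread.Sleep(1); } // hold the lock so other threads must wait
            finally { Monitor.Exit(Gate); }
        }
    }

    // Start the given number of workers and return how many contentions occurred.
    public static int Run(int threadCount)
    {
        contentions = 0;
        var threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) (threads[i] = new Thread(Worker)).Start();
        foreach (var t in threads) t.Join();
        return contentions;
    }

    static void Main()
    {
        Console.WriteLine("Contentions observed with 4 threads: " + Run(4));
    }
}
```

With a single worker there are no contentions at all; add more threads competing for the same lock and the count climbs, which is exactly the pattern the Summary view surfaces.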

Note that the number of contentions here is quite high, and this isn't the best example. To avoid this, I could introduce a lock inside my testing method, so that the final indicators would look more like this:

This is a much better-looking graph, since the number of contentions is reduced to a minimum. If you click on a handle, you are able to visually see the contentions separated into threads:

Horizontal lines represent threads (identified by name or ID), and each block on a line is a contention. If you click on a thread, you can view the contentions for that specific unit only, as well as see the call stack for a contention; this can help you determine the likely source and where a lock should be put in place.
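The lock I mention introducing into the testing method could, as a rough sketch (the class and field names below are placeholders of my own, not the sample's actual code), look like this:

```csharp
using System.Threading;

class TestWorker
{
    // A single gate serializing the testing method's body.
    private static readonly object _sync = new object();

    public void Get(object state)
    {
        lock (_sync)
        {
            // The original method body goes here. With the whole body
            // serialized, threads block once on _sync instead of repeatedly
            // contending for the individual resources used inside it.
            Thread.Sleep(10); // stand-in for the real work
        }
    }
}
```

The trade-off is that the serialized section now runs one thread at a time, so this flattens the contention graph at the cost of parallelism inside the method.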

The Most Contended Threads view allows you to analyze the same data as the one shown above, only in the context of separate threads instead of resources.

Since I am creating threads via ThreadPool, the thread names are automatically generated. This, however, can be changed if I create the threads for each method call manually (via Thread instances). To show an example, here is how I modified the thread creation process:

Program p = new Program();

string[] states = { "TX", "KS", "CA", "VA", "WA" };

foreach (string state in states)
{
    // Named threads show up by name in the profiler's thread views.
    Thread thread = new Thread(new ParameterizedThreadStart(p.Get));
    thread.Name = state + " Thread";
    thread.Start(state);
}

Here is what the profiling view shows now:

Given that the threads are named, it is much easier to see which ones are contended the most. If you click on a thread, you see the contentions spread by resources (instead of threads, as in the previous view):

These views are best used to determine the sections of code that require optimization.

Some interesting statistics are also provided by the CPU Utilization view:

Here I can see the application's impact on the CPU, and I can also track how the load is distributed across multiple cores (if present). Notice that not only the application's performance is graphed here, but also that of the client system. This makes it easy to compare the application's performance against the actual system activity that impacts the CPU load.

Directly connected to this statistical indicator is the Cores view.

This view displays how thread activity is distributed among multiple logical cores (if present on the analyzed system). Although the number of cross-core context switches is quite high, this is expected given the multitude of threads involved. Compared to an application without additional threads, you can see that the distribution is a bit different:

Context switches generally have a negative impact on application performance: the more switches there are, the more time is spent performing the switches rather than doing real work. Therefore, if a very high number of context switches is present, you might want to redesign the threaded architecture of your application.
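One common redesign, sketched below with names of my own choosing, is to cap the number of simultaneously running workers at roughly the logical core count instead of letting every work item run at once, which gives the scheduler fewer runnable threads to switch between:

```csharp
using System;
using System.Threading;

class BoundedWork
{
    // Run the given number of work items, allowing at most maxConcurrent of
    // them to execute simultaneously; returns the peak concurrency observed.
    public static int Run(int items, int maxConcurrent)
    {
        int current = 0, peak = 0;
        using (var gate = new SemaphoreSlim(maxConcurrent))
        using (var done = new CountdownEvent(items))
        {
            for (int i = 0; i < items; i++)
            {
                ThreadPool.QueueUserWorkItem(_ =>
                {
                    gate.Wait(); // blocks once maxConcurrent workers are running
                    int now = Interlocked.Increment(ref current);
                    InterlockedMax(ref peak, now);
                    Thread.Sleep(20); // stand-in for real work
                    Interlocked.Decrement(ref current);
                    gate.Release();
                    done.Signal();
                });
            }
            done.Wait();
        }
        return peak;
    }

    // Atomically raise target to value if value is larger.
    static void InterlockedMax(ref int target, int value)
    {
        int snapshot;
        while (value > (snapshot = target) &&
               Interlocked.CompareExchange(ref target, value, snapshot) != snapshot) { }
    }

    static void Main()
    {
        Console.WriteLine("Peak concurrency: " + Run(16, Environment.ProcessorCount));
    }
}
```

With the semaphore sized to the core count, far fewer threads are runnable at any moment, so a re-run of the Cores view should show noticeably fewer cross-core context switches.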

Last but not least, there is the Threads view that allows reviewing the thread performance and connected interactions.

Each row displays a thread's activity. Colors represent the type of activity, as determined by the blocking call; the types are also outlined in the Visible Timeline Profile box, where you can see the percentage for each operational type. For my application, most thread time is spent blocked in synchronization and memory management states. Please note that this applies not only to application-created threads, but also to threads managed directly by the CLR.

If you view the per thread summary, you will see that various threads have different operational sets:

From this graph, it is clear that thread 5148 is blocked in a state classified as memory management (for example, paging) and spends quite a lot of its time in that state, while the other threads are mostly blocked on synchronization. Once you click on an operational type, you can review the blocking profile, which reveals the calls that blocked the thread for that specific operation type.

The current stack gives even more details about the given thread state:

I can clearly see that it was the WaitForMultipleObjects API function that put the thread into the synchronization state.
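In managed code, a call such as WaitHandle.WaitAll ends up in that same native API; the sketch below (names and timings are my own) shows the kind of wait the profiler would attribute to WaitForMultipleObjects:

```csharp
using System;
using System.Threading;

class WaitDemo
{
    // Wait for two background work items to signal completion.
    public static bool Run()
    {
        var events = new WaitHandle[]
        {
            new ManualResetEvent(false),
            new ManualResetEvent(false)
        };

        ThreadPool.QueueUserWorkItem(_ => { Thread.Sleep(30); ((ManualResetEvent)events[0]).Set(); });
        ThreadPool.QueueUserWorkItem(_ => { Thread.Sleep(60); ((ManualResetEvent)events[1]).Set(); });

        // Blocks the calling thread until both events are signaled; this is
        // the span the Threads view would color as a synchronization block.
        return WaitHandle.WaitAll(events, 5000);
    }

    static void Main()
    {
        Console.WriteLine(Run() ? "Both events signaled" : "Timed out");
    }
}
```

The time spent inside WaitAll is exactly what shows up as a synchronization block in the Threads view, with this call at the top of the blocking stack.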

The unblocking stack, on the other hand, displays the opposite: the thread that unblocked the selected thread from the given state, along with the associated calls.

In the same Threads view, developers can keep track of all file operations managed by the application. This includes not only operations defined in the code, but also implied operations triggered by the CLR, such as access to clr.dll or kernel32.dll.


Opinions expressed by DZone contributors are their own.
