Visual Studio 2012 C++ Auto-Parallelizer

By Sasha Goldshtein · June 19, 2012

As you might have gathered from scattered reports on the web and from the initial list of new features in Visual Studio 2012, the new C++ compiler can now automatically vectorize loop bodies (a feature I've covered previously). It can also automatically parallelize them across multiple threads.

Here's an example. Consider the classic prime-counting loop, which counts the primes in a given range:

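#include <stdio.h>

// Naive trial division: returns true if n has no divisor in [2, n).
// __declspec(noinline) (MSVC-specific) prevents the compiler from inlining is_prime into the loop.
__declspec(noinline) bool is_prime(int n) {
    for (int x = 2; x < n; ++x) {
        if (n % x == 0 && n != x) return false;
    }
    return true;
}

int wmain(int argc, wchar_t* argv[]) {
    const int n = 100000;  // upper bound used for the timings below
    long count = 0;
    for (int i = 3; i < n; ++i) {
      if (is_prime(i)) {
        ++count;
      }
    }
    printf("count = %ld\n", count);  // %ld because count is a long
    return 0;
}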

This is a classic candidate for parallelization, although we need to be a little careful with the shared count variable. With n = 100000 the loop completes in ~1600 ms on my desktop. Perhaps the compiler can make it faster automatically.
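One way to reproduce a baseline like the ~1600 ms figure is to time the serial loop with std::chrono. The following is a minimal portable sketch, not the author's harness. count_primes_serial, time_ms, and report_serial are hypothetical helper names. The MSVC-only __declspec(noinline) is dropped so that the code also builds with other compilers. Absolute timings will differ from machine to machine.

```cpp
#include <chrono>
#include <cstdio>

// Same trial-division test as the article's is_prime (without __declspec(noinline)).
bool is_prime(int n) {
    for (int x = 2; x < n; ++x) {
        if (n % x == 0) return false;  // found a divisor, so n is composite
    }
    return true;
}

// Hypothetical helper: counts primes in [3, n), matching the article's loop.
long count_primes_serial(int n) {
    long count = 0;
    for (int i = 3; i < n; ++i) {
        if (is_prime(i)) ++count;
    }
    return count;
}

// Hypothetical helper: runs f once and returns elapsed wall-clock time in milliseconds.
template <typename F>
double time_ms(F f) {
    auto start = std::chrono::steady_clock::now();
    f();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Hypothetical usage: prints the count and the elapsed time for upper bound n.
void report_serial(int n) {
    long count = 0;
    double ms = time_ms([&] { count = count_primes_serial(n); });
    std::printf("count = %ld (%.0f ms)\n", count, ms);
}
```

Calling report_serial(100000) times the same range as the article. steady_clock is used because, unlike system_clock, it is monotonic and therefore suitable for measuring intervals.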

We go ahead and enable the /Qpar switch in the project properties. This allows the C++ compiler to parallelize loops automatically. However, the compiler sometimes still needs an explicit hint about which loops might benefit from parallelization.

This hint takes the form of a #pragma, which also specifies how many threads you recommend the runtime use:

#pragma loop(hint_parallel(4))
for (int i = 3; i < n; ++i) {
  if (is_prime(i)) {
    ++count;
  }
}

This still takes ~1600 ms on my machine, and no parallelization is visible. What's wrong? The shared variable, of course. The compiler notices that parallelizing the loop body would be unsafe and refrains from doing so. Changing the loop to…

#pragma loop(hint_parallel(4))
for (int i = 3; i < n; ++i) {
  if (is_prime(i)) {
    // Atomic increment (declared in <windows.h>); safe when several threads update count.
    InterlockedIncrement(&count);
  }
}

…suddenly works, and brings the time down to ~450 ms. Below are the four threads and a representative call stack. They show that the underlying engine is the same as in OpenMP, whose #pragma omp directives were introduced back in Visual Studio 2005:

>debug.listcallstack
index  function
--------------------------------------------------------------------------------
*1      parallelizingcompilercpp.exe!wmain$par$1()
2      vcomp110.dll!_vcomp::c2vectparallelregion::serialcallback(_vcomp::c2vectparallelregion * c2pr, int)
3      vcomp110.dll!_vcomp::c2vectparallelregion::parallelcallback_guided(_vcomp::c2vectparallelregion * c2pr=0x002af840)
4      vcomp110.dll!_vcomp::fork_helper_wrapper(void (...) *)
5      vcomp110.dll!_vcomp::parallelregion::handlerthreadfunc(void * context=0x002af7dc, unsigned long index=0x00000000)
6      vcomp110.dll!invokethreadteam(_thread_team * ptm=0x002dddd8, void (void *, unsigned long) * pvcontext=0x002af7dc, void *)
7      vcomp110.dll!_vcomp_fork(int if_test=0x00000001, int arg_count=0x00000001, void (...) * funclet=0x0f941d54, ...)
8      vcomp110.dll!_vcomp::c2vectparallelregion::execute()
9      vcomp110.dll!c2vectparallel(int start=0x00000003, int end=0x000186a0, int stride=0x00000001, int inclusive=0x00000000, unsigned int numchunks=0x00000004, int schedule=0x00000003, void (int, int, ...) * func=0x012018d0, int argcnt, ...)
10     demo.exe!wmain(int argc=0x00000001, wchar_t * * argv=0x002dc178)
11     demo.exe!__tmaincrtstartup()
12     kernel32.dll!@basethreadinitthunk@12()
13     ntdll.dll!___rtluserthreadstart@8()
14     ntdll.dll!__rtluserthreadstart@8()

>debug.listthreads
index id     name                           location
--------------------------------------------------------------------------------
*1     2340   main thread                    wmain$par$1
2     8084   vcomp110.dll!_vcomp::persistentthreadfunc _rtluserthreadstart@8
3     2444   vcomp110.dll!_vcomp::persistentthreadfunc @rtlpallocateheap@24
4     6764   vcomp110.dll!_vcomp::persistentthreadfunc _rtluserthreadstart@8 
>
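Frame 9 of the call stack shows c2vectparallel receiving start=3, end=0x186a0 (100000) and numchunks=4. This suggests that the iteration range is split into chunks that the vcomp110.dll thread team executes. The sketch below approximates that idea by hand with std::thread. In it, std::atomic<long> stands in for InterlockedIncrement. It is an illustration under assumptions, not the code the compiler generates. In particular, the frame name parallelcallback_guided hints that the real runtime hands out chunks dynamically, whereas this sketch uses a fixed cyclic split. count_primes_threads is a hypothetical name.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Trial-division primality test, as in the article's loop.
bool is_prime(int n) {
    for (int x = 2; x < n; ++x) {
        if (n % x == 0) return false;
    }
    return true;
}

// Hypothetical hand-parallelized version of the hinted loop: counts primes in
// [3, n) using `threads` worker threads. std::atomic<long> plays the role of
// InterlockedIncrement, so concurrent increments of the shared counter are safe.
long count_primes_threads(int n, int threads) {
    assert(threads > 0);
    std::atomic<long> count{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&count, n, threads, t] {
            // Cyclic split: thread t takes i = 3 + t, 3 + t + threads, ...
            // so the expensive large values of i are spread across all threads.
            for (int i = 3 + t; i < n; i += threads) {
                if (is_prime(i)) {
                    // Relaxed ordering suffices; join() below synchronizes with the reader.
                    count.fetch_add(1, std::memory_order_relaxed);
                }
            }
        });
    }
    for (auto& w : workers) w.join();  // wait for every worker to finish
    return count.load();
}
```

Trial division costs more for larger numbers. A split into four contiguous ranges would therefore leave the thread with the highest range doing most of the work, which is why this sketch interleaves the iterations instead.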

The documentation is now much better than it was in the beta. You can find more details online about the /Qpar compiler switch and the parallelization #pragmas.
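Because the runtime behind /Qpar is the same vcomp110.dll that OpenMP uses, a comparable loop can also be written with OpenMP directly. The sketch below is an assumed equivalent, not what /Qpar emits. It uses reduction(+:count), which gives each thread a private copy of count and sums the copies when the loop ends, so no interlocked operation is needed per prime. schedule(guided) is chosen only because the call stack shows parallelcallback_guided. count_primes_omp is a hypothetical name. Build with /openmp on MSVC (or -fopenmp on GCC/Clang). Without that flag the pragma is ignored and the loop runs serially, producing the same count.

```cpp
#include <cassert>

// Trial-division primality test, as in the article's loop.
bool is_prime(int n) {
    for (int x = 2; x < n; ++x) {
        if (n % x == 0) return false;
    }
    return true;
}

// Hypothetical OpenMP version: counts primes in [3, n) on `threads` threads.
long count_primes_omp(int n, int threads) {
    assert(threads > 0);
    long count = 0;
    // reduction(+:count): each thread increments a private copy, and the
    // copies are added into count after the loop completes.
#pragma omp parallel for reduction(+:count) schedule(guided) num_threads(threads)
    for (int i = 3; i < n; ++i) {
        if (is_prime(i)) {
            ++count;  // updates this thread's private copy, not shared state
        }
    }
    return count;
}
```

Compared with the InterlockedIncrement version, the reduction replaces one shared atomic update per prime with a single combining step per thread at the end of the loop.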
