Visual Studio 2012 C++ Auto-Parallelizer
As you might have gathered from the scarce reports on the web and the initial list of new features in Visual Studio 2012, the new C++ compiler can now automatically vectorize loop bodies, a feature I've covered previously, and can also automatically parallelize them across multiple threads.
Here's an example. Consider the classic prime number calculation loop, designed to count the number of primes in a given range:
    __declspec(noinline) bool is_prime(int n) {
        for (int x = 2; x < n; ++x) {
            if (n % x == 0 && n != x) return false;
        }
        return true;
    }

    long count = 0;
    for (int i = 3; i < n; ++i) {
        if (is_prime(i)) {
            ++count;
        }
    }
    printf("count = %ld\n", count);
This is a classic, ripe candidate for parallelization, although we need to be a little careful with the shared count variable. With n = 100000 the loop completes in ~1600 ms on my desktop; perhaps the compiler can make it faster automatically.
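For anyone who wants to reproduce the baseline, here is a minimal self-contained sketch of the serial loop with a timer around it. The article does not say how the timing was taken, so the use of std::chrono and the helper names count_primes_serial and time_count_ms are assumptions for illustration. The MSVC-specific __declspec(noinline) is left out so the sketch compiles on other compilers too.

```cpp
#include <chrono>
#include <cstdio>

// Trial division as in the article: deliberately slow, so the loop has real work to do.
bool is_prime(int n) {
    for (int x = 2; x < n; ++x) {
        if (n % x == 0) return false;
    }
    return true;
}

// Counts the primes in [3, n) on a single thread.
long count_primes_serial(int n) {
    long count = 0;
    for (int i = 3; i < n; ++i) {
        if (is_prime(i)) ++count;
    }
    return count;
}

// Hypothetical helper: runs the count, stores it in count_out,
// and returns the elapsed wall-clock time in milliseconds.
long long time_count_ms(int n, long& count_out) {
    const auto start = std::chrono::steady_clock::now();
    count_out = count_primes_serial(n);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
}
```

Calling time_count_ms(100000, count) and printing both values with std::printf reproduces the setup above; absolute timings will differ by machine and compiler settings.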
We go ahead and enable the /Qpar switch in the project properties. This allows the C++ compiler to perform automatic parallelization, but it still sometimes requires an explicit hint about which loops might benefit from it.
This hint is given in the form of a #pragma, which also indicates how many threads you recommend the runtime use:
    #pragma loop(hint_parallel(4))
    for (int i = 3; i < n; ++i) {
        if (is_prime(i)) {
            ++count;
        }
    }
This still takes ~1600 ms on my machine, and no parallelization is visible. What's wrong? The shared variable, of course. The compiler notices that the unsynchronized ++count would make it unsafe to parallelize the loop body, and refrains from doing so. Changing the loop to…
    #pragma loop(hint_parallel(4))
    for (int i = 3; i < n; ++i) {
        if (is_prime(i)) {
            InterlockedIncrement(&count);
        }
    }
…suddenly works, and brings the time down to ~450 ms. Here are the four threads and a representative call stack, showing that the underlying engine is the same as in OpenMP (with its #pragma omp directives, introduced in Visual Studio 2005!):
    >debug.listcallstack
     index  function
    --------------------------------------------------------------------------------
    *1      parallelizingcompilercpp.exe!wmain$par$1()
     2      vcomp110.dll!_vcomp::c2vectparallelregion::serialcallback(_vcomp::c2vectparallelregion * c2pr, int)
     3      vcomp110.dll!_vcomp::c2vectparallelregion::parallelcallback_guided(_vcomp::c2vectparallelregion * c2pr=0x002af840)
     4      vcomp110.dll!_vcomp::fork_helper_wrapper(void (...) *)
     5      vcomp110.dll!_vcomp::parallelregion::handlerthreadfunc(void * context=0x002af7dc, unsigned long index=0x00000000)
     6      vcomp110.dll!invokethreadteam(_thread_team * ptm=0x002dddd8, void (void *, unsigned long) * pvcontext=0x002af7dc, void *)
     7      vcomp110.dll!_vcomp_fork(int if_test=0x00000001, int arg_count=0x00000001, void (...) * funclet=0x0f941d54, ...)
     8      vcomp110.dll!_vcomp::c2vectparallelregion::execute()
     9      vcomp110.dll!c2vectparallel(int start=0x00000003, int end=0x000186a0, int stride=0x00000001, int inclusive=0x00000000, unsigned int numchunks=0x00000004, int schedule=0x00000003, void (int, int, ...) * func=0x012018d0, int argcnt, ...)
     10     demo.exe!wmain(int argc=0x00000001, wchar_t * * argv=0x002dc178)
     11     demo.exe!__tmaincrtstartup()
     12     kernel32.dll!@basethreadinitthunk@12()
     13     ntdll.dll!___rtluserthreadstart@8()
     14     ntdll.dll!__rtluserthreadstart@8()

    >debug.listthreads
     index  id    name                                       location
    --------------------------------------------------------------------------------
    *1      2340  main thread                                wmain$par$1
     2      8084  vcomp110.dll!_vcomp::persistentthreadfunc  _rtluserthreadstart@8
     3      2444  vcomp110.dll!_vcomp::persistentthreadfunc  @rtlpallocateheap@24
     4      6764  vcomp110.dll!_vcomp::persistentthreadfunc  _rtluserthreadstart@8
    >
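The call stack shows the range [3, n) being handed to a thread team in four chunks. To make that concrete, here is a rough, portable sketch of the same idea written by hand with std::thread and std::atomic in place of the vcomp runtime and InterlockedIncrement. It is not what the compiler generates: the function name count_primes_threads is hypothetical, and the sketch splits the range into equal static chunks, whereas the stack above suggests the runtime uses a guided schedule.

```cpp
#include <algorithm>
#include <atomic>
#include <thread>
#include <vector>

// Trial division, as in the article.
bool is_prime(int n) {
    for (int x = 2; x < n; ++x) {
        if (n % x == 0) return false;
    }
    return true;
}

// Hypothetical helper: counts primes in [3, n) using num_threads worker threads.
long count_primes_threads(int n, int num_threads) {
    if (num_threads < 1) num_threads = 1;

    std::atomic<long> count{0};  // shared counter, updated atomically
    const int begin = 3;
    const int total = std::max(0, n - begin);
    const int chunk = (total + num_threads - 1) / num_threads;  // ceiling division

    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        // Each worker gets the half-open sub-range [lo, hi), clamped to n.
        const int lo = std::min(n, begin + t * chunk);
        const int hi = std::min(n, lo + chunk);
        workers.emplace_back([lo, hi, &count] {
            for (int i = lo; i < hi; ++i) {
                if (is_prime(i)) {
                    // Portable counterpart of InterlockedIncrement(&count).
                    count.fetch_add(1, std::memory_order_relaxed);
                }
            }
        });
    }
    for (auto& w : workers) w.join();  // join makes all increments visible
    return count.load();
}
```

Equal chunks are the simplest choice but not the best one for this workload: trial division gets more expensive as i grows, so the last chunk takes the longest. That imbalance is presumably why a guided schedule is attractive here.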
The documentation is now much better than it was in the beta, and you can find more details online about the /Qpar compiler switch and the parallelization #pragmas.
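As a small illustration of what those pages describe, the sketch below applies the hint to a loop whose iterations are fully independent, the kind of loop that has no shared-variable problem in the first place. The function name squares is hypothetical, the pragma is guarded so other compilers skip it, and whether the compiler actually parallelizes this particular loop is not something I have verified. To my understanding, hint_parallel(0) leaves the thread count to the runtime, and /Qpar-report:2 asks the compiler to report which loops were parallelized and why others were not.

```cpp
#include <cstddef>
#include <vector>

// Assumed MSVC build line: cl /O2 /Qpar /Qpar-report:2 example.cpp

// Hypothetical example: fills a vector with i*i for i in [0, n).
// Every iteration writes only its own element, so nothing is shared.
std::vector<long long> squares(int n) {
    std::vector<long long> out(n > 0 ? static_cast<std::size_t>(n) : 0);
    long long* data = out.data();

#ifdef _MSC_VER
    // Hint only; the compiler may still decline to parallelize.
    #pragma loop(hint_parallel(0))
#endif
    for (int i = 0; i < n; ++i) {
        data[i] = static_cast<long long>(i) * i;
    }
    return out;
}
```

On compilers other than MSVC the loop simply runs serially, which makes the function easy to check against a parallel build.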