Sponsored Post
While C++ is still one of the most popular programming languages, even its latest versions offer only limited features for fully exploiting the parallel processing capabilities of today's multi- and many-core processors. As a result, vendors and academic groups have turned to specifications and software techniques outside the C++ language to support parallelism, including OpenMP*, Intel TBB, Cilk*, OpenCL*, and OpenACC*.
But this situation is about to change with the upcoming version of the C++ standard, C++17, which introduces a parallelized version of the Standard Template Library (STL).
Parallel STL should now make it possible to transform existing sequential C++ code to take advantage of the threading and vectorization capabilities of modern hardware architectures. Parallel STL extends the C++ Standard Template Library by adding an execution policy argument to specify the degree of threading and vectorization for each algorithm.
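As a rough sketch of that transformation (using the standardized std::execution spelling from the &lt;execution&gt; header; the Beta implementation described later may use its own headers and namespaces), a sequential call becomes parallel simply by passing a policy:

```cpp
#include <algorithm>
#include <execution>
#include <vector>

void sort_values(std::vector<double>& values) {
    // Classic sequential STL call.
    std::sort(values.begin(), values.end());

    // Same algorithm, now allowed to run on multiple threads:
    // the only change is the added execution policy argument.
    std::sort(std::execution::par, values.begin(), values.end());
}
```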
There are four execution policies that can be specified when invoking a Parallel STL algorithm:
- sequenced_policy (seq) requires that an algorithm’s execution not be parallelized.
- unsequenced_policy (unseq) indicates that an algorithm’s execution may be vectorized but not parallelized. This policy requires that all functions provided are SIMD safe.
- parallel_policy (par) indicates that an algorithm’s execution may be parallelized. Any user-specified functions invoked during the execution should not contain data races.
- parallel_unsequenced_policy (par_unseq) indicates that an algorithm’s execution may be both parallelized and vectorized.
(A proposed fifth execution policy, vector_policy (vec), would indicate that an algorithm’s execution may be vectorized in a way that preserves forward dependencies between elements.)
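To make the data-race requirement of par and par_unseq concrete, here is a hedged sketch (standard C++17 spellings assumed): accumulating into a shared variable from inside a parallel for_each would be a race, so the reduction is instead expressed with an algorithm that owns the accumulation.

```cpp
#include <execution>
#include <functional>
#include <numeric>
#include <vector>

double sum_squares(const std::vector<double>& v) {
    // Wrong with par/par_unseq: a lambda that does `sum += x * x;` on a
    // shared accumulator races across threads.
    //
    // Correct: let the algorithm own the reduction.
    return std::transform_reduce(std::execution::par_unseq,
                                 v.begin(), v.end(), 0.0,
                                 std::plus<>{},                    // reduce step
                                 [](double x) { return x * x; });  // transform step
}
```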
By offering parallelized versions of most of the STL algorithms, Parallel STL gives C++ developers an intuitive way to apply the most common best practice for parallelizing loops: vectorize innermost, parallelize outermost. The best performance often comes from using Intel® Threading Building Blocks (Intel TBB) constructs such as tbb::parallel_for to express the outermost parallelism and invoking Parallel STL algorithms at the inner and innermost levels, as sketched below. In many cases this interoperability between Intel TBB and Parallel STL should result in code that fully utilizes all of the CPU cores.
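A minimal sketch of that nesting pattern, assuming a matrix stored as a vector of rows and assuming the unseq policy described above is available as std::execution::unseq (spellings may differ between the standard headers and the Beta implementation):

```cpp
#include <tbb/blocked_range.h>
#include <tbb/parallel_for.h>

#include <algorithm>
#include <execution>
#include <vector>

// Scale every element of each row: the outer loop is threaded with
// Intel TBB, while the innermost loop is handed to a Parallel STL
// algorithm with a vectorization-only policy.
void scale_rows(std::vector<std::vector<float>>& rows, float factor) {
    tbb::parallel_for(tbb::blocked_range<std::size_t>(0, rows.size()),
        [&](const tbb::blocked_range<std::size_t>& r) {
            for (std::size_t i = r.begin(); i != r.end(); ++i) {
                // Innermost work: vectorized, not threaded.
                std::transform(std::execution::unseq,
                               rows[i].begin(), rows[i].end(), rows[i].begin(),
                               [=](float x) { return x * factor; });
            }
        });
}
```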
An initial implementation of Parallel STL is now available as part of the Intel® Parallel Studio XE 2018 Beta, and supports both parallel and vectorized algorithms on the latest Intel processors. It does this by using an available implementation of the C++ standard library for sequential execution, Intel TBB for parallelism with the par and par_unseq execution policies, and OpenMP* vectorization for the unseq and par_unseq policies. With Parallel STL support from Parallel Studio tools like Intel® VTune™ Amplifier XE, developers can measure program performance and direct their attention to optimizing those loops that represent the greatest potential performance gain.
Parallel STL advances the evolution of C++, adding vectorization and parallelization capabilities without resorting to nonstandard or proprietary extensions, and leading to code modernization and the development of new applications on modern architectures.
But you’ll have to wait for the 2018 release of Intel Parallel Studio XE to use Parallel STL. Until then, download your free 30-day trial of Intel® Parallel Studio XE 2017.