How AMD's Heterogeneous Systems Architecture Works, and How to Learn More
(This article is the second in a two-part series leading up to the AMD Fusion Developer Summit, the only developer conference dedicated specifically to heterogeneous computing. Check out the first article for a conceptual overview, with extensive resource links.)
Recently, Anand Lal Shimpi hosted a community Q&A with Manju Hegde, Corporate VP of Heterogeneous Applications and Developer Solutions at AMD.
The topic: Heterogeneous Systems Architecture, the standards-based, AMD-led effort to ease development of heterogeneous systems, especially CPU+GPU systems.
Normally I'd just send you over to that most excellent Q&A -- but in this case the questions are so good, and Manju's answers so thorough, that you might not have a chance to read everything. So here's a detailed summary, with links to more in-depth resources:
- Differences between Fusion and HSA:
  - Fusion: lets developers use the GPU alongside the CPU
  - HSA: makes the GPU a first-class programmable processor
- Specific HSA improvements:
  - C++ support for GPU computing
  - All system memory accessible by both CPU and GPU
  - Unified address space (hence no separate CPU/GPU memory pointers)
  - GPU uses pageable system memory (hence accesses data directly in CPU domain)
  - GPU and CPU can each reference the other's caches
  - GPU tasks are context-switchable (esp. important to avoid touch interface lag -- contexts switch rapidly in heterogeneous environments)
- (GP)GPU versatility: Non-UI use of the GPU is currently active at a basic level in security, voice recognition, face detection, biometrics, gesture recognition, authentication, and database functionality. But each task is currently GPU-routed. HSA will make GPU use in all these non-UI domains much easier in the next few years.
- C++ AMP and HSA: C++ Accelerated Massive Parallelism (AMP) is the Microsoft alternative to OpenCL. Both are excellent, and will fill similar roles within the larger HSA. Because C++ AMP does not represent a huge departure from C++, the AMP development learning curve will be relatively shallow.
- Gaming vs. compute performance: GPU architecture and production costs usually force a trade-off between gaming performance and pure compute performance. This means, in turn, that desktop (i.e., non-specialized) GPU design involves a careful balancing act between the two (see this paper for a technical overview of some reasons why -- it's more than just GPUs' excellent floating-point performance).
- AMD and developers: In the past, AMD tended to engineer products, and stop there. Now, because HSA involves a much more serious attempt to encourage heterogeneous systems development, AMD will be working more closely with developers to help them take advantage of (especially GPU) powers they might not have been able to use in the past.
- The advance of the APU: AMD has no grand strategy to promote APUs, even though they already make numerous different kinds of APUs. Every APU is designed as a response to a specific use-case.
- The advance of OpenCL: AMD is deeply interested in strengthening OpenCL itself, and to that end has recently driven these OpenCL initiatives:
  - improved debugger and profiler: Visual Studio plugin, standalone Eclipse, Linux
  - static C++ interface
  - extended tools through close collaboration with MulticoreWare (PPA, GMAC, TM)
  - OpenCL book and programming guide
  - university course kit (for use with aforementioned book and programming guide)
  - self-training material online
  - hands-on tutorials at the Developer Summit (select 'Hands On Lab' under 'Session Type')
  - moderated OpenCL forum
  - OpenCL training and service partners
  - OpenCL acceleration of major open-source codebases
  - Aparapi, to help Java coders use OpenCL more easily
- The continuing (but receding) importance of device-specific GPU optimization: Roughly speaking, as GPUs become more General Purpose (GPGPU), the need to optimize for specific GPUs will approach the (real but relatively low) need to optimize for specific CPUs.
- The CPU-GPU bottleneck (or, whether to use PCIe 3.0 or on-die CPU/GPU integration): The impact of the bottleneck depends hugely on the algorithm.
- The problem of GPU physics: Simple techniques (resolution, antialiasing, texture resolution) scale graphics easily across many levels of hardware capability -- and this is how game developers have used GPUs in the past. Physics does not scale across hardware nearly as easily, so most developers handle GPU physics at the lowest (console) level. But HSA will make cross-hardware physics scaling much easier.
- HSA's benefits to small but parallel workloads (versus earlier GPGPU acceleration, which had a disproportionately large effect on workloads with lots of data): HSA does not require cache flushing and copying between CPU and GPU, so the quantity of data shared matters much less than it did in previous GPGPU acceleration attempts.
- HSA availability and AMD's long-term commitment to developers taking advantage of heterogeneous computing: AMD will continue to hold Fusion Developer Summits annually; is already partnering with Adobe, Cloudera, Penguin Computing, Gaikai, and SRS, and working closely with Sony, Adobe, Arcsoft, Winzip, Cyberlink, Corel, Roxio, and many more; and will continue to help make OpenCL development much easier. But the open-standard HSA is where AMD's major, highly ambitious effort in heterogeneous computing will lie, beginning in 2013-2014.
- HSA and HPC (high-performance computing): AMD is designing HSA-based APUs for both consumer and HPC markets. Penguin Computing will explain some of their HPC applications in detail during the upcoming Fusion Developer Summit (June 11-14).
- How software stacks will catch up with heterogeneous hardware: The HSA Intermediate Language (HSAIL) will facilitate this by insulating software stacks from individual ISAs.
- Why use graphics shading languages (OpenCL, DirectX) at all: Radical change must be evolutionary, not revolutionary (e.g., assembly -> C -> C++ -> Java). Existing codebases must be used effectively, not abandoned for code written in a theoretically perfect language (the 'software side' of heterogeneous computing). HSA is designed to help developers take advantage of their own skills and existing codebases at the same time.
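To make the points about the CPU-GPU bottleneck and small parallel workloads a bit more concrete, here is a minimal back-of-the-envelope sketch. It is a toy cost model, not anything taken from the Q&A: the function names, the bandwidth, the overheads, and the timings are all hypothetical values chosen only to illustrate why copy overhead can cancel out a GPU speedup on small inputs, and why removing the copies changes the picture.

```python
# Toy cost model: every number below is an illustrative assumption,
# not a measurement of any real CPU, GPU, or bus.

def copy_offload_time(n_bytes, gpu_compute_s, bandwidth_bytes_per_s, fixed_overhead_s):
    """Offload time when data must be copied to the GPU and back again."""
    transfer_s = 2 * n_bytes / bandwidth_bytes_per_s  # copy in + copy out
    return fixed_overhead_s + transfer_s + gpu_compute_s


def shared_memory_offload_time(gpu_compute_s, dispatch_overhead_s):
    """Offload time when CPU and GPU share one address space (no copies)."""
    return dispatch_overhead_s + gpu_compute_s


def speedup(cpu_s, offload_s):
    """Values above 1.0 mean offloading beats running on the CPU."""
    return cpu_s / offload_s


# A small but parallel workload (all values hypothetical).
n_bytes = 1_000_000          # 1 MB of input data
cpu_s = 0.0003               # 0.3 ms on the CPU
gpu_compute_s = 0.0001       # 0.1 ms of pure GPU compute
bandwidth = 8e9              # assumed 8 GB/s effective bus bandwidth
copy_overhead_s = 0.00005    # assumed fixed cost per copy-based launch
dispatch_overhead_s = 0.00001  # assumed fixed cost per shared-memory dispatch

copy_speedup = speedup(
    cpu_s, copy_offload_time(n_bytes, gpu_compute_s, bandwidth, copy_overhead_s)
)
shared_speedup = speedup(
    cpu_s, shared_memory_offload_time(gpu_compute_s, dispatch_overhead_s)
)

print(f"copy-based offload speedup:    {copy_speedup:.2f}x")
print(f"shared-memory offload speedup: {shared_speedup:.2f}x")
```

With these assumed numbers, the copy-based offload comes out slower than simply staying on the CPU (a speedup below 1), while the no-copy version comes out ahead. Change the data size or the compute time and the balance shifts, which is the sense in which the impact of the bottleneck depends on the algorithm.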
As several of these questions noted, the annual AMD Fusion Developer Summit is an essential component in the eventual rollout of the open-standard Heterogeneous Systems Architecture.
No other conference covers heterogeneous computing specifically. The track list is amazingly broad, and the schedule incredibly ambitious. To GPGPU-wrestlers and non-wrestlers alike, heterogeneous computing is a thrilling, emerging technology.
AMD generously offered AnandTech readers a chance to attend AFDS for free, so I asked around and they were kind enough to do the same for DZone readers. Just enter the promo code DZONE12 when registering for complimentary registration (normally $495), but hurry, because the code is only good for the first 50 registrants.
Learn more and consider attending the conference on June 11-14.