DZone recently had a chance to sit down with Joe Duffy, author of 'Concurrent Programming on Windows'. In this interview, Joe discusses the origins and benefits of concurrent development in a Windows environment, and the future of parallel programming with the .NET Framework.
DZone - Tell us a little about what you do at Microsoft.
Joe Duffy - I have two primary roles at Microsoft. First, I manage the team of developers creating the Parallel Extensions technology, a collection of new libraries that will appear in .NET 4.0 (aka Visual Studio 2010). That includes Parallel LINQ (PLINQ) and the Task Parallel Library (TPL). Second, I am the architect for those and related technologies, such as exploring concurrency-safe type-system extensions to the C# language. I write plenty of code too.
DZone - What motivated you to write this book?
Joe - My prior job was the concurrency program manager for the CLR team. In that role, I was responsible for the System.Threading namespace and corresponding runtime support, and also managed a few projects related to what eventually became Parallel Extensions (and some of which are still in the works, like software transactional memory). There simply was no good book on concurrency for Windows or .NET available at the time. (And that didn’t change in the 2 ½ years I worked on this book.) So I decided that, in addition to learning the topic inside and out, I should write it down for others to learn from too.
DZone - What is Concurrent (or Parallel) Programming? Give us the nickel tour.
Joe - Concurrency is everywhere, and (put as simply as possible) is the act of running multiple things at once. On Windows, those things are threads. Parallel programming is the act of using concurrency to utilize more hardware at once for a performance gain. Concurrency has existed in operating systems and server-side programs, and parallelism has been common in specialized domains like high-performance and scientific computing, for decades. Only recently has parallelism come to the forefront in client-side applications. Indeed, it’s now a crucial skill for all developers to have in their arsenal.
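As a minimal sketch of what "running multiple things at once" looks like in .NET terms, the classic System.Threading API (which the book covers) lets you hand work to a second thread and wait for it. The `ComputeOnWorkerThread` name here is illustrative, not from the book:

```csharp
using System;
using System.Threading;

static class ThreadDemo
{
    // Runs a computation on a separate OS thread and waits for the result.
    public static int ComputeOnWorkerThread()
    {
        int result = 0;
        Thread worker = new Thread(() => { result = 21 * 2; });
        worker.Start();   // begins running concurrently with this thread
        worker.Join();    // blocks until the worker completes
        return result;
    }

    static void Main()
    {
        Console.WriteLine(ComputeOnWorkerThread()); // prints 42
    }
}
```

A real parallel program would, of course, give each thread a meaningful slice of work rather than one trivial expression; the point is only the start/join life cycle.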
DZone - What are some of the hardest problems that developers are trying to solve when it comes to concurrent programming with .NET?
Joe - The three hardest problems are, in no particular order: (1) finding opportunities for parallelism, (2) arranging for parallelism to occur, and (3) dealing with the consequences of concurrency and shared state. The third category is perhaps the best known, and conjures up words like race condition, deadlock, and priority inversion. The lack of good books and abstractions in the platform has made these problems even more difficult for developers to deal with.
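To make the third category concrete, a plain `count++` from multiple threads is a read-modify-write race that can silently lose updates; one standard fix is an atomic increment. This small sketch (names are illustrative) shows the race-free version:

```csharp
using System;
using System.Threading;

static class RaceDemo
{
    // Increments a shared counter from several threads. Interlocked makes the
    // result deterministic; a plain count++ here would be a read-modify-write
    // race and could silently lose updates.
    public static int CountWithInterlocked(int threads, int perThread)
    {
        int count = 0;
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++)
        {
            workers[i] = new Thread(() =>
            {
                for (int j = 0; j < perThread; j++)
                    Interlocked.Increment(ref count); // atomic, race-free
            });
            workers[i].Start();
        }
        foreach (Thread t in workers) t.Join();
        return count;
    }

    static void Main()
    {
        Console.WriteLine(CountWithInterlocked(4, 100000)); // prints 400000
    }
}
```

Swapping the `Interlocked.Increment` for `count++` is an easy way to observe a race first-hand: the total will usually come out short on a multicore machine.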
DZone - What kinds of features are currently available in the .NET Framework 3.x to support concurrent programming?
Joe - Windows has built up a solid foundation for concurrent programming over the 15-or-so years since Windows NT 3.5. Each release has come with new improvements, ranging from new APIs—like the wait chain traversal and user-mode reader/writer locks in Windows Vista—to subtle and often-unnoticed improvements—like the use of keyed events for low resource conditions and the elimination of the single global scheduler lock in the kernel. .NET builds atop the OS support, and thus benefits also. But it brings along its own improvements—like its own new reader/writer lock and an overhaul of the .NET threadpool from release 3.0 to 3.5. The improvements coming in .NET 4.0 are perhaps the most substantial to date.
DZone - What are some of the biggest enhancements coming in .NET 4.0 in this space?
Joe - TPL provides a new scheduler that uses work-stealing to eliminate scaling bottlenecks and make it far easier to benefit from fine-grained parallelism. It also offers a rich set of abstractions—like tasks with cancellation and parent/child relationships, parallel for loops, easy divide-and-conquer support, and more. PLINQ runs any LINQ query in parallel, taking advantage of LINQ’s mostly-functional nature. A set of other types, the Coordination Data Structures (CDS), accommodates other synchronization needs, including a blocking/bounded collection type, a unified cancellation framework, and a rich set of concurrent collections. There are too many specifics to list here. I recommend folks check out http://msdn.microsoft.com/concurrency/ to read some articles and download our preview releases. My book also has an appendix describing the preview bits.
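As a sketch of the two headline features, PLINQ's `AsParallel()` and TPL's `Parallel.For` look like this in the .NET 4.0 shapes of the APIs (names and namespaces in the preview releases may differ slightly):

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

static class ParallelExtensionsDemo
{
    // PLINQ: AsParallel() partitions the query across available cores.
    public static int ParallelSumOfSquares(int n)
    {
        return Enumerable.Range(1, n)
                         .AsParallel()
                         .Select(i => i * i)
                         .Sum();
    }

    // TPL: a parallel for loop; Interlocked keeps the reduction race-free.
    public static long ParallelSum(int n)
    {
        long total = 0;
        Parallel.For(1, n + 1, i => Interlocked.Add(ref total, i));
        return total;
    }

    static void Main()
    {
        Console.WriteLine(ParallelSumOfSquares(1000)); // 333833500
        Console.WriteLine(ParallelSum(1000));          // 500500
    }
}
```

Both produce the same answer as their sequential counterparts; the runtime decides how to partition the work across the machine's cores.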
DZone - How is Microsoft leveraging parallel programming internally?
Joe - Multicore is still in its infancy. We stand at the beginning of the long adoption curve, and my team has had to deal with the chicken-and-egg problem. That said, I just got an email last week from an internal team that used Parallel Extensions to speed up the install time of their product by 4x on 4-core machines. I get emails like this describing one-off successes all the time. Many product teams are still experimenting with the best way to incorporate parallelism and how to use it to improve their software’s capabilities. Other teams—particularly in Microsoft Research—are using parallelism to perform increasingly sophisticated analyses on increasingly large sets of data, and tackling some pretty epic problems. Some groups have even dedicated entire teams to exploring how parallelism can help to improve applications with immersive capabilities like speech and vision. I personally believe that within the next 1-3 years we'll leave the “toying around” phase of multicore and enter the phase where parallelism is used for true competitive advantage.
DZone - Are there any best practices people can follow?
Joe - The book lists many, and in fact has a consolidated list in one of the appendices. If I could boil them down into a brief sound bite, however, it would be: program functionally where possible; avoid sharing (aka isolate state) where functional isn’t possible (or performant); and tread with great care when you need to mutate shared state (as not only is it dangerous, but it will negatively impact the performance benefits parallelism can bring). Languages like F# can help folks attain this vision, but there is no silver bullet. Awareness and thoughtfulness can go a long way.
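The "avoid sharing where possible" advice can be sketched in code. Both methods below compute the same sum in parallel, but the first mutates shared state under a lock while the second is a functional reduction with nothing to protect (the method names are illustrative, not from the book):

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

static class IsolationDemo
{
    // Mutating shared state from parallel iterations requires a lock, which
    // both invites bugs and serializes the hot path.
    public static int SumWithSharedState(int[] data)
    {
        int total = 0;
        object gate = new object();
        Parallel.ForEach(data, x => { lock (gate) { total += x; } });
        return total;
    }

    // Functional reduction: no shared mutable state, so nothing to lock.
    public static int SumFunctionally(int[] data)
    {
        return data.AsParallel().Sum();
    }

    static void Main()
    {
        int[] data = Enumerable.Range(1, 100).ToArray();
        Console.WriteLine(SumWithSharedState(data)); // 5050
        Console.WriteLine(SumFunctionally(data));    // 5050
    }
}
```

The lock-based version is correct but contends on every iteration; the functional version lets the runtime combine per-partition results, which is the style the advice above is steering toward.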
DZone - Are there any interesting case studies that you can discuss? Have you implemented a recent project using concurrent programming?
Joe - I talk to customers all the time who are trying to retrofit parallelism into their existing applications. Most successes I’ve heard about have involved so-called embarrassingly parallel problems: those where the computations are large, the data sets are large, or some combination of the two, and where the processing that goes on is easily divisible. Graphics applications, game engines, CAD systems, financial modeling, advanced mathematical problems (like constraint satisfaction), business intelligence and reporting, and certain AI search algorithms are just a few examples.
DZone - Who should buy your book?
Joe - The book is meant for any .NET or C++ developer whose software needs to run on Windows. I don’t think many more caveats are needed. I fully believe that all programmers will need to know how to deal with threading in order to remain marketable 5 years out. My book covers both client- and server-side concurrency, and isn't only meant for the new kinds of parallel programming that apply to the multicore era. It really can be used for the kind of software that's already being written today, in addition to being applicable for the software of tomorrow. Most of all, the book is meant for geeks who (like me) aren’t satisfied until they understand the ins and outs of how everything most people take for granted actually works under the covers. If you like books like Windows Internals, then this book is for you.
DZone - Do you have any future books in the works?
Joe - You bet. My next book is called “Notation and Thought: Behind Computer Science’s Most Influential Programming Languages.” I’ve always had a fascination with programming languages, the history behind them, and the people who created them. This book is an exploration of that, telling the story behind 20-something of the most influential languages. I believe we’re standing on the edge of a programming language renaissance. Too much of the computing landscape is undergoing major changes for the languages we use to remain constant. You can already see trends towards dynamic and functional languages bucking the statically typed, C-style trend that dominated the ’80s and ’90s, and domain-specific languages are on the rise. And interestingly, the multicore era demands new language constructs to enable safe and deterministic programming in the face of large-scale concurrency. This is proving to be an incredibly fun book to write.