From Doug Lea:
Highlights:
1. Substantially better throughput when lots of clients submit lots of tasks. (I've measured up to 60X speedups on microbenchmarks.) The idea is to treat external submitters in a similar way as workers -- using randomized queuing and stealing. (This required a big internal refactoring to disassociate work queues and workers.) This also greatly improves throughput when all tasks are async and submitted to the pool rather than forked, which becomes a reasonable way to structure actor frameworks, as well as many plain services that you might otherwise use ThreadPoolExecutor for. These improvements also lead to a less hostile stance about submitting possibly-blocking tasks. An added paragraph in the ForkJoinTask documentation provides some guidance (basically: we like them if they are small (even if numerous) and don't have dependencies). http://gee.cs.oswego.edu/dl/jsr166/dist/docs/java/util/concurrent/ForkJoinTask.html
2. More cases are handled that allow threads to help others rather than generating compensation threads, including most cases of naive backward joins. (I'm not sure whether it is a bug or a feature that some such cases are now only about twice as slow as structuring joins correctly.)
3. One small API addition: explicit support for task marking. It was cruel to tell people that they could use FJ for things like graph traversal but not have a simple way to mark tasks so they won't be revisited while processing a graph (among a few other common use cases). Because they weren't supported initially, marking methods need crummy names that won't conflict with existing usages: markForkJoinTask and isMarkedForkJoinTask.
4. Better tolerance for GC/allocation stalls: It's not uncommon for a "lead" task to stall producing subtasks because of GC, causing others to give up and block, requiring expensive unblocking when it finally resumes. A new slower ramp-down scheme reduces the performance impact. (Although still, the best guidance is to remember Amdahl's law, and minimize the sequential overhead needed to produce a task.)
5. Other minor changes that give a few percent improvement in common FJ task processing. On the other hand, this version is even more prone to GC cardmark contention. So if using HotSpot on a multiprocessor (or even a >4-core multicore) you absolutely must run with -XX:+UseCondCardMark or -XX:+UseG1GC. (Also, it is better behaved with biased locking disabled: -XX:-UseBiasedLocking.)
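To make the first highlight concrete, here is a minimal sketch of the "many clients submitting many small async tasks" pattern, using only the standard ForkJoinPool.execute(Runnable) call. The class name AsyncSubmitSketch, the helper runClients, and the client and task counts are made up for illustration and are not from Doug's message. It does not measure throughput; it only shows the submission shape he describes: small, independent tasks handed to the pool by external (non-worker) threads rather than forked.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.atomic.AtomicLong;

class AsyncSubmitSketch {

    // Hypothetical helper: `clients` external threads each submit
    // `tasksPerClient` small, independent tasks to one shared pool.
    // Returns the number of tasks that actually ran.
    public static long runClients(int clients, int tasksPerClient) {
        ForkJoinPool pool = new ForkJoinPool();
        AtomicLong completed = new AtomicLong();
        CountDownLatch done = new CountDownLatch(clients * tasksPerClient);

        List<Thread> submitters = new ArrayList<>();
        for (int c = 0; c < clients; c++) {
            Thread t = new Thread(() -> {
                for (int i = 0; i < tasksPerClient; i++) {
                    // Submitted from a non-worker thread: no fork(), no join(),
                    // and no dependency on any other task.
                    pool.execute(() -> {
                        completed.incrementAndGet();
                        done.countDown();
                    });
                }
            });
            submitters.add(t);
            t.start();
        }

        try {
            for (Thread t : submitters) {
                t.join();          // wait until every client has finished submitting
            }
            done.await();          // wait until every submitted task has run
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } finally {
            pool.shutdown();
        }
        return completed.get();
    }

    public static void main(String[] args) {
        // Illustrative sizes only: 4 client threads, 1000 tasks each.
        System.out.println(runClients(4, 1000));
    }
}
```

Each task here is tiny and independent, which matches the guidance quoted above (small, possibly numerous, no dependencies). A service that would otherwise sit on a ThreadPoolExecutor could be structured the same way.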
Doug is open to any and all feedback of course.
Wikipedia: Fork/Join queue