For quite a while, I have been a proponent of isolating business logic from thread synchronization. With time, this became so obvious to me that I started to assume everybody else out there shared the feeling. Of course, like any other assumption, this one proved to be wrong: when I wrote a piece about the Store-Process-and-Forward architecture, the very claim "don't mix business logic with thread synchronization" caused quite a bit of confusion when the article was discussed on Reddit. To address this all-important issue, I will try to elaborate on it here.
So, the question we'll discuss today is the following: why is it usually a Bad Idea to address thread synchronization at the business-logic level?
When speaking about thread synchronization, we won't consider trivial approaches such as "lock everything under one single mutex". While this is a technically solid approach which is rather easy to implement, it doesn't really serve the main reason for multi-threading (using multiple cores to achieve scalability): with all the work serialized under one mutex, it is pretty much useless for scalability.
In addition, we won't consider frameworks which allow for a more or less clean separation between business logic and threads (for example, via queues, as in Erlang/OTP or the Store-Process-and-Forward architecture). Such frameworks do allow us to avoid most of the problems discussed in this article, and we'll touch on them briefly in the "What is to be done?" section.
Now, as we've excluded the simpler synchronization approaches from scope, let's see what the problems are with having more complicated thread sync intermingled with business logic.
Reason 1. Cognitive Capabilities of the Human Brain.
As early as 1956, it was recognized (in George A. Miller's paper "The Magical Number Seven, Plus or Minus Two") that the cognitive capabilities of the human brain are quite limited. As Wikipedia puts it: "the number of objects an average human can hold in working memory is 7 ± 2." Let's see what this means when we add thread sync to business logic.
Business logic as such tends to reach this "7 ± 2" limit on the entities a developer needs to deal with, even without synchronization. And when we add non-trivial thread sync to the mix, the developer needs to think not only about those 7 business entities, but also about things like "what will happen if entity A interacts with entity B". Effectively, this means that the number of things to keep in mind grows not linearly, but as a square (!). Therefore, if we had 7 entities to deal with in the business logic itself, then after adding thread sync we can easily end up with around 40-50 items, which is well beyond the capabilities of even advanced developers.
Or, from a different perspective: if our developer has cognitive capabilities which allow her to handle N entities, then, regardless of whether N is 5 or 50, forcing her to handle thread sync will severely restrict the complexity of business logic she is able to handle (if the load grows as a square, a budget of N items leaves room for only about √N business entities: 7 out of 50, or just 2 out of 5).
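A back-of-the-envelope version of the "7 entities become 40-50" estimate, assuming that each of the k business entities still counts once, and that every ordered pair "A interacts with B" counts as one more item (ordered, because which thread gets there first matters):

```latex
\underbrace{k}_{\text{entities}} + \underbrace{k(k-1)}_{\text{ordered pairs}} = k^2,
\qquad k = 7:\quad 7 + 42 = 49
```

Counting unordered pairs instead gives 7 + 21 = 28; either way, the total is far beyond 7 ± 2.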
Reason 2. Reliability Problems.
Thread sync is a very nasty thing. In particular, once we go beyond the very simple synchronization models (such as those we've excluded from consideration above), bugs in it tend to be non-testable. Just one example: [C++Report98] describes a multithreading bug (causing no less than a deadlock) in MSVC's implementation of the C++ standard library. The bug stayed there for years and was found only via code review; yet a special program written to demonstrate it deadlocked within several seconds (!). It was further reported that the typical real-world manifestation of the bug was "it deadlocked about once a month on a client machine".
The above demonstrates how elusive and untestable thread bugs are. This, in turn, inevitably makes shipped/deployed software less reliable. In addition to this obvious logical consequence, it has been observed in practice that systems written without application-level thread sync were described by auditors as having "too good to be true" downtimes. I tend to treat this as an additional point in favor of avoiding thread sync at the business-logic level.
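To make the "non-testable" point more concrete, here is a hypothetical sketch of how a deadlock sneaks into business-level code. It is not the STL bug from [C++Report98], whose details are not reproduced here; `Account`, `transfer()` and `run_transfers()` are made-up names, and the sketch assumes C++17 for `std::scoped_lock`.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical business entity that carries its own lock: exactly the
// "thread sync mixed into business logic" style discussed above.
struct Account {
    std::mutex mtx;
    long balance = 0;
};

// A naive version would be:
//   std::lock_guard<std::mutex> a(from.mtx);
//   std::lock_guard<std::mutex> b(to.mtx);
// If one thread runs transfer(x, y) while another runs transfer(y, x),
// each can grab its first mutex and then wait forever for the second.
// std::scoped_lock (C++17) acquires both mutexes using a deadlock-avoidance
// algorithm, so the order of arguments no longer matters.
void transfer(Account& from, Account& to, long amount) {
    std::scoped_lock lock(from.mtx, to.mtx);
    from.balance -= amount;
    to.balance += amount;
}

// Runs opposite-direction transfers concurrently and returns the total,
// which stays unchanged if every transfer was applied atomically.
long run_transfers(int iterations) {
    Account x, y;
    x.balance = 1000;
    y.balance = 1000;
    std::thread t1([&] { for (int i = 0; i < iterations; ++i) transfer(x, y, 1); });
    std::thread t2([&] { for (int i = 0; i < iterations; ++i) transfer(y, x, 1); });
    t1.join();
    t2.join();
    // Both directions moved the same amount, so each balance is back at 1000.
    assert(x.balance == 1000 && y.balance == 1000);
    return x.balance + y.balance;
}
```

In a tight loop like `run_transfers()`, the naive two-`lock_guard` version would likely hang quickly, much like the demonstration program mentioned above; in real code, where opposite transfers rarely overlap, the same mistake can survive testing and then hang only occasionally in the field. Note also that even the fixed version keeps the locking inside business logic, which is precisely the mix this article argues against.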
Reason 3. Code Fragility and Maintenance Costs.
With multithreaded coding being so much more complicated, and multithreading bugs being so elusive, code with thread sync inevitably becomes very fragile. A very small change, such as an innocent-looking swap of two seemingly non-interacting lines of code, may introduce a multithreading bug (especially if you're relying on memory fences), or cause a previously existing but dormant bug to start manifesting itself.
This fragility, combined with the inevitably frequent changes to business logic, makes code maintenance extremely expensive. In practice, it has been observed that beyond a certain point, developers were very reluctant to make any changes to seemingly working multithreaded business-logic code, as they couldn't tell what kind of race their change might cause once the system was deployed to the field.
Strike Three. Intermixing business logic with thread sync is out.
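A minimal sketch of the "swap two innocent lines" hazard, assuming C++11 release/acquire atomics (which express the same kind of ordering constraint one would otherwise spell out with explicit memory fences); `payload`, `ready`, `producer()`, `consumer()` and `run_publication()` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int payload = 0;                 // plain, non-atomic data
std::atomic<bool> ready{false};  // publication flag

void producer() {
    payload = 42;                                  // (1) write the data...
    ready.store(true, std::memory_order_release);  // (2) ...then publish it.
    // Swapping (1) and (2) compiles just as happily, but then the consumer
    // may see ready == true while payload is still being written: a data
    // race (undefined behaviour) which may show up only very rarely.
}

int consumer() {
    // The acquire load pairs with the release store above: once it observes
    // true, the earlier write to payload is guaranteed to be visible.
    while (!ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return payload;
}

int run_publication() {
    payload = 0;
    ready.store(false);
    int seen = 0;
    std::thread c([&] { seen = consumer(); });
    std::thread p(producer);
    p.join();
    c.join();
    assert(seen == 42);
    return seen;
}
```

Nothing in the type system flags the swapped version, and a test run may well pass with it, which is exactly why such code becomes fragile under maintenance.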
What is to be done?
OK, so we've established that intermixing business logic with thread synchronization is a Bad Thing. But very often there is a business requirement for scalability, so how are we going to achieve it? The answer is the centuries-old "divide and conquer" approach: you have two layers, one being "business logic" and the other "infrastructure", with very well-defined interfaces between the two. When writing business logic, all synchronization is confined to this very well-defined interface, so thread sync is out of scope. When writing the infrastructure layer, we do need to care about thread sync, but we don't need to care about business logic. As a result, for the infrastructure layer: (a) complexity doesn't involve business logic, so cognitive issues are not that bad; (b) correctness can be proven rather than merely tested (yes, I've done it myself); and (c) changes are rare, so maintenance becomes much less of an issue.
I know of two practical approaches to such a separation of business logic from the infrastructure layer. One is Erlang/OTP, the other is the Store-Process-and-Forward architecture (in fact, the two are quite close to each other in terms of ideology). While discussing them is beyond the scope of the present article, each of them does provide a way to avoid most of the negative effects described above.
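A minimal sketch of this two-layer split, assuming a simple in-process queue as the "very well-defined interface" (a big simplification of what Erlang/OTP or Store-Process-and-Forward actually provide). All locking lives in the infrastructure's `BlockingQueue`, while the business-logic class `AccountLogic` is plain single-threaded code; all names here are hypothetical.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// ---- Infrastructure layer: the only place that knows about threads ----
template <typename T>
class BlockingQueue {
public:
    void push(T item) {
        {
            std::lock_guard<std::mutex> lock(mtx_);
            items_.push(std::move(item));
        }
        cv_.notify_one();
    }
    // Blocks until an item is available, then removes and returns it.
    T pop() {
        std::unique_lock<std::mutex> lock(mtx_);
        cv_.wait(lock, [this] { return !items_.empty(); });
        T item = std::move(items_.front());
        items_.pop();
        return item;
    }
private:
    std::mutex mtx_;
    std::condition_variable cv_;
    std::queue<T> items_;
};

// ---- Business logic layer: plain single-threaded code, no locks ----
struct Deposit { long amount; };

class AccountLogic {
public:
    void on_event(const Deposit& d) { balance_ += d.amount; }  // no sync needed
    long balance() const { return balance_; }
private:
    long balance_ = 0;
};

// Infrastructure glue: one worker thread owns AccountLogic and feeds it
// events one at a time; an empty optional is the "stop" sentinel.
long run_event_loop(int producers, int deposits_per_producer) {
    BlockingQueue<std::optional<Deposit>> queue;
    AccountLogic logic;
    std::thread worker([&] {
        while (auto ev = queue.pop()) {
            logic.on_event(*ev);
        }
    });
    std::vector<std::thread> senders;
    for (int p = 0; p < producers; ++p) {
        senders.emplace_back([&] {
            for (int i = 0; i < deposits_per_producer; ++i) queue.push(Deposit{1});
        });
    }
    for (auto& t : senders) t.join();
    queue.push(std::nullopt);  // all deposits are queued; tell the worker to stop
    worker.join();
    return logic.balance();
}
```

Here `run_event_loop(4, 1000)` has four threads posting deposits concurrently, yet `AccountLogic::on_event()` is only ever called from the single worker thread, so it needs no synchronization at all. One way to scale further under this sketch is to run many such single-threaded logic objects in parallel, each with its own queue, rather than adding locks inside them.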
[C++Report98] Sergey Ignatchenko, STL Implementations and Thread Safety, C++ Report, Jul/Aug 1998