The Fallacies of Enterprise Computing (Part 1)
Any enterprise system is subject to the same fallacies as any other distributed system. Reliability, latency, bandwidth, security, the whole nine yards.
More than a decade ago, I published Effective Enterprise Java, and in the opening chapter I talked about the Ten Fallacies of Enterprise Computing, essentially an extension/add-on to Peter Deutsch’s Fallacies of Distributed Computing. But in the ten-plus years since, I’ve had time to think about it, and now I’m convinced that the Enterprise Fallacies are a different list. Now, with the rise of cloud computing stepping in to complement, supplement, or entirely replace the on-premise enterprise data center, it seemed reasonable to get back to it.
I’ll expand on the items in the list in future blog posts, I imagine, but without further ado, here are the Fallacies of Enterprise Computing.
- New technology is always better than old technology.
- Enterprise systems are not “distributed systems.”
- Business logic can and should be centralized.
- Data, object, or any other kind of model can be centralized.
- The system is monolithic.
- The system is finished.
- Vendors can make problems go away.
- Enterprise architecture is the same everywhere.
- Developers need only worry about development problems.
As Deutsch said, “Essentially everyone, when they first build a[n enterprise] system, makes the following [nine] assumptions. All prove to be false in the long run and all cause big trouble and painful learning experiences.”
Naturally, I welcome discussion around these, and I may edit and/or append to this list as time goes by, but this is where the past decade has led me.
New Tech Is Always Better than Old Tech
After building IT systems for more than sixty years, one would think we as an industry would have learned that “newer is not always better”. Unfortunately, this is a highly youth-centric industry, and the young have this tendency to assume that anything new to them is also new to everybody else. And if it’s new, it’s exciting, and if it’s exciting, it must be good, right? And therefore, we must throw away all the old, and replace it with the new.
This cannot be emphasized enough: This is fallacious, idiotic, stupid, and brain-dead.
This fallacy is an extension of the old economic “limited market” fallacy: The more gains one entity makes in a market, the more that other entities lose. (Essentially, it suggests that the market is intrinsically a zero-sum game, despite obvious evidence that markets have grown substantially even in just the last hundred years since we started tracking economics as a science.) Thus, for example, if the cloud is new, and it has some advantages over its “competitors”, then every “win” for the cloud must mean an equal “loss” for the alternatives (such as on-prem computing). Never mind that the cloud solves different problems than on-prem computing, or that not everything can be solved using the cloud (such as computing when connections to the Internet are spotty, nonexistent, or worse, extremely slow).
Now, for those of you who have been engaged in the industry for more than just the past half-decade, here’s the $65,535 question for you: How is “the cloud” any different from “the mainframe”, albeit much, much faster and with much, much greater storage?
Those who cannot remember the past are condemned to repeat it. —George Santayana, Philosopher
I’ve seen this play out over and over again, starting with my own entry into the IT universe with C++ (which was the “new” over C), and participated in a few system rewrites to C++ from other things (Visual Basic being one, C being another, sometimes some specific vertical stuff as well). Then I saw it again when Java came around, and companies immediately started rewriting some of their C++ systems into Java. This time around, I started to ask, “Why?”, and more often than not the answers were fairly vague: “We don’t want to fall too far behind,” or “We need to modernize our software.” (When pressed as to why “falling behind” was bad, or why software needed to be modernized, I was usually shushed and told not to worry about it.)
In the years since, I keep thinking that companies have started to get this message more thoroughly, but then something comes along and completely disrupts any and all lessons we might have learned. After Java, it was Ruby. Or, for those companies that didn’t bite on the Java apple, it was .NET. Now NodeJS. Or NoSQL. Or “cloud”. Or functional programming. Or take your pick of any of another half-dozen things.
Unfortunately, as much as I wish I could believe that “it’s different this time” and we as an industry have learned our way through this, I keep seeing signs that no, unfortunately, that’s too much to hope for. The easy way to mitigate this fallacy is to force those advocating new technology to enumerate the benefits in concrete terms — monetary and/or temporal benefits, ideally, backed by examples and objective analysis of pros and cons.
By the way, for those who aren’t sure they can spot the fallacy, the easy way to tell if somebody is falling into this trap is to check whether their analysis contains both positive and negative consequences. No technology is without its negatives, and a practical, objective analysis will point them out. If it’s you doing the analysis, then force yourself to ask: “When would I not use this? What circumstances would lead me away from it? When is using this going to lead to more pain than it’s worth?”
Enterprise Systems Are Not “Distributed Systems”
This means, simply, that any enterprise system is subject to the same fallacies as any other distributed system. Reliability, latency, bandwidth, security, the whole nine yards (or the whole eight fallacies, if you prefer) are all in play with any enterprise system.
If you’re not familiar with the Eight Fallacies of Distributed Systems, take some time to make yourself familiar with them and some of the mitigation strategies.
Business Logic Can and Should Be Centralized
(Note: I wrote this up a long time ago in a blog post as the “Eleventh Fallacy of Distributed Systems”, but it feels vastly more relevant as an Enterprise Fallacy.)
This is a fallacy because the term “business logic” is way too nebulous to nail down correctly, and because business logic tends to stretch out across the client, middle, and server tiers, as well as across the presentation and data access/storage layers.
This is a hard one to swallow, I’ll grant. Consider, for a moment, a simple business rule: a given person’s name can be no longer than 40 characters. It’s a fairly simple rule, and as such should have a fairly simple answer to the question: Where do we enforce this particular rule? Obviously we have a database schema behind the scenes where the data will be stored, and while we could use tables with every column set to be variable-length strings of up to 2000 characters or so (to allow for maximum flexibility in our storage), most developers choose not to. They’ll cite any number of different reasons, but the most obvious one is also the most important: by using relational database constraints, the database can act as an automatic enforcer of business rules, such as the one requiring that names be no longer than 40 characters. Any violation of that rule will result in an error from the database.
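To make that concrete, here is a minimal sketch of the schema doing the enforcing on its own. It assumes an in-memory H2 database on the classpath and Java 11 or later; the table and class names are mine, purely for illustration.

```java
import java.sql.*;

// A minimal sketch, assuming an in-memory H2 database on the classpath
// and Java 11+ (for String.repeat). The schema itself enforces the
// 40-character rule; the application only finds out via the SQLException.
public class NameLengthDemo {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:h2:mem:demo");
             Statement st = conn.createStatement()) {
            st.execute("CREATE TABLE person (name VARCHAR(40) NOT NULL)");

            String tooLong = "X".repeat(41); // 41 characters: violates the rule
            try (PreparedStatement ps =
                     conn.prepareStatement("INSERT INTO person (name) VALUES (?)")) {
                ps.setString(1, tooLong);
                ps.executeUpdate();
            } catch (SQLException e) {
                // The business rule fired in the database tier, whether or not
                // the middle tier ever checked it.
                System.out.println("Rejected by the schema: " + e.getMessage());
            }
        }
    }
}
```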
Right here, right now, we have a violation of the “centralized business logic” rule. Even if the length of a person’s name isn’t what you consider a business rule, what about the rule stating that a person can have zero to one spouses as part of a family unit? That’s obviously a more complicated rule, and usually results in a foreign key constraint on the database in turn. Another business rule enforced within the database.
Perhaps the rules simply need to stay out of the presentation layer, then. But even here we run into problems: how many of you have used a web application where all validation of form data entry happens on the server (instead of in the browser using script), usually one field at a time? This is the main drawback of enforcing presentation-related business rules at the middle or server tiers: it requires round trips back and forth to carry out, which hurts both the performance and the scalability of the system over time, yielding a poorer system as a result.
So where, exactly, did we get this fallacy in the first place? We get it from the old-style client/server applications and systems, where all the rules were sort of jumbled together, typically in the code that ran on the client tier. Then, when business logic code needed to change, it required a complete redeploy of the client-side application that ended up costing a fortune in both time and energy, assuming the change could even be done at all–the worst part was when certain elements of code were replicated multiple times all over the system. Changing one meant having to hunt down every place else a particular rule was–or worse, wasn’t–being implemented.
This isn’t to say that trying to make business logic maintainable over time isn’t a good idea; far from it. But much of the driving force behind “centralize your business logic” was really a shrouded cry for the “Once and Only Once” rule or the “Don’t Repeat Yourself” principle. In and of themselves, they’re good rules of thumb. The problem is that we lost sight of the forest for the trees, and ended up trying to obey the letter of the law rather than its spirit and intentions. Where possible, centralize, but don’t take on additional costs beyond the benefits of doing so.
By the way, one place where the “centralize only if it’s convenient” rule has to be set aside is around validating inputs from foreign locations—in other words, any data that is passed across the wire or comes in from outside the local codebase. In order to avoid security vulnerabilities, data should always be verified as soon as it reaches your own shores, even if that means duplicating the validation in every externally accessible interface.
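As a rough illustration of that duplication, a boundary check in Java might look something like the following; the class and method names are hypothetical, and the 40-character limit simply echoes the earlier example.

```java
// A rough illustration (class and method names are hypothetical) of
// re-validating at every externally reachable entry point, even though the
// database schema will enforce the same 40-character rule again downstream.
public final class PersonInput {
    private static final int MAX_NAME_LENGTH = 40;

    private PersonInput() {}

    public static String requireValidName(String rawName) {
        if (rawName == null) {
            throw new IllegalArgumentException("name is required");
        }
        String name = rawName.trim();
        if (name.isEmpty() || name.length() > MAX_NAME_LENGTH) {
            throw new IllegalArgumentException(
                "name must be between 1 and " + MAX_NAME_LENGTH + " characters");
        }
        return name; // safe to hand to the rest of the system
    }
}
```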
Models Can Be Centralized
As tempting as it is to create “one domain model to rule them all”, particularly given all the love for Domain-Driven Design in the past ten years or so, it simply does not hold up in practice. A corollary to the “one domain model” is the “one database model”—at some point in the enterprise IT manager’s tenure, somebody (usually a data architect or consultant) will suggest that massive savings (of one form or another) are there for the taking if the company takes the time to create a unified database. In other words, bring all the different scattered databases together under one roof, centralized in one model, and all the data-integration problems (data feeds into databases, ETL processes, and so on) will be a thing of the past as every single codebase now accesses the Grand Unified Data Model.
I have never seen one of these projects ever actually ship. Other architects have told me that they’ve had them ship, but when I follow up with people who’ve been at said companies, the universal story I hear is that once built, the resulting model was so complex and unwieldy that within a short period of time (usually measured in months) it was abandoned and/or fractured into smaller pieces so as to be usable.
The problem here is that different parts of the enterprise care about different aspects of a given “entity”. Consider the ubiquitous “Person” type, which is almost always one of the first built in the unified model. Sales cares about the Person’s sales history, Marketing cares about their demographic data (age, sex, location, etc), HR cares about their company-related information (position, department, salary, benefits status, etc), and Fulfillment (the department that ships your order once purchased) cares about address, credit card information, and the actual order placed.
Now, obviously, trying to keep all of this in one Person entity (the so-called “fat” entity, since it has everything that any possible department could want from it) is going to be problematic over time—if nothing else, fetching a list of all of the Persons from the system for a dropdown will result in downloading orders of magnitude more data than actually required. (This also runs afoul of the “Bandwidth is infinite,” “Latency is zero,” and “Transport cost is zero” fallacies of Distributed Systems.) Clients will quickly start caching off only the parts they care about, and the centralized data model is essentially decentralized again.
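To see why the fat entity never survives contact with its consumers, consider this sketch in modern Java (16+, for the record syntax); every field and name here is invented purely for illustration.

```java
import java.util.List;

// The Grand Unified entity: everything every department might ever want.
// All names and fields here are invented purely for illustration.
class Person {
    long id;
    String name;
    List<String> salesHistory;   // Sales
    String demographics;         // Marketing
    String department;           // HR
    long salaryCents;            // HR
    String shippingAddress;      // Fulfillment
    String creditCardToken;      // Fulfillment
}

// What a dropdown actually needs: two fields, not the whole graph.
// Each consumer ends up defining the narrow projection it cares about,
// and the "centralized" model quietly decentralizes again.
record PersonSummary(long id, String name) {
    static PersonSummary from(Person p) {
        return new PersonSummary(p.id, p.name);
    }
}
```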
The next reasonable step is to split Person up into “derived” models, usually (in the relational sense) by creating subsidiary tables for each of the specific parts. This is reasonable, assuming that the cost of doing joins across those tables is acceptable. Unfortunately, these sorts of centralized data models are usually supposed to hold the entirety of the enterprise’s data in one database, so the costs of doing joins across millions of rows in multiple tables are often prohibitive. But let’s leave that alone for a moment.
Where things really start to go awry is that enterprise systems are never monolithic (see the next fallacy), and the code that accesses the centralized data model often needs to be modified in response to “local” concerns; for example, HR may suddenly require that “names” (which are common to the Person core table) be able to support internationalization, but Marketing is right in the middle of an important campaign, and any system downtime or changes to their codebase are totally unacceptable. Suddenly we have a political tug-of-war between two departments over who “owns” the schedule for updates, and at this point, the problem is no longer a technical problem whatsoever. (This is the same problem that sank most centralized distributed systems, too—any changes to the shared IDL or WSDL or Schema have to be ratified and “bought off” by all parties involved.)
Where this falls apart for domain models is right at the edge of the language barrier—a domain model in the traditional DDD sense simply cannot be shared across language boundaries, no matter how anemic. Classes written in C# are not accessible to Java except through tools that will do some form of language translation for local compilation, and these will almost always lose any behavior along the way — only the data types of the fields will be brought along. Which sort of defeats half the point of a Rich Domain Model.
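To make the “data types only” point concrete, here is a hypothetical sketch. Both halves are written in Java purely for brevity; in practice the second half is what a generator emits on the far side of a C#/Java, WSDL, or schema boundary, and all the names are mine.

```java
// What the owning team writes: a rich domain type whose behavior enforces
// its invariants. (Both halves are shown in Java for brevity; in practice
// the second half is what a generator emits on the far side of a C#/Java,
// WSDL, or schema boundary.)
public class Money {
    private final long cents;
    private final String currency;

    public Money(long cents, String currency) {
        if (cents < 0) {
            throw new IllegalArgumentException("negative amount");
        }
        this.cents = cents;
        this.currency = currency;
    }

    // Behavior lives with the data: the invariant travels with the type.
    public Money add(Money other) {
        if (!currency.equals(other.currency)) {
            throw new IllegalArgumentException("currency mismatch");
        }
        return new Money(cents + other.cents, currency);
    }
}

// What typically arrives after cross-language translation: fields only.
class MoneyDto {
    public long cents;
    public String currency;
    // add(), and the invariants it guarded, did not make the trip.
}
```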
And that's where we'll stop for now. But rest assured, we'll go through the rest of the list in part two, when we deal with the assumption of monolithic architecture, over-reliance on vendors, and plenty of other problems.