Data Boundaries Are the Root Cause of Maintenance Problems

It's time to knock down some of the data walls in data-oriented architecture design patterns.

Many designs and patterns, old and new, such as the Layered Architecture, Clean Architecture, Hexagonal Architecture, DCI, and others, introduce data-oriented boundaries inside the application.

Data-oriented boundaries are interfaces between architectural parts that primarily consist of data in the form of "properties" that can be freely accessed either directly, through getter methods, reflection, or some other technical means.

Regardless of how the data is actually accessed or which side defines the interface and which one consumes it, these kinds of boundaries create serious maintenance problems for the software.

Let's take a look into why this happens and what alternatives exist to avoid these problems.

Data-Oriented Boundaries

This is a classic 3-tier design with Data-Transfer Objects as boundaries:

This design became prevalent in the late 90s, when a lot of developers (myself included) transitioned from traditional procedural languages like C, Pascal, and Basic to Java. The DTOs were sometimes called Value Objects at the time, but regardless of the name, they were the familiar data structures we had grown accustomed to. It was a popular approach and easy to understand because it didn't really require a change of mindset.

This was also the time when a lot of applications were still "rich clients." That means these applications had to be installed on the user's computer; they provided a GUI and a connection to some back-end server. The Web (HTML-based front-ends) was still new and we couldn't yet tell which of the two would work or survive, so it kind of made sense to separate the front-end a bit from the application. The layered design shown above helped us in case we needed to switch technologies, which actually happened very often, mostly in one direction: from rich client to web.

Neither of those reasons exists today, but this design and its underlying theme of "making switching technologies possible" have stuck around anyway. That is a problem, because the design comes at a great cost.

The Maintenance Problems Start

Let's take a look at this example:

public final class Amount {
   private final BigDecimal value;
   private final Currency currency;

   public Amount(BigDecimal value, Currency currency) {
      this.value = value;
      this.currency = currency;
   }

   public BigDecimal getValue() {
      return value;
   }

   public Currency getCurrency() {
      return currency;
   }
}


When this class is returned from some method, both the caller and the callee have to know all about the Amount, including the attributes value and currency and what they mean. Both sides have to know all the rules for handling Amount objects, like not adding the values if the currencies differ, not comparing them, etc.
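To make the shared knowledge concrete, here is a minimal sketch of what a caller of the getter-based Amount might look like. The InvoiceTotals class and its total method are hypothetical, and java.util.Currency stands in for the article's Currency type, which is an assumption. The Amount is repeated (without the public modifiers) only so the sketch compiles on its own.

```java
import java.math.BigDecimal;
import java.util.Currency;
import java.util.List;

// The data-oriented Amount from the listing above: state plus getters only.
final class Amount {
    private final BigDecimal value;
    private final Currency currency;

    Amount(BigDecimal value, Currency currency) {
        this.value = value;
        this.currency = currency;
    }

    BigDecimal getValue() { return value; }

    Currency getCurrency() { return currency; }
}

// Hypothetical caller: it must re-implement the "same currency" rule itself.
final class InvoiceTotals {
    static Amount total(List<Amount> items, Currency currency) {
        BigDecimal sum = BigDecimal.ZERO;
        for (Amount item : items) {
            // Knowledge about Amount that now lives outside of Amount.
            if (!item.getCurrency().equals(currency)) {
                throw new IllegalArgumentException("currency mismatch");
            }
            // This line depends on the internal type being BigDecimal.
            sum = sum.add(item.getValue());
        }
        return new Amount(sum, currency);
    }
}
```

Every call site written like this one carries its own copy of the currency rule and its own dependency on the BigDecimal representation.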

What would happen if we were to change this Amount? Let's assume it turns out that arithmetic with BigDecimals is too slow and we need billions of operations per second. We would like to change the "internals" of the Amount from BigDecimal to Long (which would be in cents for example). Obviously, we could keep the BigDecimal getter to keep the API stable and convert the internal representation to Long, but that would defeat the purpose of our change, because the arithmetic would still happen on BigDecimals. Instead, now we need to track down each usage of that attribute and see how it would be impacted by our change. This is exactly what unmaintainability looks like: you have to manually track down consequences of a change.

Let's take another example. We want to introduce a field which indicates whether the value is in units or 1/100 units (for some currencies this makes some sense). While in the previous example we still at least get some help from our compiler, showing us usages where the type change introduces problems, this change will cause no compilation issues whatsoever. We have to track down usages again, but this time without the explicit help of our development environment, with the additional task to understand how those sites use the Amount to be able to change them accordingly. This is an even worse situation than before.

Please note that this is just one very simple and incomplete example and it already starts to get out of hand.

Why Is This Happening?

The biggest problem with data-oriented interfaces is that they share meaning implicitly. This is not the good kind of sharing either. It means that because the communication is reduced to data, both sides must have the appropriate interpretation for that data, which might include anything from simple things like what values it could have (can it be null?) to complex interrelations between different parts (like the 1/100th flag above).

If both sides must possess this knowledge then it follows that both sides have to change when this interpretation changes. To make it worse, the interpretation is implicitly shared, because there is no way to detect if suddenly other rules or interactions apply to the data, so there will be very limited language and IDE support for implementing the change. The only way to prevent this outcome is to keep this knowledge localized and hidden as much as possible. The above example should look like this:

public final class Amount {
   private final BigDecimal value;
   private final Currency currency;

   public Amount(BigDecimal value, Currency currency) {
      this.value = value;
      this.currency = currency;
   }

   public Amount add(Amount other) {
      if (currency.equals(other.currency)) {
         return new Amount(value.add(other.value), currency);
      }
      throw new IllegalArgumentException("...");
   }

   public boolean lessThan(Amount other) {
      ...
   }
   ...
}


We no longer publish the internal state of the Amount; instead, we provide business-relevant methods that users of this class can use to manipulate Amount objects according to all rules and regulations controlled completely by the Amount. Note that the knowledge of how amounts are added up or compared is described here exclusively and it is impossible to violate these rules now even if the caller doesn't know any of this.

It is also easy to see that the two example changes proposed above, changing BigDecimal to Long or introducing the 1/100th flag attribute, would be possible in this design by changing only the Amount class. This is the crucial point of a maintainable design! It is now possible to change the Amount, either by changing the internals or even by introducing new features, without changing the semantics (the meaning) of what the Amount is and what it does.
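As a rough illustration of the first change, the sketch below swaps the internal BigDecimal for a long holding cents while the constructor, add, and lessThan keep their signatures. Several details are assumptions rather than part of the original: java.util.Currency stands in for the article's Currency type, every currency is treated as having two decimal places, and lessThan rejects mixed currencies (the original listing elides its body).

```java
import java.math.BigDecimal;
import java.util.Currency;

final class Amount {
    private final long cents;        // internal representation, never published
    private final Currency currency;

    // Same constructor signature as before, so existing callers still compile.
    Amount(BigDecimal value, Currency currency) {
        // Assumption: two decimal places; throws ArithmeticException otherwise.
        this(value.movePointRight(2).longValueExact(), currency);
    }

    private Amount(long cents, Currency currency) {
        this.cents = cents;
        this.currency = currency;
    }

    Amount add(Amount other) {
        requireSameCurrency(other);
        // addExact fails loudly on overflow instead of wrapping around.
        return new Amount(Math.addExact(cents, other.cents), currency);
    }

    boolean lessThan(Amount other) {
        requireSameCurrency(other);
        return cents < other.cents;
    }

    private void requireSameCurrency(Amount other) {
        if (!currency.equals(other.currency)) {
            throw new IllegalArgumentException("currencies differ");
        }
    }
}
```

Code that only calls add and lessThan does not need to be touched, because it never saw the BigDecimal in the first place.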

Don't Forget the UI!

At some point, some things will be shown to the user. For example, the Amount object could be the balance of an Account, which has to be shown on a web interface.

There is nothing special about the UI and the same rules apply here as well. Instead of having a data-oriented interface from the "business layer" to the "presentation," as would be the case in a layered design, the Amount should instead offer the relevant functionality itself. Let's consider what happens if it doesn't do that: what knowledge would have to be passed to the "presentation" layer?

  • The construction of the Amount object, with all its parameters
  • All the attributes the Amount publishes
  • How these attributes relate or influence each other
  • The exact type and way to show each attribute (what kind of display it needs, what length, what precision)
  • It has to be able to evaluate certain conditions, like whether the amount is negative or positive
  • How to ask for an amount from the user. Again, what kind of display to provide, like how many input boxes or selection widgets to display, etc.

It is fair to say the UI has to know everything there is to know about the Amount to be able to present it and to get it as input from the user. Therefore, any change in the Amount will result in changes in the UI, which means things that change together aren't kept together, a hallmark of an unmaintainable design.

The solution to this problem is the same as before. The Amount keeps this knowledge and instead offers the two relevant methods the UI needs to operate (using an imaginary web framework here):

public final class Amount {
   private final BigDecimal value;
   private final Currency currency;

   public Amount(BigDecimal value, Currency currency) {
      this.value = value;
      this.currency = currency;
   }
   ...
   public Component display() {
      return new TextView(
         new NumberView(value), currency.display());
   }

   public InputComponent<Amount> displayEditable() {
      return new InputGroup<>(
         new NumberInput(), currency.displayEditable(),
         Amount::new);
   }
}


It is easy to see that in this design none of the knowledge above has to leak to the UI, while still keeping details (like colors and font-sizes, etc.) of the UI away from the Amount.

Real-Life Example: Weld/CDI Project

There are probably a hundred examples of sharing knowledge just in this one project, some simple and visible, some more subtle and complicated. Here are just two simple and easy to explain ones.

In this example, the class WeldFilter defines itself in terms of the data "name" and "pattern." It turns out there are certain rules for how these can interact, but they were not built into WeldFilter. Instead, some code in a completely unrelated part of the project must enforce them:

if ((weldFilter.getName() != null
      && weldFilter.getPattern() != null)
   || (weldFilter.getName() == null &&
      weldFilter.getPattern() == null)) {
   throw new IllegalStateException("...");
}
if (weldFilter.getPattern() != null) {
   this.matcher = new PatternMatcher(...);
} else {
   this.matcher = new AntSelectorMatcher(...);
}

Imagine someone tried to change the WeldFilter class, add a parameter/attribute, a boolean flag, or some new options. That person would have to search for all usages of the class and figure out how that would impact this unrelated part of the application. Even worse, the WeldFilter class is in another repository, so this search would likely not turn up the above code, leaving it broken without any indication that it is in fact broken.
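One possible alternative is sketched below, under clearly stated assumptions. NameOrPatternFilter is a hypothetical, simplified stand-in and not the real Weld class: a "name" is matched by plain equality here and a "pattern" is treated as a regular expression. The point is only that the "exactly one of name or pattern" rule lives inside the class, so no caller has to check it.

```java
import java.util.Objects;
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Hypothetical stand-in for a filter that owns its own rules.
final class NameOrPatternFilter {
    private final Predicate<String> matcher;

    private NameOrPatternFilter(Predicate<String> matcher) {
        this.matcher = matcher;
    }

    // The two factories make "both set" and "neither set" unrepresentable.
    static NameOrPatternFilter ofName(String name) {
        Objects.requireNonNull(name, "name");
        return new NameOrPatternFilter(name::equals);
    }

    static NameOrPatternFilter ofPattern(String regex) {
        Pattern compiled = Pattern.compile(Objects.requireNonNull(regex, "regex"));
        return new NameOrPatternFilter(candidate -> compiled.matcher(candidate).matches());
    }

    // Callers ask the question they care about instead of inspecting fields.
    boolean matches(String className) {
        return matcher.test(className);
    }
}
```

With a shape like this, adding a new kind of filter means adding a factory method, and existing call sites that only use matches stay as they are.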

Another common example is to push null handling to the user code. The class AbstractMemberProducer offers a getDisposalMethod() getter:

public DisposalMethod<?, ?> getDisposalMethod() {
    return disposalMethod;
}


The caller, however, has to know that this method might return null. So this spreads null checks all over the code. Just because of this one method, there are at least three classes in completely different places that have the same exact code to check first whether the disposal method is there or not:

// From Validator
if (producer.getDisposalMethod() != null) {
   for (InjectionPoint ip : producer.getDisposalMethod().getInjectionPoints()) {
      ...
   } 
}

// From AbstractProcessProducerBean
if (producer.getDisposalMethod() != null) {
   return ...
}

...

Note that the null problem itself can, of course, be solved technically, by using Optional, or a different language perhaps. However, null is just the symptom here; the problem is that the raw object is shared with the user code, and with that the user code has to know the semantics of that piece of data.
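A behavior-oriented alternative might look like the sketch below. SketchProducer and forEachDisposalInjectionPoint are hypothetical names, injection points are reduced to plain strings, and none of this is the real Weld API. It only illustrates keeping the "is there a disposal method?" question inside the object that owns the answer.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical, simplified stand-in for a producer with an optional disposal method.
final class SketchProducer {
    // null means "no disposal method"; this detail never leaves the class.
    private final List<String> disposalInjectionPoints;

    SketchProducer(List<String> disposalInjectionPoints) {
        this.disposalInjectionPoints = disposalInjectionPoints;
    }

    // Callers say what should happen per injection point; the absent case is handled here.
    void forEachDisposalInjectionPoint(Consumer<String> action) {
        if (disposalInjectionPoints != null) {
            disposalInjectionPoints.forEach(action);
        }
    }
}
```

A validator would then pass its per-injection-point check to this method, and the repeated null checks at the call sites would have nothing left to guard.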

How Does This Relate to Other Designs?

Although this article refers multiple times to the layered design, it is far from the only architectural pattern that focuses on technical separation to the detriment of cohesive functionality and localized changes.

The most recent one is Clean Architecture. This architecture builds on the notion that its boundaries exist completely for technical purposes (therefore contain mostly data without behavior), optimizing for changing technologies instead of changing business functions. Here is an analysis of the Clean Architecture code showing just how many changes are required for very simple features.

DCI (Data, Context, Interaction) is a little bit older, but is also built around the idea that data and function should be separated. This is so important for the approach that it is in the name itself. Unlike the Clean Architecture, it doesn't do this for technical purposes; instead, it asserts that the data in objects is stable and changes rarely, while the actual algorithms in the objects change often, which it takes as justification for separating them. This approach results in a lot of data-oriented interfaces.

The Hexagonal Architecture (Ports and Adapters), just like the Clean Architecture, wants a "pure" core logic without any technology and separates technology aspects from the core with the help of "ports." It introduces artificial (non-business) boundaries inside the application, because it assumes (like the other approaches above) that most modifications to the software modify only the "pure" business logic and rarely if ever touch the data, API, UI, or database. The "ports" are usually implemented in a data-heavy or data-only way.

Summary

The most current and popular architecture patterns still build heavily on data-oriented interfaces inside the application, using data-only objects, beans, DTOs or other similar means. These data-heavy constructs, however, because they contain raw data without their correct behavior, transfer the responsibility and knowledge to handle them properly to the caller, so both sides have to know the same things.

This sharing of knowledge will be the cause of maintenance problems later in the software's life, for the simple reason that any modifications on either side will likely need a manual re-evaluation of what happens on the other side with the data. This can be a difficult job, since it can mostly be done only by reading the code that uses the data, which escalates really quickly if there are potentially multiple places to check.

The alternative is to not work with the data "somewhere else." Keep all the "working" parts inside the object, and to be sure, just don't publish the data.

Published at DZone with permission of
