
Schemas, Contracts, and Compatibility (Part 1)


A discussion of the difficulties inherent in microservices communication and where schemas and APIs fit into the picture.


When you build microservice architectures, one of the concerns you need to address is communication between the microservices. REST APIs are a common first choice, since most programming languages have frameworks that make them very easy to implement. A REST API defines the HTTP methods that are used and the request and response payloads that are expected.

An example is the backend architecture for an insurance product: a UI lets users update their profiles, and a backend profile service is responsible for updating the profile database. If a user changes their residential address, the profile service also calls the quote service so that the insurance coverage can be recalculated (after all, some neighborhoods are safer than others).

Microservice Communication

With REST APIs, we can imagine the profile service using HTTP POST to send the following information to the quote service:

{
    "user_id": 53,
    "timestamp": 1497842472,
    "address": "2 Elm St. Chattanooga, TN"
}

And the quote service will acknowledge the request with:

{
    "user_id": 53,
    "status": "re-calculating coverage costs"
}

If all goes well, the user will receive an updated quote by mail at their new residential address.
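Concretely, the exchange above might look like the following sketch, which uses Java's built-in HTTP client. The quote service's host and the /quotes/recalculate path are placeholders invented for illustration, not part of the original example:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ProfileServiceClient {
    public static void main(String[] args) throws Exception {
        String body = """
                {"user_id": 53, "timestamp": 1497842472, "address": "2 Elm St. Chattanooga, TN"}""";
        HttpClient client = HttpClient.newHttpClient();
        // The endpoint URL below is a made-up placeholder.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://quote-service:8080/quotes/recalculate"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        // Expect an acknowledgment such as {"user_id": 53, "status": "re-calculating coverage costs"}
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}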

Now you may notice that there is a certain coupling and inversion of responsibility here. The profile service knows that if the address changed, it is responsible for calling the quote service and asking it to recalculate the quote. However, this bit of business logic really isn't relevant to the job of a service that provides simple access to the profile database, and it really shouldn't be its responsibility.

Imagine that next week I want to roll out a new risk evaluation service that depends on the user address — do I really want to ask the profile service to add some logic to call the new risk service as well? What we really want is for the profile service to record any change in profile, and for the quote service to examine those changes and recalculate when required.

This leads us to event streaming microservices patterns. The profile service publishes changes in profiles, including address changes, to an Apache Kafka® topic, and the quote service subscribes to the updates from the profile changes topic, calculates a new quote if needed, and publishes the new quote to a Kafka topic so that other services can subscribe to the updated quote event.

Microservice Communication

What will the profile service publish? In this case, a simple solution is to publish the exact same information that it used to send in the HTTP request:

{
    "user_id": 53,
    "timestamp": 1497842472,
    "address": "2 Elm St. Chattanooga, TN"
}
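Publishing that event with the plain Java Kafka producer could look roughly like the sketch below. The profile-changes topic name, the broker address, and the use of string-encoded JSON are all assumptions made for illustration:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ProfileChangePublisher {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        String event = "{\"user_id\": 53, \"timestamp\": 1497842472, \"address\": \"2 Elm St. Chattanooga, TN\"}";

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Keying by user_id keeps all changes for one user in a single partition, preserving their order.
            producer.send(new ProducerRecord<>("profile-changes", "53", event));
        }
    }
}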

Now that the profile change event is published, it can be received by the quote service. Not only that, it can also be received by a risk evaluation service, by Kafka Connect, which will write the update to the profile database, and perhaps by a real-time event streaming application that updates a dashboard showing the number of customers in each sales region. If you evaluate architectures by how easy they are to extend, this architecture gets an A+.
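Each subscriber consumes the same stream independently by using its own consumer group. Here is a minimal sketch of the quote service's side, under the same assumed topic name and string payloads:

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class QuoteServiceConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "quote-service"); // a distinct group per service means every service sees every event
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("profile-changes"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // A real quote service would recalculate coverage here if the address changed.
                    System.out.printf("user %s changed profile: %s%n", record.key(), record.value());
                }
            }
        }
    }
}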

Real-world architectures involve more than just microservices. There are databases, document stores, data files, NoSQL stores, and ETL processes involved. Having well-defined schemas that are documented, validated, and managed across the entire architecture will help integrate data and microservices — a notoriously challenging problem that we discussed at some length in the past. Using events and schemas allows us to integrate the commands and queries that are passed between services with the organization's data infrastructure, because events are both commands and data, rolled into one.

Note that the same definitions of fields and types that once defined the REST API are now part of the event schema. In both architectures, the profile service sends the same fields: user_id, an integer; timestamp, a number; and the new address, a string. The roles are reversed, though: the REST API is a promise of what the quote service will accept, and the event schema is a promise of what type of events the profile service will publish. Either way, these promises, whether APIs or schemas, are what allow us to connect microservices to each other and use them to build larger applications. Schemas and APIs are contracts between services.

Microservice Communication
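Written down explicitly, such a contract could take the form of an Avro schema. Below is a minimal sketch; the record name, namespace, and choice of Avro types are illustrative assumptions rather than anything prescribed here:

import org.apache.avro.Schema;

public class ProfileChangeSchema {
    // Field names mirror the example event; the record and namespace names are invented.
    static final String SCHEMA_JSON = """
            {
              "type": "record",
              "name": "ProfileChange",
              "namespace": "com.example.insurance",
              "fields": [
                {"name": "user_id",   "type": "int"},
                {"name": "timestamp", "type": "long"},
                {"name": "address",   "type": "string"}
              ]
            }""";

    public static void main(String[] args) {
        Schema schema = new Schema.Parser().parse(SCHEMA_JSON);
        System.out.println(schema.toString(true)); // pretty-print the parsed contract
    }
}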

It Is Not Just About Services

In fact, schemas are more than just a contract between two event streaming microservices. They are a contract between teams. They sit at the intersection of the way we develop software, the way we manage data and metadata, and the interactions between teams.

To illustrate, let's look at two examples:

  1. Documented contracts: Pretend that you are a data scientist working for our imaginary insurance company. You heard through the grapevine that the profile service is now publishing all user profile changes to a Kafka topic, and that anyone with a good business reason can be authorized to read those changes. This is great news! You always suspected that customers who move to even-numbered addresses tend to get involved in car accidents, and now you can easily test your assumption. To explore the data a bit, you use Confluent's simple command line tool to read a few events from the Kafka topic and see what you are dealing with. Unfortunately, a single event looks like this:
    2001 2001 Citrus Heights-Sunrise Blvd Citrus_Hghts 60670001 3400293 34 SAC   Sacramento SV Sacramento Valley SAC Sacramento County APCD SMA8 Sacramento Metropolitan Area CA 6920 Sacramento 28  6920 13588 7400 Sunrise Blvd   95610 38 41 56 38.6988889 121 16 15.98999977 -121.271111 10 4284781 650345 52
    You can make sense of some of this, but figuring out the house number looks like a serious challenge. In fact, the only way to do it is to find an engineer from the team that built the profile service and ask her. If she says, "Oh, field number 7 is the house number. Couldn't you tell?" you'll need to bake this information right into your accident-prediction service. So much for decoupling, both in your interaction with and reliance on another team, and in the code you write.
  2. Prevent breaking changes: One of the benefits of event streaming patterns is that once the events are published and the schema is documented, it is very easy for anyone in the organization to use the data. This is a force multiplier for the organization, making it easier for the business to deliver features faster. But there can be unfortunate side effects. Imagine that one of the developers on the team behind the profile service decided that using seconds since 1970 to measure time was making their life too difficult, and switched to a formatted string instead. So instead of sending:
    {
        "user_id": 53,
        "timestamp": 1497842472,
        "address": "2 Elm St. Chattanooga, TN"
    }
    They are now sending:
    {
        "user_id": 53,
        "timestamp": "June 28, 2017 4:00pm",
        "address": "2 Elm St. Chattanooga, TN"
    }
    This will break most of the downstream services, since they expect a numeric type but now get a string. And it is impossible to know in advance how much damage will be caused, since any service could be consuming events from the profiles topic.

Now, in both of those examples, it is tempting to blame "miscommunication" or a "lack of proper process." This is better than blaming individual engineers, but trying to fix a perceived lack of communication and process results in various meetings and attempts to coordinate changes across many teams and applications, slowing down the development process. The intent of the whole microservices exercise was to speed things up: to decouple teams and allow them to move fast independently. Can we achieve this?

Schemas Are APIs

The key to solving a problem is to frame it correctly. In this case, we framed the problems as "people problems." But if we look at the specifics of the issues, we can see that they would not occur if we used REST APIs instead of events with schemas. Why? Because APIs are seen as first-class citizens within microservice architectures, and any developer worth their salt will document public APIs as a matter of routine. If they didn't, they would be caught during code review. APIs also have unit tests and integration tests, which would break if a developer changed the field type in an API, so everyone would know far in advance that the change would break things.

The problem is that while schemas and APIs are indeed both contracts, schemas often feel like an afterthought compared to APIs, which responsible engineers think deeply about designing, documenting, testing, and maintaining. Schemas are often used without the frameworks and tools that allow developers to specify, evolve, and version them, and to detect breaking changes between schema versions early in the development cycle.
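Such tooling need not be heavyweight. For instance, the Avro library ships with a compatibility checker that would have caught the timestamp change from the earlier example before it shipped. Here is a sketch, reusing the illustrative schema from above:

import org.apache.avro.Schema;
import org.apache.avro.SchemaCompatibility;

public class CompatibilityCheck {
    // Builds the illustrative profile-change schema with a configurable timestamp type.
    static Schema profileSchema(String timestampType) {
        return new Schema.Parser().parse("""
                {"type": "record", "name": "ProfileChange", "fields": [
                  {"name": "user_id",   "type": "int"},
                  {"name": "timestamp", "type": %s},
                  {"name": "address",   "type": "string"}
                ]}""".formatted(timestampType));
    }

    public static void main(String[] args) {
        Schema reader = profileSchema("\"long\"");   // what downstream services expect
        Schema writer = profileSchema("\"string\""); // the proposed "formatted string" change
        SchemaCompatibility.SchemaPairCompatibility result =
                SchemaCompatibility.checkReaderWriterCompatibility(reader, writer);
        System.out.println(result.getType()); // INCOMPATIBLE: a string cannot be read as a long
    }
}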

Forgetting about schemas is even more damaging when actual databases are involved. Databases usually have a schema already, whether relational or semi-structured, but there are cases where shortcuts are taken to move data between different datastores and services. As a result, the schema information is lost, leading to the same negative outcomes seen when moving from REST APIs to events without well-defined schemas.

Schema management done well makes it easy for engineers to find the data they need and to use it safely, without tight coupling between services or teams. Development teams benefit by creating new features and delivering new functionality to the business with very little overhead. They do this by subscribing to existing Kafka topics, receiving well-understood events, processing them or using them to make decisions, and publishing the results to existing topics, where they will be picked up by other services. For example, you can deploy a new dynamic pricing service by simply reading events that represent current demand, using them to calculate new prices, and publishing the new prices to an existing topic, where they will be picked up by a mobile application.
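That consume-transform-publish loop is small enough to sketch end to end. The current-demand and prices topic names, the numeric payloads, and the pricing rule below are all invented for illustration:

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class DynamicPricingService {
    public static void main(String[] args) {
        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "localhost:9092");
        consumerProps.put("group.id", "dynamic-pricing");
        consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.DoubleDeserializer");

        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092");
        producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("value.serializer", "org.apache.kafka.common.serialization.DoubleSerializer");

        try (KafkaConsumer<String, Double> consumer = new KafkaConsumer<>(consumerProps);
             KafkaProducer<String, Double> producer = new KafkaProducer<>(producerProps)) {
            consumer.subscribe(List.of("current-demand"));
            while (true) {
                for (ConsumerRecord<String, Double> demand : consumer.poll(Duration.ofMillis(500))) {
                    // Toy pricing rule: scale a base price by current demand for the product (the record key).
                    double newPrice = 10.0 * (1.0 + demand.value());
                    producer.send(new ProducerRecord<>("prices", demand.key(), newPrice));
                }
            }
        }
    }
}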

Things don't always align that neatly, and sometimes the topics you need don't exist yet. But by having a framework to specify, document, evolve, and validate schemas, you can at the very least prevent services from accidentally breaking each other.

That's all for Part 1! In Part 2, we'll cover schema evolution and compatibility, enabling efficiently structured events, and more.

Topics:
microservices, microservices communication, rest api, data schema
