
Easy Modern Architecture Patterns


In this article, take a look at some easy modern architecture patterns.


Here, I'd like to tell you about popular architecture patterns that allow you to create fast and reliable services. From all the possible approaches, I selected only those you can easily use right now: each one has ready-made libraries or can be implemented with simple cloud technologies (public or private).

Horizontal Scalability

This is the simplest and most popular pattern. There are two frequently used scalability options: scaling out and scaling up. The first lets you add new nodes and distribute the load between them; to speed up the system with the second, you need faster servers, code optimization, and so on.

I will use cloud file storage as an example: we will try to apply these approaches to build an analog of OwnCloud, OneDrive, etc.

The default picture of each approach is shown below. It also demonstrates the system's complexity: we have to somehow synchronize our services. How would the system work if a user saves a file on a PC and then wants to see it from a mobile phone?

Vertical vs Horizontal Scalability

CQRS

Command Query Responsibility Segregation is an important pattern because it allows different clients to connect to different services while receiving the same event streams. Its benefits aren't so obvious for a simple application, but it is quite important (and easy to use!) for high-performance systems.

The essence of this approach is simple: input and output data streams shouldn't be mixed. Instead of sending a request and waiting for a response on the same endpoint, you send a request to service A and receive the result from a different service B. Of course, these are different endpoints.

The first bonus of this approach is tolerance of connection interruptions, which is quite useful for long-running queries. It is especially important on mobile networks, where a TCP stream can be interrupted just because the user is on a train (and the device switched base stations).

For demonstration purposes, let's observe the standard client-service communication sequence:

  1. A client sends a request to the server
  2. The server starts a long-running task
  3. The server sends a response to the client

Let's imagine the connection is interrupted at step 2 (or the network blinks). In this case, the server is unable to send a response to the user because the TCP stream is already closed! Now let's apply the CQRS approach (a code sketch follows the diagram below):

  1. A client subscribes to server updates
  2. A client sends a request to the server
  3. The server replies immediately "request received"
  4. The server processes the query
  5. The server sends a response with the result through the channel subscribed to in step 1

Classic mode vs CQRS mode
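To make the second sequence concrete, here is a minimal single-process sketch in Java (all names are hypothetical). The command endpoint only acknowledges the request; the result travels through a channel the client subscribed to beforehand. In a real system, that channel would be a WebSocket, a push notification, or a message broker rather than an in-memory queue.

```java
import java.util.concurrent.*;

// Minimal CQRS sketch: the command endpoint acknowledges immediately,
// and results arrive via a separate subscription channel.
public class CqrsSketch {
    // Step 1: the "subscription" channel the client listens to.
    static final BlockingQueue<String> updates = new LinkedBlockingQueue<>();
    static final ExecutorService worker = Executors.newSingleThreadExecutor();

    // Steps 2-3: accept the command and reply at once.
    static String submit(String request) {
        worker.submit(() -> {
            // Step 4: the long-running processing happens asynchronously.
            String result = request.toUpperCase();
            // Step 5: the result goes through the subscription channel.
            updates.add("done: " + result);
        });
        return "request received"; // immediate acknowledgment
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(submit("save file.txt")); // returns at once
        System.out.println(updates.take());          // result arrives later
        worker.shutdown();
    }
}
```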

As you can see, this scheme is a bit more complex, and you can't use the intuitive blocking request-response approach. However, a network interruption no longer leads to an error. As a bonus, if a user connects to the system from multiple devices at the same time, you can simply deliver updates to all of them.

In real life, the code that processes incoming messages becomes quite similar (though not 100% identical) for reply events (i.e., actions initiated by the user) and for notifications (when a user receives information from someone else).

In addition, there is one more bonus: with single-direction streams, we can use a functional style with libraries like Reactive Extensions. This is a serious advantage because you can make your application fully reactive, which for big enterprise applications can save a lot of development and support effort.
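As an illustration, here is a sketch using RxJava 3 (assuming the io.reactivex.rxjava3:rxjava dependency; the stream and event names are made up). Every device simply subscribes to the same single-direction stream of updates:

```java
import io.reactivex.rxjava3.subjects.PublishSubject;

// Reactive sketch: client-visible updates are events in one output stream.
public class ReactiveSketch {
    public static void main(String[] args) {
        PublishSubject<String> serverUpdates = PublishSubject.create();

        // Each device subscribes with functional-style operators.
        serverUpdates
            .filter(e -> e.startsWith("file:"))
            .map(e -> e.substring("file:".length()))
            .subscribe(name -> System.out.println("file saved: " + name));

        // The processing side only publishes; it never "responds" directly.
        serverUpdates.onNext("file:report.pdf");
    }
}
```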

If we combine this pattern with horizontal scalability, we get another bonus: the ability to send requests to one server and receive answers from another! The client application can then select the most appropriate server node, which can dramatically improve UX. To make this easier, we can use the next pattern.

Event Sourcing

As you know, one characteristic of a distributed system is the absence of a common clock, which means you should never rely on critical sections between your servers. Within a single process (or a single server), you can use a mutex and be absolutely sure that nobody else is inside the section at the same time. In a distributed system, however, this is dangerous: it requires a lot of additional resources for synchronization and kills the advantages of distribution, because every component waits while someone is inside the critical section.

Therefore, a distributed system can't be strictly synchronized (if we want it to be fast, of course). On the other hand, we usually need some consistency between services. To cover this, we can use eventual consistency. From Wikipedia: if no new updates are made to a given data item, eventually all accesses to that item will return the last updated value.

It is important to understand that classical databases are frequently built with strict consistency, where each node has the same information (this can be achieved by committing a transaction only after at least N/2+1 writable nodes accept it). Of course, there are optimizations with different isolation levels, but the idea remains the same: you live in a fully consistent world.

However, let's return to our initial task. The major part of the system can be built with eventual consistency, which gives us the following:

Diagram including queue processing, input queue, and subscription

The most important points of this approach are (a minimal sketch follows the list):

  • Each input request is stored in a common input queue
  • While processing an event, a service can put tasks into other queues
  • Each input event has an identity (which is needed for deduplication)
  • The queue is append-only, i.e., elements can't be removed or reordered
  • The queue provides FIFO guarantees. If we need any kind of parallel processing, we have to create separate queues
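A minimal in-memory sketch of these rules (all names are hypothetical; a production system would use a durable log such as a message broker). Consumers track an offset instead of removing elements, which keeps the log append-only:

```java
import java.util.*;

// Append-only event log with identity-based deduplication and FIFO reads.
public class EventLogSketch {
    record Event(long id, String payload) {}

    private final List<Event> log = new ArrayList<>(); // append-only
    private final Set<Long> seenIds = new HashSet<>(); // for deduplication

    public synchronized boolean append(Event e) {
        if (!seenIds.add(e.id())) return false; // duplicate delivery, ignored
        log.add(e);                             // never removed or reordered
        return true;
    }

    // FIFO read: each consumer keeps its own offset into the log.
    public synchronized List<Event> readFrom(int offset) {
        return List.copyOf(log.subList(offset, log.size()));
    }
}
```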

Remember that we are looking at a cloud file storage system. In this case, it will look like this:

[Diagram: the event queues of the cloud file storage system]

It is important to note that an item on the diagram doesn't mean a separate server; even the process can remain the same. The point is that conceptually different items are separated, so that horizontal scaling is easy to apply in the future.

For two users, the system looks like this (each user's requests are shown in a different color):

[Diagram: two users' requests flowing through the queues, shown in different colors]

This combination has at least the following bonuses:

  • Processing services are separated, and so are the queues. I'd like to remind you that this separation is logical only, and everything can run on the same server. However, when the load increases, you can simply add more servers and apply physical separation too.
  • When we receive a user request, we don't have to wait for processing to complete. We just respond "ok" and start processing when we can. This keeps the request-response delay small, and the system stays stable under sudden spikes in request count because we don't need to process everything right away; the data processing services work at a steady pace. In other words, the external services can accept as many parallel requests as the queue capacity allows (close to unlimited if we use a cloud), while the real processing services can be restricted to handle no more than N items in parallel. This is quite an effective optimization against overloading expensive internal services (for example, to reduce database load).
  • I added a deduplication service just as an example: it tries to merge identical files. If it processes 1% of requests much longer than the others, this is completely invisible to the user, because we accept the request before it is fully processed. This improves the system's perceived reliability a lot. A sketch of content-based deduplication follows this list.
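One plausible way to implement such deduplication is content addressing: hash the file bytes and use the digest as the storage key, so identical uploads collapse into one blob. This is only a sketch of the idea, not the article's concrete design:

```java
import java.security.MessageDigest;
import java.util.HexFormat;

// Content-based deduplication sketch: identical bytes yield identical keys.
public class DedupSketch {
    static String contentKey(byte[] fileBytes) throws Exception {
        MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
        return HexFormat.of().formatHex(sha256.digest(fileBytes));
    }

    public static void main(String[] args) throws Exception {
        byte[] a = "same content".getBytes();
        byte[] b = "same content".getBytes();
        // Equal keys mean the second upload can reference the existing blob.
        System.out.println(contentKey(a).equals(contentKey(b))); // true
    }
}
```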

However, this configuration also has disadvantages:

  • We lose strict consistency. If you connect to different services, you can get different states at the same moment. Of course, after some period the states converge, but it is better to keep this in mind.
  • We can't roll back events the way you can with a database. Instead, we have to add a new compensating event that changes the previous invalid state to the correct one. This is like git history: you can't remove a commit from published history (once people have pulled your repository); you can only create a revert commit that restores the previous state. Both commits remain in the history.

As you can see, Event Sourcing has synergy with CQRS. It isn't so simple to build the same system without dividing data into different streams, because you would have to add synchronization points, which can kill the performance benefits of parallel processing. When you apply both approaches at the same time, you must update the client code too. After sending a file to the server, the client just receives "ok," which means only "the request was added to the queue." Formally, it doesn't mean that the file is already available on all other devices (for example, the deduplication service could be rebuilding its index). However, after some time, the client will receive a notification that "file X was saved."

As a result:

  • A file upload request now has one more status. Instead of the classical "file is being sent" + "file was saved," we add "file was added to the queue," and only the "file was saved" status means that other devices can download the file.
  • We have separate streams for data input and data output, which means we need a way to receive file processing results. The client can be restarted during file processing (which is good), but it must remember to recheck the final result. This isn't too complex, because the scheme is straightforward (an internal outbound queue). And that is really good: we are tolerant of system failures. We don't care about temporary network glitches or any other disaster; our files can be uploaded even when the client is underground with an unstable connection.

Sharding

As we observed above, systems with event sourcing don't have strict consistency. This means we can use multiple storages without any complex synchronization between them. In our cloud file system, we can:

  • Split files by type. For example, pictures and videos could be re-encoded into a more efficient format
  • Split data by country. There are laws that require such data splits, and our architecture supports this out of the box (a routing sketch follows the diagram below)

[Diagram: storages sharded by file type and country]
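Because the routing rule is ordinary code, adding a per-country or per-type split later is cheap. A sketch (shard names and rules are invented for illustration):

```java
// Shard routing sketch: pick a storage by country first, then by file type.
public class ShardRouter {
    record FileMeta(String name, String ownerCountry) {}

    static String shardFor(FileMeta file) {
        if ("DE".equals(file.ownerCountry())) return "storage-eu";
        if (file.name().endsWith(".mp4"))     return "storage-video";
        return "storage-default";
    }

    public static void main(String[] args) {
        System.out.println(shardFor(new FileMeta("movie.mp4", "US"))); // storage-video
        System.out.println(shardFor(new FileMeta("doc.txt", "DE")));   // storage-eu
    }
}
```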

Unfortunately, if we want to move data from one storage to another, we can't do this with standard tools; we would have to stop the queue, do the migration, and resume processing. In general, we can't move data "on the fly." However, if the event queue is fully stored (i.e., events aren't removed) and we have previous versions of the storage (i.e., old snapshots), we can replay the remaining events in the following way (a sketch follows the list):

  • Each event has a unique identifier (ideally, a monotonically increasing one). This means we can store the latest processed identifier in the storage.
  • Duplicate the queue, so that each event is processed for two storages: the first is our original storage, the second is built from the last snapshot. The second queue is stopped for now.
  • Start the second queue (i.e., start reapplying events to the new storage).
  • When the second queue is drained (i.e., the new storage contains the same data as the old one), switch readers to the new storage.
  • After the switch is done, stop the old queue and remove the old storage.
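A sketch of this replay procedure (the interfaces are hypothetical): the new storage starts from a snapshot and catches up by reapplying every event with an identifier above its last processed one:

```java
import java.util.List;

// Replay migration sketch: catch a snapshot-based storage up to the live log.
public class MigrationSketch {
    record Event(long id, String payload) {}

    interface Storage {
        long lastProcessedId();            // persisted alongside the data
        void apply(Event e);
    }

    interface EventLog {
        List<Event> eventsAfter(long id);  // events are never removed
    }

    static void catchUp(EventLog log, Storage newStorage) {
        while (true) {
            List<Event> pending = log.eventsAfter(newStorage.lastProcessedId());
            if (pending.isEmpty()) break;  // drained: safe to switch readers
            pending.forEach(newStorage::apply);
        }
        // After switching readers, the old queue and storage can be removed.
    }
}
```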

As you can see, our system doesn't have strict consistency, only eventual consistency. However, we do have a guarantee that all events are processed in the same order (though possibly with different delays). Thanks to this, we can easily move data to another country without stopping the system.

To continue our cloud file storage example, this architecture gives us more bonuses:

  • We can move data closer to users dynamically. This could improve our service quality.
  • We can store part of the data inside client companies. For example, enterprise customers frequently require storing data in in-house data centers (to mitigate data leak risk). With the sharding pattern, this is easy, simply because of our architecture. The task is even simpler if the customer's cloud is compatible with ours (for example, self-hosted Azure).
  • And the most important benefit: we don't have to do any of this. At first, it is simpler and cheaper to have a single storage for all accounts (much easier to start with). The key advantage of this design is that, in addition to scalability, it isn't over-engineered at the start. Just avoid writing code that assumes millions of parallel queues, etc.; if you need that, you can add it in the future.

Static Content Hosting

This chapter may look absolutely obvious, but it is important for high-load applications. The idea is pretty simple: static content should be served from a separate source that is optimized for this task. Note that a standard Java application is slow at this, while nginx is faster and better suited. As a result, most of the streamed data is delivered faster just because the responsibility is split between technologies. In addition, the static service can simply be wrapped with a CDN to move data closer to your customers.

The simplest static content segregation case is to move all your JavaScript and website images there. This is easy because the data is fully ready before the website starts working: you can just prepare an archive and upload it to the CDN service, which delivers the data to your users.

However, in real life, we can use another static content delivery approach, which looks like a lambda architecture. Let's return to our cloud file storage service: all uploaded files should be available for download by users. The simplest solution is to create a service that validates the user's authorization and then works as a proxy, downloading content from storage and sending it to the user. The major disadvantage of this approach is that we serve static content (by design, a file + revision pair can't change) from the same server that runs the business logic, which means streaming in Java/.NET or a similar stack, and that is quite inefficient. Instead, we can do the following (a signing sketch is shown after the diagram below):

  • The user requests a file from the server
  • The server responds with a download URL. It can be file_id + key, where the key is a digital signature that allows the file to be downloaded within the next 24 hours (just as an example).
  • nginx can act as the file download service, provided that:
    • It caches content. We have already split the business and streaming logic, so we can put nginx on a dedicated server with a lot of disk space available. Of course, this improves performance
    • It validates the signature (which contains all the information needed)
  • Optionally: streamed content processing. For example, if the service stores all files compressed, we can do the decompression in this streaming module. As a result, IO operations are done in the right place, in the service designed for IO. A Java archiver requires more memory, but rewriting the business logic service in Rust/C++ would be quite inefficient. In our case, IO operations and business logic are split, so we can optimize IO without touching the more complex services.

[Diagram: the file download flow through the nginx streaming tier]
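The signing step can be as simple as an HMAC over the file identifier and an expiry timestamp (the URL format here is invented for illustration). The download tier verifies the same HMAC before serving the file; nginx's secure_link module offers a comparable built-in mechanism:

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.time.Instant;
import java.util.HexFormat;

// Signed download URL sketch: the business server signs "file_id:expiry"
// with a secret shared with the download tier.
public class SignedUrlSketch {
    static String sign(String fileId, long expiresAt, byte[] secret) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(secret, "HmacSHA256"));
        byte[] sig = mac.doFinal((fileId + ":" + expiresAt).getBytes());
        return HexFormat.of().formatHex(sig);
    }

    public static void main(String[] args) throws Exception {
        byte[] secret = "shared-secret".getBytes();
        long expiresAt = Instant.now().plusSeconds(24 * 3600).getEpochSecond();
        String key = sign("file_42", expiresAt, secret);
        System.out.println("/files/file_42?expires=" + expiresAt + "&key=" + key);
    }
}
```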

This scheme doesn't look like classic static content distribution, because content can be removed from the server (and new content can be added). However, we have modified our system to reuse the high-performance characteristics of static content distribution systems. Moreover, we can extend this to content that isn't static but can, at any moment, be represented as a sequence of immutable, non-deletable blocks (new blocks can be appended, of course).

One more example to illustrate the idea: if you have worked with Jenkins or TeamCity, you know that both are written in Java. Both have a central Java server that orchestrates builds and manages content, and both face the same task: "distribute a file/folder from the server." For example, both of them distribute artifacts and source code and provide access to logs, and all of these tasks carry a heavy IO load. So we have a server responsible for complex business logic that also has to work as an IO proxy, a task that can be delegated to nginx using the same logic we observed above.

Let's return to our system. At this point, we have the following design:

[Diagram: the complete system design]

As you can see, this system is more complex than just "a web server and a database." It isn't a single process that stores all files locally; it requires more support, including API versioning and management. It is important to assess the resulting complexity after designing the system. However, if you want the ability to extend the system in the future (to cover more users, more scenarios, etc.), you have to choose approaches like these. This architecture is ready for increased load and can be updated without pauses visible to the user (obviously, some components will be stopped and some operations will take longer than usual).

Because of the current pandemic, many web services are receiving increased load. For example, one UK company paused online shopping for a while: its service showed poor performance, and in the end the company gave up the extra profit. Instead of accepting delivery delays or advising clients to schedule deliveries for later, the system effectively said, "go to our competitors." And this is the price of low performance: you lose money at exactly the time when your profit could be highest.

Conclusion

All these approaches have been known for a long time. VKontakte and Facebook use static content hosting for serving pictures. Many online games use sharding to split the game across regions (real-world regions to decrease ping, and in-game locations to split the load between services). Event sourcing is used in mail services. Many trading applications use CQRS to speed up data delivery. And of course, horizontal scaling is quite popular.

However, all these patterns are now easy to use in modern applications. Cloud solutions simplify sharding and horizontal scaling: it is quite simple to request new instances in different data centers around the world. CQRS is much easier to apply thanks to libraries like Rx; only a few websites could support it just 10 years ago. Event sourcing can be set up simply by using Apache Kafka. Ten years ago this was an innovation (at least inside a company); now it is routine. The same goes for static content hosting: new technologies have simplified the approach, so you don't need an army of IT specialists to develop and support it. A minimal Kafka producer sketch is shown below.
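For instance, a minimal producer that appends events to a Kafka topic looks like this (assuming the kafka-clients dependency and a broker at localhost:9092; the topic and key names are made up):

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.util.Properties;

// Event sourcing on Kafka: each state change is appended to a topic,
// which plays the role of the append-only input queue.
public class KafkaEventSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // The key preserves per-user ordering within a partition.
            producer.send(new ProducerRecord<>("file-events",
                    "user-42", "file-uploaded:report.pdf"));
        }
    }
}
```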

As a result, implementing complex architecture patterns is much easier now, so this is the best time to revisit these techniques. You might have rejected one of these patterns 10 years ago as potential over-engineering, but now it can be quite useful (though I think the application should be refactored first). It can make your application faster, more scalable, and ready for today's challenges (such as user data segregation).

And most importantly: please don't use these approaches for a simple application. I understand that they are elegant and look nice on a CV, but a website with just 100 users can be handled by a simple monolithic architecture (of course, it is better to divide it into modules internally, but it can remain a single process to the outside observer).
