How to Improve Software Architecture in a Cloud Environment

This article explains how we can improve the existing cloud, microservices architecture on a project to make it more robust and powerful for all of today’s challenges.

By Milan Karajovic · Jun. 06, 25 · Tutorial

The need for software architecture today has grown more critical due to the increasing complexity, scale, and expectations of modern software systems. Applications today aren't simple. They involve multiple layers: frontend, backend, databases, integrations, microservices, and sometimes even AI/ML components. A strong architecture provides a roadmap for organizing this complexity into manageable pieces, making it easier to develop, maintain, and scale.

This article explains how we can improve the existing architecture on a project to make it more robust and powerful for all of today’s challenges.

Introduction

When creating an architecture for any application, the first step is to understand the requirements: a formal description of what we need to build. The types of requirements are:

  • Functional requirements – Features of the system
  • Non-Functional requirements – Quality Attributes
  • System Constraints – Limitations and boundaries

Each of these requirement types has its own detailed procedure for shaping the architecture, which is beyond the scope of this article.

Depending on the requirements of the app you are creating, you can use one of these software architecture patterns:

  • Two-Tier Architecture (Mobile app, Desktop app)
  • Multi-Tier Architecture (Monolithic app with or without an API Gateway)
  • Microservices Architecture
  • Event-Driven Architecture

Existing Application

Now, we can focus on an example of how we can improve the software architecture of an existing application, which is the goal of this post.

Let’s assume we have an existing app similar to Instagram, Tripadvisor, or TikTok.

  • A user can sign up and log in to post, vote, or comment. A post consists of:
    • Title
    • Topic tags
    • Body (text and/or uploaded images and/or uploaded files)
  • A user should be able to comment on any existing post.
  • Comments are ordered chronologically as a flat list.
  • A logged-in user can up-vote or down-vote an existing post or comment.
  • The homepage presents the most popular posts of the last 24 hours.

The current implementation of the app is shown in the following architecture diagram:

An architecture diagram


  • Web App Service – A central service that serves the web pages responsible for presenting the application's content
  • Users Service – Used for creating users and logging them in
  • Post & Comments Service – Used for creating and storing posts and comments
  • Ranking Service – Efficiently ranks posts by popularity over the last 24 hours
  • Votes Service – Stores the votes

We will consider the non-functional requirements that determine the quality of our application under real-world usage. These are the non-functional requirements for this application:

  • Scalability (the app should support millions of daily users)
  • Performance (response time under 500 ms at the 99th percentile)
  • Fault Tolerance / High Availability (99.9%)
  • Availability and Partition Tolerance (we prioritize “Availability + Partition Tolerance” over “Consistency + Partition Tolerance”; during a network partition, a system cannot guarantee both availability and consistency)
  • Durability

The question is: “How can we improve the current architecture so that it covers the required non-functional requirements?”

It becomes manageable if we go step by step and address the requirements one by one.

Improvements

Scalability

How can we improve scalability? We can place a Load Balancer in front of the Web App Service and in front of each of the other services. This lets us scale out each service independently, depending on the traffic it receives.

An image showing load balancer in the web app service
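
To make the idea concrete, here is a minimal sketch of a round-robin load balancer. The article does not name a balancing algorithm, so round-robin is an assumption, and the class name and instance addresses are hypothetical.

```python
class RoundRobinBalancer:
    """Hands out service instances in a fixed rotation."""

    def __init__(self, instances):
        self._instances = list(instances)
        self._next = 0

    def add_instance(self, address):
        # Scaling out: a new instance simply joins the rotation.
        self._instances.append(address)

    def next_instance(self):
        if not self._instances:
            raise RuntimeError("no instances registered")
        instance = self._instances[self._next % len(self._instances)]
        self._next += 1
        return instance


# Hypothetical addresses for two Web App Service instances.
balancer = RoundRobinBalancer(["web-app-1:8080", "web-app-2:8080"])
first_two = [balancer.next_instance(), balancer.next_instance()]

# Traffic grows, so we add a third instance behind the same balancer.
balancer.add_instance("web-app-3:8080")
next_two = [balancer.next_instance(), balancer.next_instance()]
```

Because callers only ever talk to the balancer, instances can be added or removed without any change on the client side. Production load balancers also track instance health and load, which this sketch leaves out.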


Then we can place an API Gateway to decouple the frontend code from the system’s internal structure. This will allow us to make changes internally and increase our organizational scalability. 

An image showing API gateway in a web app service


On the posts and comments database side, as we store more posts and comments, our database instance may run out of space. It also may not be able to handle the traffic from so many users.

So the solution will be sharding our posts and comments collection across different database shards. Database sharding is the process of storing a large database across multiple machines. A single machine, or database server, can store and process only a limited amount of data. Database sharding overcomes this limitation by splitting data into smaller chunks, called shards, and storing them across several database servers. All database servers usually have the same underlying technologies, and they work together to store and process large volumes of data. This way we can scale to as many posts or comments as we want to, by adding more instances and distributing our data across all those instances.

We will use a range sharding strategy which many NoSQL databases support. Range-based sharding involves dividing data into contiguous ranges determined by the shard key values. In this model, documents with "close" shard key values are likely to be in the same chunk or shard. This allows for efficient queries where reads target documents within a contiguous range.

In our case, the best way to apply range sharding is to shard on a compound index of the post ID and the comment ID.

An image showing the use of sharding on a compound index of a post ID and the comment ID


In the worst case, a range boundary falls in the middle of one post's comments, so the data of that post ends up on two shards, as shown in the picture below:


An image showing the use of sharding on a compound index of a post ID and the comment ID
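
The routing logic behind range sharding on a compound key can be sketched as follows. The split points and shard names are hypothetical, and a real database would manage and rebalance these ranges itself; the sketch only shows how a (post ID, comment ID) pair maps to a shard.

```python
import bisect

# Hypothetical split points. Keys are (post_id, comment_id) tuples, which
# Python compares lexicographically: first by post ID, then by comment ID.
SPLIT_POINTS = [(1000, 0), (2000, 500), (3000, 0)]
SHARD_NAMES = ["shard-a", "shard-b", "shard-c", "shard-d"]


def shard_for(post_id, comment_id):
    """Return the shard whose contiguous key range contains the key."""
    # Keys below the first split point go to shard-a, keys from the first
    # split point up to (but excluding) the second go to shard-b, and so on.
    index = bisect.bisect_right(SPLIT_POINTS, (post_id, comment_id))
    return SHARD_NAMES[index]


# All comments of post 1500 live on the same shard.
same_shard = {shard_for(1500, comment_id) for comment_id in (1, 250, 9000)}

# Worst case: a split point falls inside post 2000, so its comments
# are spread over two neighbouring shards.
split_post = {shard_for(2000, comment_id) for comment_id in (499, 500)}
```

Reading a batch of comments for one post therefore touches one shard in the common case and at most two neighbouring shards when a split point lands inside that post.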


Performance

One potential performance bottleneck is the image blob store. Every time a user loads a post or a news feed with the most popular posts, we need to load images from our system. Loading those images in the browser can add a lot of latency.

To solve this problem, we can utilize a global network of CDN servers. A content delivery network (CDN) is a geographically distributed group of servers that caches content close to end users. A CDN allows for the quick transfer of assets needed for loading Internet content, including HTML pages, JavaScript files, stylesheets, images, and videos.

By using a pull model with a long TTL (time to live: here, how long an edge server keeps a cached copy before fetching it again from the origin), images of popular posts will be distributed among edge servers located very close to users in different geographical locations. This reduces the traffic load on our system and significantly reduces the page load time for users. A few popular CDN solutions are Cloudflare CDN, Amazon CloudFront, Google Cloud CDN, Azure CDN, and Oracle CDN.
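
The pull model can be sketched as a small cache that fetches from the origin only on a miss or after the TTL has expired. This is an illustration of the caching behaviour, not how any particular CDN is implemented; `EdgeCache`, `fetch_image`, the path, and the 24-hour TTL are all hypothetical.

```python
import time


class EdgeCache:
    """Pull-model cache: asks the origin only on a miss or after expiry."""

    def __init__(self, fetch_from_origin, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch_from_origin
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # path -> (content, expires_at)

    def get(self, path):
        now = self._clock()
        entry = self._entries.get(path)
        if entry is not None and entry[1] > now:
            return entry[0]  # still fresh: served from the edge
        content = self._fetch(path)  # miss or expired: pull from origin
        self._entries[path] = (content, now + self._ttl)
        return content


origin_hits = []


def fetch_image(path):
    """Stands in for a request to our own image blob store."""
    origin_hits.append(path)
    return f"<bytes of {path}>"


# A fake clock keeps the example deterministic.
fake_now = [0.0]
edge = EdgeCache(fetch_image, ttl_seconds=24 * 3600, clock=lambda: fake_now[0])

edge.get("/images/post-42.jpg")  # first request: pulled from the origin
edge.get("/images/post-42.jpg")  # second request: served from the edge
fake_now[0] = 24 * 3600 + 1      # more than 24 hours later
edge.get("/images/post-42.jpg")  # expired: pulled from the origin again
```

With a long TTL, repeated requests for a popular image reach the origin only once per TTL window per edge server, which is where the load reduction comes from.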

At the API gateway level, we can introduce caching. At peak times, when a small group of posts is viewed by many users simultaneously, we can store those posts in a cache and refresh it every few minutes. This also reduces the response time for our users and goes hand in hand with our decision to prioritize availability over consistency.


At the database level, we can also add an index on the post ID. This lets us find a particular post in roughly constant time with a hash index (or logarithmic time with a tree-based index) instead of scanning through the whole collection of posts.

Post and comments service posts collection
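
The difference between scanning and using an index can be illustrated with plain Python data structures. A dictionary stands in for a hash index here; real databases often use tree-based indexes instead, and the sample posts are hypothetical.

```python
posts = [
    {"post_id": 101, "title": "First post"},
    {"post_id": 205, "title": "Second post"},
    {"post_id": 317, "title": "Third post"},
]


def find_by_scan(post_id):
    """Without an index: check every post until one matches."""
    for post in posts:
        if post["post_id"] == post_id:
            return post
    return None


# Building the index costs one pass over the collection.
# Afterwards, each lookup is a single hash probe.
post_index = {post["post_id"]: post for post in posts}


def find_by_index(post_id):
    """With an index: jump straight to the matching post."""
    return post_index.get(post_id)
```

Both functions return the same post, but the scan does work proportional to the size of the collection, while the indexed lookup does not. The cost of the index is extra storage and extra work on every write.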


Similarly, we can reuse the compound index on the post ID and comment ID that we created for sharding the comments. This way, requesting a batch of comments for a given post will also be much faster. Of course, we can also use indexing for the other databases.

Post and comments service comments collection


Votes raise another performance problem. As we recall, the Votes Service database stores each vote as an individual record. However, when we show a post or a comment in the UI, we want to show the total number of up-votes and down-votes for that entity. What we can do is introduce a Message Broker between the Votes Service and the Post & Comments Service. Additionally, we can add two fields to the posts and comments collections: one for total up-votes and one for total down-votes. Any time a user up-votes or down-votes a post, the Votes Service adds that entry to its own database and publishes it as an event to the Post & Comments Service. The Post & Comments Service then consumes this event and updates the totals, which are stored with the post record.

This approach really helps with a short but sudden traffic spike, when many people start voting on the same post or on several posts simultaneously: we can buffer those votes in the message broker, apply the updates gradually, and complete them after the spike is over. Of course, during this period users will not see the most up-to-date totals, but that is acceptable because we prioritized availability over consistency when we gathered our non-functional requirements.

Introducing a Message Broker between the Votes Service and the Post & Comments Service
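
The flow of a vote through the broker can be sketched in a single process, with Python's standard `queue.Queue` standing in for the message broker. The function names, field names, and in-memory "databases" are hypothetical; a real deployment would use a dedicated broker and separate services.

```python
import queue

vote_events = queue.Queue()  # stands in for the message broker
votes_db = []                # Votes Service: one record per vote
posts_db = {"post-1": {"up_votes": 0, "down_votes": 0}}


def cast_vote(user_id, post_id, direction):
    """Votes Service: store the vote, then publish it as an event."""
    if direction not in ("up", "down"):
        raise ValueError("direction must be 'up' or 'down'")
    votes_db.append({"user": user_id, "post": post_id, "direction": direction})
    vote_events.put({"post": post_id, "direction": direction})


def drain_vote_events():
    """Post & Comments Service: consume buffered events, update totals."""
    processed = 0
    while True:
        try:
            event = vote_events.get_nowait()
        except queue.Empty:
            return processed
        counter = "up_votes" if event["direction"] == "up" else "down_votes"
        posts_db[event["post"]][counter] += 1
        processed += 1


# A burst of votes arrives; they are recorded and buffered immediately.
cast_vote("alice", "post-1", "up")
cast_vote("bob", "post-1", "up")
cast_vote("carol", "post-1", "down")

# Until the consumer catches up, readers see stale totals.
stale_totals = dict(posts_db["post-1"])

# The consumer works through the backlog at its own pace.
processed_count = drain_vote_events()
```

Between the burst and the drain, the stored totals lag behind the individual vote records. That window is the eventual consistency the design accepts in exchange for absorbing spikes.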


Finally, now that the total numbers of up-votes and down-votes are stored together with each post and comment, we can serve that data to the user easily and efficiently.

Fault Tolerance/ High Availability (99.9%)

To ensure we don’t lose any data if a database instance or database shard crashes, we can introduce database replication. Similarly, every message broker, data store, and microservice is replicated, adding to our overall system’s fault tolerance.


Introducing database replication if a database instance or database shard crashes
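
A short calculation shows what the 99.9% target means in practice and why replication helps. The formula for replicas assumes that instances fail independently and that one healthy replica is enough to serve requests; both are simplifying assumptions, and the 99% single-instance figure is purely illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def allowed_downtime_hours(availability):
    """Yearly downtime budget for an availability target between 0 and 1."""
    return (1 - availability) * HOURS_PER_YEAR


def replicated_availability(single, replicas):
    """Availability of a replicated component.

    Assumes independent failures and that any one healthy replica
    can serve requests: the component is down only if all replicas are.
    """
    return 1 - (1 - single) ** replicas


# 99.9% availability leaves a budget of about 8.76 hours of downtime a year.
budget = allowed_downtime_hours(0.999)

# Two replicas of a 99%-available instance already exceed the 99.9% target.
two_replicas = replicated_availability(0.99, 2)  # about 0.9999
```

In reality failures are often correlated (shared network, shared deployment bug), so the real gain is smaller than this formula suggests. This is one reason to spread replicas across data centers.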


In addition, we can run our system across multiple data centers in different geographical locations and use a global load balancing service. This way, we can always fail over to another region if a natural disaster strikes one of those regions. As an added benefit, this also improves performance, because our system gets physically closer to our users on different continents.


multiple data centers in different geographical locations


The following image shows a situation where the closest server is unavailable to the user. In this situation, the user contacts the second closest server that is available.


image showing a situation where the closest server is unavailable to the user
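
The failover decision described above can be sketched as picking the first healthy region from a list ordered by distance to the user. The region names and the health map are hypothetical; a real global load balancer would derive both from latency measurements and health checks.

```python
def pick_region(regions_by_distance, is_healthy):
    """Return the closest healthy region, or None if all are down.

    `regions_by_distance` must be ordered from closest to farthest
    from the user's point of view.
    """
    for region in regions_by_distance:
        if is_healthy(region):
            return region
    return None


# Hypothetical health status: the user's closest region is unavailable.
health = {"eu-west": False, "eu-central": True, "us-east": True}

# The user is routed to the second-closest region instead.
chosen = pick_region(["eu-west", "eu-central", "us-east"], health.get)
```

Here `chosen` is `"eu-central"`: the request skips the unavailable closest region and lands on the next closest one that is up.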


Conclusion

Architecture is what helps keep the chaos in check. Software will keep getting more complex, but with a solid foundation, it doesn’t have to feel overwhelming. Taking the time to improve your architecture now can save a lot of headaches later and make life easier for everyone working on the project. It’s not about perfection. It’s about making smart choices that set you up for whatever comes next.


Published at DZone with permission of Milan Karajovic. See the original article here.
