
We Rebuilt Our Backend Feed Service! Here's What I Learned


Refactoring an application can come with a lot of headaches — so learn from this article rather than learning the hard way!


I’m a software engineer at Kurio, a news aggregator in Indonesia. As an aggregator, our main job is to collect content (news and articles) from our many publisher partners' websites and serve it to our users through our application.

We serve this content the way other news aggregators do: as feeds (lists of content), such as a feed sorted by our top_stories logic, a feed sorted by what’s trending, or a feed from a specific publisher or category.

Figure: Kurio feed layout on mobile

The whole process of building a feed for each user is handled by one of our services: the feed service.

This service is one of the three biggest projects at Kurio, and the previous version served our users for a very long time. Over that time it became very complex and sometimes too hard to understand, which made adding a new feature very hard. So we decided to rebuild the feed service, hoping the new version would make it easier to add features and to maintain.

For this new project, we created a new architecture: partly a hybrid with our old architecture, but more dynamic and flexible. The flexibility matters because a feed is a collection of objects of any type: articles, videos, audio, and so on. Doing this in Golang is really challenging because Golang is a statically typed programming language and, at the time of this project, it did not have generic types like those in Java or other programming languages.
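One common way to hold mixed content in a single Go slice without generics is a shared interface plus a type switch. The sketch below is only an illustration of that idea; `FeedItem`, `Article`, `Video`, and `Describe` are hypothetical names, not Kurio's real models.

```go
package main

import "fmt"

// FeedItem is a hypothetical interface shared by every kind of content in a feed.
type FeedItem interface {
	ID() string
	Kind() string
}

// Article and Video are illustrative content types, not the real service's models.
type Article struct {
	id    string
	Title string
}

type Video struct {
	id       string
	Duration int // length in seconds
}

func (a Article) ID() string   { return a.id }
func (a Article) Kind() string { return "article" }
func (v Video) ID() string     { return v.id }
func (v Video) Kind() string   { return "video" }

// Describe uses a type switch to recover the concrete type behind the interface.
func Describe(item FeedItem) string {
	switch v := item.(type) {
	case Article:
		return fmt.Sprintf("article %s: %s", v.id, v.Title)
	case Video:
		return fmt.Sprintf("video %s (%ds)", v.id, v.Duration)
	default:
		return "unknown item " + item.ID()
	}
}

func main() {
	// One slice can carry every content type that satisfies FeedItem.
	feed := []FeedItem{
		Article{id: "a1", Title: "Hello"},
		Video{id: "v1", Duration: 90},
	}
	for _, item := range feed {
		fmt.Println(Describe(item))
	}
}
```

The price of this flexibility is that every place needing type-specific fields has to switch on the concrete type.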

Understanding the Flow

It’s important to know how the previous system worked. We needed to understand the whole process, from compiling, testing, and deploying through to receiving a request from a user.

This is a core service and I had only been here a year, so I didn’t really understand how it worked. The previous system had been layered with so many additional features and patches over the years that it was hard to understand just by reading the code. Our solution was to learn the flow and the rules, then build a new service based on that flow and that set of rules.

For instance, one part of the flow: when a user opens the application, they get a top_stories feed served by this service. Some of the rules: there is a limit on how much content is displayed in the top_stories feed; don’t show content from a topic the user doesn’t follow; or build the feed according to the user’s attributes (gender, age, etc.).

Listing these rules and this flow is simple. The hard part is turning them into code. In general, our flow is really simple, as shown below.

Figure: Flow when a user fetches a feed

So basically there are just two big functions, fetch-personalized-feed and fetch-default-feed. The hardest one is the personalized feed, because we have to integrate it with our personalization engine; there are also a few rules we must follow for personalized content, as mentioned above (extracting the user's interests and attributes, then building the feed based on them).
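As a rough sketch of that two-branch shape in Go: the condition that selects a branch here (whether any interests are known for the user) is purely an assumption for illustration, as are the `User` type and the function bodies.

```go
package main

import "fmt"

// User is a hypothetical, simplified user. Interests is empty when nothing
// is known about the user yet.
type User struct {
	ID        string
	Interests []string
}

// fetchPersonalizedFeed stands in for the branch that talks to a personalization engine.
func fetchPersonalizedFeed(u User) []string {
	items := make([]string, 0, len(u.Interests))
	for _, topic := range u.Interests {
		items = append(items, "story about "+topic)
	}
	return items
}

// fetchDefaultFeed stands in for the non-personalized branch.
func fetchDefaultFeed() []string {
	return []string{"top story 1", "top story 2"}
}

// FetchFeed picks one of the two branches. The condition is an assumption.
func FetchFeed(u User) []string {
	if len(u.Interests) > 0 {
		return fetchPersonalizedFeed(u)
	}
	return fetchDefaultFeed()
}

func main() {
	fmt.Println(FetchFeed(User{ID: "u1", Interests: []string{"tech"}})) // personalized branch
	fmt.Println(FetchFeed(User{ID: "u2"}))                              // default branch
}
```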

Design and Debate First

Our main reason for rebuilding this service was that the previous system’s code architecture did not scale well whenever we developed a new feature related to the newsfeed. Developing new features in the future would have been painful because we would have had to refactor many things each time.

So what we really needed was to fix the architecture. Designing a new architecture is really hard. We have to think abstractly, and there are so many questions: “What if?”, “Why this?”, “Why not this?” We wanted the new architecture to fit our “maybe” future problems and also stay backward compatible. The design alone took about a month of debate and discussion, covering everything from the tech stack to the flow of each of the big functions.

The final decision we all agreed on was to try a somewhat functional approach. We abandoned the code architecture we use in most of our services and invented a new one with a bit of functional programming (using higher-order function patterns), though not as dynamic as a true functional programming language like Lisp or Clojure.

So, in our code, you will find many higher-order function (HOF) patterns like the one below:

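// A higher-order function: it receives a function and returns a new function.
func something(prefix string, next func(string)) func(string) {
	return func(s string) { next(prefix + s) }
}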
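To make the pattern more concrete, here is a minimal sketch of how feed rules such as a content limit or a topic filter could be layered as higher-order functions. The `Fetcher` type and the `WithLimit` and `WithFilter` helpers are hypothetical names chosen for this illustration, not the service's actual code.

```go
package main

import (
	"fmt"
	"strings"
)

// Fetcher is a hypothetical function type that returns feed items for a user.
type Fetcher func(userID string) []string

// WithLimit wraps a Fetcher and caps the number of items it returns.
func WithLimit(limit int, next Fetcher) Fetcher {
	if limit < 0 {
		limit = 0
	}
	return func(userID string) []string {
		items := next(userID)
		if len(items) > limit {
			return items[:limit]
		}
		return items
	}
}

// WithFilter wraps a Fetcher and keeps only the items accepted by keep.
func WithFilter(keep func(string) bool, next Fetcher) Fetcher {
	return func(userID string) []string {
		var kept []string
		for _, item := range next(userID) {
			if keep(item) {
				kept = append(kept, item)
			}
		}
		return kept
	}
}

func main() {
	// base pretends to load raw items; real code would call a repository.
	base := func(userID string) []string {
		return []string{"sports: a", "tech: b", "tech: c", "tech: d"}
	}
	isTech := func(s string) bool { return strings.HasPrefix(s, "tech:") }

	// Rules compose from the inside out: filter first, then limit.
	fetch := WithLimit(2, WithFilter(isTech, base))
	fmt.Println(fetch("u1")) // [tech: b tech: c]
}
```

Each rule stays a small function, and adding a rule means wrapping the fetcher one more time instead of editing a large function.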

But because we use Golang, a statically typed programming language, creating many functions like this is painful. We have to perform a lot of type checks and type assertions to avoid panics, which takes a lot of time.
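The kind of checking this refers to can be sketched with Go's two-value ("comma-ok") type assertion, which reports a mismatch instead of panicking. The `Step` type and `CountTitles` function are hypothetical, assuming loosely typed stages that pass `interface{}` values around.

```go
package main

import "fmt"

// Step is a hypothetical loosely typed pipeline stage: anything in, anything out.
type Step func(input interface{}) (interface{}, error)

// CountTitles expects a []string. The two-value assertion turns any other
// input into an error, where the one-value form input.([]string) would panic.
func CountTitles(input interface{}) (interface{}, error) {
	titles, ok := input.([]string)
	if !ok {
		return nil, fmt.Errorf("CountTitles: expected []string, got %T", input)
	}
	return len(titles), nil
}

func main() {
	var step Step = CountTitles
	fmt.Println(step([]string{"a", "b"})) // 2 <nil>
	fmt.Println(step(42))                 // <nil> CountTitles: expected []string, got int
}
```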

Knowing this, we realized Golang was not an ideal fit for our problem. But of the 10 of us working as backend engineers, only one knew Clojure (functional programming), and learning a new programming language would have meant extra time to finish. So after a long discussion, we agreed to continue using Golang, which all of our backend engineers understand well. Besides, Golang is already proven and running in many of our microservices.

Understanding the Basics

When transforming the flow and design into code, one thing I realized is that we must really understand the basics. In the beginning, I didn't really understand how higher-order functions worked, and reading the code was confusing. How could a function receive a function and then return a function? Thanks to Google, I now understand how it works.

We also need to understand the basics of Golang itself: the use of a pointer as a method receiver (which I wrote about a few weeks ago in this article here), the very basics of DateTime standards (which I wrote about in this article here), and many other fundamentals. If we don’t understand these well, the project just takes longer to finish.
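As a small illustration of the pointer-receiver point, the sketch below contrasts a value receiver with a pointer receiver. The `Counter` type is hypothetical and exists only for this example.

```go
package main

import "fmt"

// Counter is a hypothetical type used only to contrast the two receiver kinds.
type Counter struct{ Seen int }

// IncrValue has a value receiver: it increments a copy, so the caller sees no change.
func (c Counter) IncrValue() { c.Seen++ }

// IncrPointer has a pointer receiver: it increments the caller's Counter.
func (c *Counter) IncrPointer() { c.Seen++ }

func main() {
	c := Counter{}
	c.IncrValue()
	fmt.Println(c.Seen) // 0
	c.IncrPointer()
	fmt.Println(c.Seen) // 1
}
```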

Run First, Optimize Later

The rules of optimization are well explained here.

  1. First Rule Of Optimization — Don’t.
  2. Second Rule Of Optimization — Don’t… yet.
  3. Profile Before Optimizing

So, when developing this service, our first target was to make sure that the service could at least run and work. We did not think about performance yet; especially since we were beginners in HOF and Golang, we deliberately ignored anything related to optimization, such as using goroutines.

So we finished it: we could compile and run it, and all the requests and responses worked correctly. Unsurprisingly, our initial application was very slow. On the staging server, the previous system took about 500 ms for a single API request, while our first working version took 50000 ms (about 50 seconds) or sometimes longer, which is roughly a hundred times slower.

Optimizing the code was also one of our biggest tasks. To optimize our code, we followed these steps:

  1. Search for all loops that perform long-running work, and transform them to use goroutines to run in parallel, or to use a pipelining pattern where the work calls for it.
  2. Profile the system to detect the slow functions and optimize them. Luckily, profiling is very easy in Golang: thanks to the pprof tool, we can profile our system and detect all the slow functions. This covers the libraries we use as well, so we can detect a slow library and replace it with another that offers similar functionality.
  3. Add a cache if needed.
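The first step above can be sketched as follows: a loop that calls a slow function once per item is turned into one goroutine per item, coordinated with `sync.WaitGroup`. The `fetchContent` function and its 10 ms delay are made up for illustration.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// fetchContent is a hypothetical slow call, for example to a repository.
func fetchContent(id string) string {
	time.Sleep(10 * time.Millisecond) // simulated latency
	return "content-" + id
}

// FetchAll fetches every ID concurrently and preserves the input order.
func FetchAll(ids []string) []string {
	results := make([]string, len(ids))
	var wg sync.WaitGroup
	for i, id := range ids {
		wg.Add(1)
		go func(i int, id string) {
			defer wg.Done()
			// Each goroutine writes only to its own slot, so no lock is needed.
			results[i] = fetchContent(id)
		}(i, id)
	}
	wg.Wait()
	return results
}

func main() {
	start := time.Now()
	fmt.Println(FetchAll([]string{"1", "2", "3"}))
	fmt.Println("took", time.Since(start))
}
```

This sketch starts one goroutine per item with no upper bound; for very long lists, a fixed pool of workers may be a safer choice.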

One of our rules when building a service is to use a cache only if you really need it. A cache is like a drug: it is addictive because it can act like a silver bullet when our system looks very slow. For people who are addicted to it, “cache” is usually the first thing that comes to mind when developing a big concurrent project, before they have thought about optimizing the function itself (its logic or algorithm) through benchmarking and profiling.
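Measuring before reaching for a cache can start very small. The sketch below collects a CPU profile around one function with the standard `runtime/pprof` package; `buildFeed` and `ProfileCPU` are hypothetical helpers, and a long-running service might instead expose profiles over HTTP with `net/http/pprof`.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime/pprof"
	"sort"
)

// buildFeed is a hypothetical CPU-heavy function standing in for real feed logic.
func buildFeed(n int) []int {
	items := make([]int, n)
	for i := range items {
		items[i] = (i * 7919) % n
	}
	sort.Ints(items)
	return items
}

// ProfileCPU runs work while a CPU profile is being collected and returns
// the raw profile bytes.
func ProfileCPU(work func()) ([]byte, error) {
	var buf bytes.Buffer
	if err := pprof.StartCPUProfile(&buf); err != nil {
		return nil, err
	}
	work()
	pprof.StopCPUProfile() // flushes the profile into buf
	return buf.Bytes(), nil
}

func main() {
	profile, err := ProfileCPU(func() { buildFeed(1000000) })
	if err != nil {
		fmt.Println("could not start profiling:", err)
		return
	}
	// Saving these bytes to a file such as cpu.prof lets `go tool pprof cpu.prof`
	// show which functions used the most CPU time.
	fmt.Println("collected", len(profile), "bytes of CPU profile")
}
```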

In our case, we use a cache in two ways:

  • Deduplication management: Because the feed is a list of content (articles, news) that may come from many repositories (databases and services), the same content may appear more than once. We use the cache as temporary storage to handle the deduplication process.
  • Repository cache: Because that content comes from many repositories, multiple users may end up requesting the same content. To avoid fetching the same thing from the repository again, we cache the repository result.
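Both uses can be sketched in a few lines. This sketch assumes a plain in-memory map guarded by a mutex, with no expiry; a production setup might well use an external cache instead, and `Content`, `Dedup`, and `CachedRepo` are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// Content is a hypothetical, simplified feed item.
type Content struct {
	ID    string
	Title string
}

// Dedup keeps the first occurrence of every ID and preserves order.
func Dedup(items []Content) []Content {
	seen := make(map[string]struct{}, len(items))
	out := make([]Content, 0, len(items))
	for _, it := range items {
		if _, dup := seen[it.ID]; dup {
			continue
		}
		seen[it.ID] = struct{}{}
		out = append(out, it)
	}
	return out
}

// CachedRepo wraps a slow lookup with an in-memory cache (no expiry in this sketch).
type CachedRepo struct {
	mu    sync.Mutex
	cache map[string]Content
	fetch func(id string) Content
	Calls int // number of real fetches, kept only for illustration
}

func NewCachedRepo(fetch func(id string) Content) *CachedRepo {
	return &CachedRepo{cache: make(map[string]Content), fetch: fetch}
}

// Get returns the cached content when present and fetches it otherwise.
func (r *CachedRepo) Get(id string) Content {
	r.mu.Lock()
	defer r.mu.Unlock()
	if c, ok := r.cache[id]; ok {
		return c
	}
	c := r.fetch(id)
	r.Calls++
	r.cache[id] = c
	return c
}

func main() {
	repo := NewCachedRepo(func(id string) Content {
		return Content{ID: id, Title: "title of " + id} // pretend this call is slow
	})
	feed := []Content{repo.Get("a"), repo.Get("b"), repo.Get("a")}
	fmt.Println(len(Dedup(feed)), "unique items,", repo.Calls, "real fetches")
}
```

Holding the lock during the fetch keeps the sketch simple but serializes all lookups; that trade-off would need revisiting under real load.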

With these optimizations, we at least matched the performance of the previous system (the response time was about 400 ms on the staging server and about 180 ms on the production server).

Be Careful With Changes

Under semantic versioning, a rebuild with no new features and no breaking changes to the API is not a new version. In this rebuilt system, our target was to change only the architecture, not the API specification. So, whatever changes we made inside the system, the API must not change, because even very tiny changes affect all the related services.

But, just to make it a new version, we made a few changes to the error response body. (Actually, if we follow how sentimental versioning works, this is already a breaking change even without any change to the error response body, because we made a huge change to the architecture. But we tried to stick with semantic versioning, so we made a small change to our error response body.)

So we changed our error response body from this (the original error response body):

{ "error": "Error Message" }

to this (the new version of the error response body):

{
  "error": {
    "message": "Error Message",
    "errors": [
      // any stack-trace errors
    ]
  }
}
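In Go, the new response shape could be modeled with two small structs, as in the sketch below. The struct and function names are hypothetical, and the element type of `errors` is assumed to be a string because the original only says it holds stack-trace errors.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ErrorDetail mirrors the nested "error" object of the new response body.
// Treating each entry of "errors" as a string is an assumption.
type ErrorDetail struct {
	Message string   `json:"message"`
	Errors  []string `json:"errors"`
}

// ErrorResponse is the top-level error body.
type ErrorResponse struct {
	Error ErrorDetail `json:"error"`
}

// NewErrorResponse builds the body; with no details it encodes "errors" as [] rather than null.
func NewErrorResponse(msg string, details ...string) ErrorResponse {
	if details == nil {
		details = []string{}
	}
	return ErrorResponse{Error: ErrorDetail{Message: msg, Errors: details}}
}

// JSON encodes the response as a JSON string.
func (r ErrorResponse) JSON() (string, error) {
	b, err := json.Marshal(r)
	return string(b), err
}

func main() {
	body, err := NewErrorResponse("Error Message").JSON()
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Println(body) // {"error":{"message":"Error Message","errors":[]}}
}
```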

With these changes, we also need to take care of every service that consumes our API. Luckily, only two services consume it, so we only had to update two applications: the dashboard app and the mobile gateway API. Also, the only breaking change is the error response, so only a small part of each connected application had to change.

Never Ignore the Tests

When rebuilding this service, we had at least three kinds of tests to pass before releasing to production and real-world users: unit testing, integration testing, and load testing.

Of all these kinds of tests, unit testing is the smallest in scope. Some people seem to underestimate the importance of unit testing because it covers just a unit, a small function. But while rebuilding this feed service, I learned how important unit tests are.

At the beginning of the sprint, we skipped unit tests because we wanted to focus on designing the code architecture, so we wrote a few functions that had no tests. We were still experimenting with the code architecture, and we wanted to avoid unnecessary refactoring of the unit tests.

But after the code architecture had settled, we forgot to add unit tests to those first functions from the beginning of the sprint. We only noticed when we deployed to the staging server and ran the integration tests with other real services connected: we found many bugs in our application. When we looked into the source code, there were many conditions that our functions did not cover.

We realized this happened because those functions had never been through any unit tests. If only we had been unit testing from the beginning, we wouldn’t have had the extra work of fixing and redeploying. Writing the unit tests first would have forced us to think about the many different cases each function may need to handle, and we could have fixed them before the app was deployed.
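A table-driven test is one common Go way to force that kind of thinking about cases. The sketch below uses a hypothetical `LimitFeed` function; in a real project the test function would live in a file ending in `_test.go` and run with `go test`, and it is kept next to the function here only to make the example a single file.

```go
package main

import (
	"fmt"
	"testing"
)

// LimitFeed is a hypothetical function that caps a feed at limit items.
func LimitFeed(items []string, limit int) []string {
	if limit <= 0 {
		return nil
	}
	if len(items) > limit {
		return items[:limit]
	}
	return items
}

// TestLimitFeed lists the cases up front, including the easy-to-forget ones.
func TestLimitFeed(t *testing.T) {
	cases := []struct {
		name  string
		items []string
		limit int
		want  int
	}{
		{"under the limit", []string{"a"}, 3, 1},
		{"over the limit", []string{"a", "b", "c"}, 2, 2},
		{"empty feed", nil, 2, 0},
		{"negative limit", []string{"a"}, -1, 0},
	}
	for _, tc := range cases {
		t.Run(tc.name, func(t *testing.T) {
			if got := len(LimitFeed(tc.items, tc.limit)); got != tc.want {
				t.Errorf("got %d items, want %d", got, tc.want)
			}
		})
	}
}

func main() {
	fmt.Println(LimitFeed([]string{"a", "b", "c"}, 2)) // [a b]
}
```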

Conclusion

Even though our work is behind the scenes and has no visual impact on our users, we learned many things. I learned a lot about how to build, from scratch, a system that handles highly concurrent calls. After finishing this task, I understand why the test in an interview for a backend position is always about logic and algorithms: performance matters when building a highly concurrent service, and every algorithm we write affects the response time.
