Welcome back to our series. In part one, we talked about the recent rise of microservices and the containers that envelop them. We covered the logging problems that arise from that type of architecture as well as a possible solution — aggregation. Now that we’ve gone over the requirements, let’s look at some different aggregation patterns in service architectures.
Source-Side Aggregation Patterns
The first question is whether we should aggregate at the source of the data—on the service side. The answer is a matter of tradeoffs.
The big benefit of a service aggregation framework without source aggregation is simplicity. But the simplicity comes at a cost:
- Fixed aggregator (endpoint) address. If you change the address of your aggregator, you’ve got to reconfigure each individual collector.
- Many network connections. Remember when we said we need to be careful not to overload our network? This is how network overloads happen. Aggregating our data on the source side is much, much more network-efficient than aggregating it on the destination side — leading to fewer sockets and data streams for the network to support.
- High load in aggregator. Not only does skipping source-side aggregation result in high network traffic, it can also overload the CPU in the aggregator, resulting in data loss.
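To put rough numbers on the connection problem, here is a back-of-the-envelope sketch. The host and container counts are made-up illustration values, and the function assumes each sender holds exactly one connection open, which is a simplification.

```python
def aggregator_connections(hosts: int, containers_per_host: int, source_side: bool) -> int:
    """Count the connections arriving at the central aggregator.

    Assumes one open connection per sender (a simplification).
    """
    if source_side:
        # One local aggregator per host forwards on behalf of all its containers.
        return hosts
    # Every container opens its own connection to the central aggregator.
    return hosts * containers_per_host


# Hypothetical cluster: 50 hosts running 20 containers each.
direct = aggregator_connections(hosts=50, containers_per_host=20, source_side=False)
local = aggregator_connections(hosts=50, containers_per_host=20, source_side=True)
print(direct, local)  # 1000 50
```

In this toy model the connection count drops by a factor equal to the number of containers per host.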
Now let’s look at the tradeoffs for source-side aggregation.
Aggregating on the source has one downside: It’s a bit more resource-intensive. It requires an extra container on each host. But this extra resource brings several benefits:
- Fewer connections. Fewer connections mean less network traffic.
- Lower aggregator load. Because this resource cost is spread out over your entire data infrastructure, you have far less chance of overloading any individual aggregator, resulting in less chance of data loss.
- Less configuration in containers. Since the aggregator address for each collector is “localhost,” configuration is drastically simplified. The destination address only needs to be specified in one place per host: the local aggregator container.
- Highly flexible configuration. This simplified configuration makes your data infrastructure highly “modular.” You can swap services in and out to your heart’s content.
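The configuration benefit can be sketched in a few lines. This is a toy model, not any particular logging platform's API: the `Collector` and `LocalAggregator` classes and the `example.internal` addresses are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class LocalAggregator:
    """Runs once per host; the only component that knows the downstream address."""
    destination: str

    def forward(self, event: dict) -> tuple:
        # A real aggregator would send over the network; here we just
        # report where the event would go.
        return (self.destination, event)


@dataclass
class Collector:
    """Runs in every container; only ever talks to the aggregator on localhost."""
    aggregator: LocalAggregator

    def emit(self, event: dict) -> tuple:
        return self.aggregator.forward(event)


local_agg = LocalAggregator(destination="logs-a.example.internal")
collectors = [Collector(local_agg) for _ in range(3)]
before = [c.emit({"msg": "hello"})[0] for c in collectors]

# Moving the destination is a single change on the local aggregator;
# none of the collectors are touched.
local_agg.destination = "logs-b.example.internal"
after = [c.emit({"msg": "hello"})[0] for c in collectors]
print(before[0], "->", after[0])
```

Every collector follows the new address without being reconfigured, which is the "modular" property described above.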
Destination-Side Aggregation Patterns
Regardless of whether we aggregate on the Source side, we can also elect to have separate aggregators on the Destination side. Whether we should do this is, again, a matter of tradeoffs. Avoiding Destination Aggregation limits the number of nodes, resulting in a much simpler configuration.
But, just as on the Source side, avoiding aggregation on the Destination side comes with costs:
- A change on the Destination side affects the Source side. This is the same configuration problem we saw when we didn’t have aggregators on the Source side. If the Destination address changes, all the aggregators on the Source have to be reconfigured.
- Worse performance. Having no aggregators on the Destination side results in many concurrent connections and write requests being made to our Storage system. How badly this hurts depends on which storage system you use, but it almost always has a major performance impact. In fact, it’s the part of the system that most often breaks at scale, bringing even the most robust infrastructures to their knees.
Source and Destination Aggregation
The optimal configuration is to have aggregation on both the Source and the Destination side. Again, the tradeoff is that we end up with more nodes and a slightly more complicated configuration up front. But the benefits are clear:
- Destination side changes don’t affect the Source side. This results in far less overall maintenance.
- Better performance. With separate aggregators on the Destination side, we can fine-tune the aggregators and have fewer write requests on the Store, allowing us to use standard databases with fewer performance and scaling issues.
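One way an aggregator sitting in front of the Store can cut down on write requests is batching. The sketch below is a minimal illustration under that assumption; `BatchingAggregator`, `store_write`, and the batch size of 100 are hypothetical, and a real aggregator would also flush on a timer.

```python
class BatchingAggregator:
    """Collects events and writes them to the store in batches."""

    def __init__(self, store_write, batch_size: int):
        self.store_write = store_write  # callable that performs one write request
        self.batch_size = batch_size
        self.pending = []

    def receive(self, event: dict) -> None:
        self.pending.append(event)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        # Skip empty flushes so we never issue a pointless write request.
        if self.pending:
            self.store_write(self.pending)
            self.pending = []


write_requests = []  # stands in for the Store; each entry is one write request
batcher = BatchingAggregator(store_write=write_requests.append, batch_size=100)
for i in range(1000):
    batcher.receive({"id": i})
batcher.flush()  # push out any partial batch left over
print(len(write_requests))  # 10
```

A thousand events reach the store as ten write requests instead of a thousand.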
Another major benefit of aggregating on both sides is fault tolerance. In the real world, servers sometimes go down. The constant, heavy load of processing the service logs generated in a large system of microservices makes server crashes more likely. When this happens, events that occur during the downtime can be lost forever. If your system stays down long enough, even your source-side buffers (if you are using a logging platform with source-side buffers; more on that in a minute) will overflow and result in permanent data loss.
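The overflow scenario can be sketched as a bounded buffer that nothing is draining. This is a toy model: `SourceBuffer` is a hypothetical class, and it assumes new events are dropped once the buffer is full. Real platforms may instead block, drop the oldest events, or spill to disk.

```python
from collections import deque


class SourceBuffer:
    """A fixed-capacity buffer that drops new events once it is full."""

    def __init__(self, capacity: int):
        self.events = deque()
        self.capacity = capacity
        self.dropped = 0

    def append(self, event) -> bool:
        if len(self.events) >= self.capacity:
            self.dropped += 1  # buffer full: this event is lost for good
            return False
        self.events.append(event)
        return True


buf = SourceBuffer(capacity=1000)
# The downstream server is down, so nothing drains the buffer
# while 1,500 events arrive.
for i in range(1500):
    buf.append(i)
print(len(buf.events), buf.dropped)  # 1000 500
```

The longer the outage lasts relative to the buffer's capacity, the more events fall off the end.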
Destination side aggregation improves fault tolerance by adding redundancy. By providing a final layer between containers and databases, identical copies of your data can be sent to multiple aggregators, without overwhelming your database with concurrent connections.
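As a rough illustration of that redundancy, the sketch below sends the same event to every aggregator in a list and tolerates individual failures. The `replicate` function and the stand-in aggregators are hypothetical; a production forwarder would also retry and buffer.

```python
def replicate(event: dict, aggregators) -> int:
    """Send an identical copy of the event to every aggregator.

    Returns the number of aggregators that accepted it.
    """
    delivered = 0
    for send in aggregators:
        try:
            send(event)
            delivered += 1
        except ConnectionError:
            # One aggregator being down should not block the others.
            continue
    return delivered


primary_log, standby_log = [], []  # stand-ins for two destination aggregators


def unreachable(event: dict) -> None:
    raise ConnectionError("aggregator unreachable")


both_up = replicate({"id": 1}, [primary_log.append, standby_log.append])
# The primary goes down; the standby still receives its copy.
primary_down = replicate({"id": 2}, [unreachable, standby_log.append])
print(both_up, primary_down, len(standby_log))  # 2 1 2
```

The event sent during the outage survives because the standby aggregator still holds a copy.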
Load balancing is another important data infrastructure consideration. There are a thousand ways to handle load balancing, but the factor we’re concerned with here is the tradeoff between scaling up and scaling out. Scaling up means using a single HTTP/TCP load balancer that handles scale with a huge queue and an army of workers. Scaling out means distributing load balancing across many client aggregator nodes in round-robin fashion, and managing scale by simply adding more aggregators.
Which type of load balancing is best? Again, it depends. The approach you use should be determined by the size of your system, and whether it uses Destination-side aggregation.
Scaling up is slightly simpler than scaling out, at least in concept. Because of this, it can be appropriate for startups. But there are limits to scaling up, and companies tend to smash into them at the worst possible time. Don’t you hate it when your service scales to 5 billion events per day and suddenly starts crashing every time it has to do garbage collection?
Scaling out is more complex, but offers (theoretically) infinite capacity. You can always add more aggregator nodes.
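Here is a minimal sketch of round-robin distribution across aggregator nodes, including adding a node on the fly. The `RoundRobin` class and the `agg-N` node names are hypothetical, and a real client would also skip unhealthy nodes.

```python
from collections import Counter


class RoundRobin:
    """Hands out aggregator nodes in rotation; nodes can be added at any time."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self._next = 0

    def add(self, node: str) -> None:
        self.nodes.append(node)

    def pick(self) -> str:
        node = self.nodes[self._next % len(self.nodes)]
        self._next += 1
        return node


balancer = RoundRobin(["agg-1", "agg-2", "agg-3"])
first = Counter(balancer.pick() for _ in range(6))

# Scale out: add a fourth aggregator and keep sending.
balancer.add("agg-4")
second = Counter(balancer.pick() for _ in range(8))
print(sorted(first.items()))
print(sorted(second.items()))
```

Before and after the new node joins, each aggregator receives an equal share of the events in this example, so capacity grows simply by adding to the list.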
So, we've covered the logging problems microservices and containers can create and how aggregation patterns can help fix them. Keep an eye out for the series finale, where we go over Fluentd and the role it plays in easing the process.