Scaling Is Not Just About Products – It’s About Teams, Too
This article reviews some of the best practices for scaling teams that operate hyper-scale systems powering popular online products.
We are well aware of what is meant by system scalability: maintaining the system's SLA as the user base grows and user activity rises. Put another way, with the code running inside the system unchanged, resource requirements should grow at a roughly linear rate with load. However, to build highly successful products, this is not the only type of scalability that we should worry about. Software engineering team scalability is equally important.
Software Engineering Team Scalability
As our user base increases, our systems reach hyper-scale (which is a good thing!). However, we still want to be as agile as we were in the early days, when the system was much easier to reason about and making code or configuration changes was easy. As we hire more developers to help churn out features and systemic improvements for the surging user base, we start noticing that we need to scale our development processes and our codebase so that we can sustain a similar level of agility with a much larger team. As the codebase and team grow, it gets harder to add new features at the same pace. We want our teams to scale along with our user base.
As our team size grows, it gets harder to handle the growing technical debt (tech debt) in the codebase. Tech debt can be loosely defined as the condition in which the same kind of change takes longer and longer as the team or codebase grows. This implies that we have made some sub-optimal engineering decisions along the way or have ended up with a sub-optimal team structure. Both are far too common, and we discuss a few best practices for avoiding them or at least keeping them in check.
Best Practices for Scaling Engineering Teams
- Set up teams to be collaborative but autonomous: Structure teams so that they are as autonomous as possible. Teams should collaborate on designs to ensure loose coupling and extensibility and to maximize reuse, but they must be able to execute independently for maximum development velocity. The mutually agreed-upon designs should specify API contract(s) for the proposed code changes, which facilitates parallel implementation across teams. Geo-distributed teams should be set up with local engineering decision-makers who serve as beacons of engineering excellence and ensure high-quality delivery. In the microservices world, teams should ideally own a single microservice end-to-end and develop complete expertise in it. Their development, testing, and deployment processes must be independent of other teams.
- Set up development processes to serve as “Paved Roads”: Invest in a microservices development framework that all developers use, to ensure standardization. This framework should offer standard ways to do logging, monitoring, dashboarding, caching, and retries. Invest in an experimentation platform that makes it easy for team members to set up, run, and monitor online experiments. Support feature-toggle infrastructure through this platform. Both experiments and feature toggles are good practices for scaling fast-moving teams and products.
- Scale testing methodologies to identify design gaps: Invest in developing standardized integration testing, unit testing, and performance testing frameworks. Make it easy for developers to find the available test cases and to extend or reuse them. Provide a platform that makes it easy to run multiple end-to-end test scenarios in parallel on environments that closely resemble production (aka simulation testing). Simulation testing enables developers to test end-to-end user flows before deploying to production. Such frameworks and platforms should be integrated as much as possible into the Continuous Integration (CI) pipeline. Chaos testing is another useful modern methodology that helps identify gaps in the system’s design that result in poor reliability. It also helps identify hidden dependencies that become a substantial problem as the system grows into hundreds of microservices.
- Treat tech debt like debt owed to the user: The biggest hurdle to lowering tech debt is prioritizing that work. It is important to tie some aspect of developers’ performance appraisals to engineering improvements, so that efforts that reduce tech debt are rewarded. Encourage developers to leave things better than they found them. Paying down tech debt is a collective effort, and since tech debt slows down the development of features and bug fixes alike, we owe it as a debt to our users and the business.
- Be careful not to over-engineer your engineering teams: Over-engineering of teams can happen when an overreaction to perceived bottlenecks in team velocity leads to over-complicated processes, redundant decision-makers, or sub-optimal team structures. This can slow team velocity further and demotivate engineers. As changes are gradually introduced to improve team scalability, it is important to analyze the effect of each change and act accordingly to avoid over-engineering of teams.
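To make the feature-toggle practice from the “Paved Roads” item a little more concrete, here is a minimal Python sketch of a percentage-based toggle check. Everything in it is an assumption made for illustration: the `FeatureToggles` class, the `new_checkout` and `dark_mode` feature names, and the in-memory rollout table are hypothetical and do not come from any particular experimentation platform. A real platform would load rollout settings from a shared configuration service.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class FeatureToggles:
    """Hypothetical in-memory toggle store (illustration only)."""

    # Maps a feature name to the percentage of users (0-100) who get it.
    rollout_percent: dict[str, int] = field(default_factory=dict)

    def is_enabled(self, feature: str, user_id: str) -> bool:
        # Unknown features default to 0 percent, i.e. they stay off.
        percent = self.rollout_percent.get(feature, 0)
        # Hash feature and user together so that each user lands in a
        # stable bucket per feature across calls and across processes.
        digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
        bucket = int(digest, 16) % 100  # bucket is in the range 0..99
        return bucket < percent


# Assumed example configuration: partial rollout and full rollout.
toggles = FeatureToggles({"new_checkout": 25, "dark_mode": 100})


def checkout(user_id: str) -> str:
    """Route a user to the new or old code path based on the toggle."""
    if toggles.is_enabled("new_checkout", user_id):
        return "new flow"
    return "old flow"
```

Because the bucket is derived from a hash rather than a random draw, a given user sees the same behavior on every request, and a team can raise the rollout percentage gradually without coordinating a deployment with other teams.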
It is important to investigate the above-mentioned best practices and identify what works best for the team and the product. One of the biggest hurdles to improving team scalability is adoption: change is hard, and there is a fear of the unknown. Teams are so used to operating in one way that changing their routine or adopting a different way of doing things is frowned upon. Teams must start by identifying the bottlenecks that are slowing down their velocity. While identifying these bottlenecks, it is essential to focus not just on the system but also on the team structure and processes. The poor user experience caused by these bottlenecks must be used as the driving reason to prioritize broader adoption of these best practices. Once the bottlenecks have been identified and tied to user impact, it becomes easier to align on and prioritize these changes, because they serve the strategic good of the product and the business.
It must be said that no single best practice, but rather a combination of the best practices discussed in this article, will best support engineering team scalability. People-aware leaders place a great deal of weight on scaling teams in a way that maintains team motivation and general wellness. Most successful hyper-scale online products are built by teams of non-trivial size, which implies that investing in scaling teams along with the product is a clear recipe for success. Investing in team scalability also contributes to better product and service quality. Operational excellence is the combined outcome of well-structured teams, well-defined processes, and well-designed systems. It is not possible if any one of the three is missing.