Recently at Surge 2011, the annual conference on scalability and performance, Google's CIO Ben Fried gave an illuminating keynote address. His main insight was that generalists are the people who will lead engineering teams in successfully scaling the web.
In a world where the badge of Specialist or Expert is prized, this was a refreshing perspective from an industry bigwig. As tech professionals, or professionals of any kind for that matter, we don't welcome the label of generalist. The word suggests a jack-of-all-trades and master of none. But the generalist is no less an expert than the specialist. Generalists can get their hands greasy with the tools to fix bugs in the machine, but they are especially good at mobilizing the machine itself; with their broad vision and perspective, they can direct an entire team to accomplish tasks efficiently. This ability to see the big picture should not be underestimated, especially during times of crisis or pressure to meet targets. For a team to scale the web effectively, you're going to need a good mix of both types of personalities.
Picking out the potential generalist
Startups wanting to achieve scalability are faced with huge pressure to do more with limited budgets. In bringing on new engineers, they must hire people who have the programming skills to realize their big idea. Ideally these programmers should also have some architectural vision, along with knowledge of web operations and of how to keep the application performing as it becomes popular. And what of maintaining that large infrastructure as it grows?
So the question for a startup is: how do you spot or hire generalists? In the book REWORK, by Jason Fried and David Heinemeier Hansson, the authors emphasize good writers and good teachers. Their point is that in order to teach an idea or concept, you have to understand it thoroughly and be able to step into someone else's shoes to explain it from their vantage point.
This is in large part the skill that Ben Fried was speaking about at Surge. To borrow his method of using "Disaster Porn" as a way to illustrate a point, we have a story of our own.
Our own disaster porn
About five years ago we worked for a firm that was facing ongoing challenges of growth. Their user base was growing by 25%-50% per quarter, but they were suffering from outages because of that growth. What's more, one of their top engineers was leaving to join another company. They took the opportunity to bring us on board to assess the entire infrastructure.
We looked over the architecture and were surprised at every turn. Although they had a lot of engineers on staff, they were all tasked with building features and responding to ongoing business requirements. None were given any operations responsibilities. There was a very obvious lack of leadership, so you can imagine how this turned out to be a recipe for a fine mess. One day we'd see new servers being added at random; another day we'd witness haphazard decisions about which technologies to use or which versions of frameworks to adopt. In effect, each engineer was making decisions without considering the consequences for the whole.
The infrastructure wound up being built on two different webserver platforms, three - count 'em - three different programming languages and frameworks, and three MySQL databases scattered about on different machines. After a few hours discussing the architecture with the team, we put together a plan that framed the architecture around three simpler tiers. Two were the standard load-balanced webserver tier and the backend database tier; the third managed batch jobs and the building of static assets and media files.
A generalist solution
Our push then was to standardize on one type of webserver and one version of each language stack, and to consolidate all the databases into one instance. This huge simplification meant that they could add replication to the database tier, eliminating single points of failure and providing redundancy for all business services. This in itself was a major achievement. We left them with some major problems solved, a new direction, and a better handle on the remaining challenges. What the company had lacked was not engineering know-how, but rather a generalist's perspective. The engineers had focused too much on immediate tasks, locked onto detail, and lost sight of the big picture.
As more companies move their applications to the cloud, some carefully and some not, we anticipate many more disaster scenarios such as this one. This speaks strongly to the rising cult of DevOps and its push towards broader skills and closer collaboration between developers and operations teams. The good thing to come out of it is that cleaning up messes like these will force us to hone our strategic thinking and organizational skills, possibly making generalists out of many more of us.