Tumblr started on Rackspace in 2007, but quickly outgrew the space available through the IT hosting company. They began with an open-source solution stack and developed primarily in PHP; for a while, nearly every engineer at Tumblr programmed in PHP. In the past, Tumblr's status as a startup kept them tied to a "squeeze everything out of a single server" approach, according to Matheny, but they have since moved on to bigger and better things.
Perhaps the most surprising change at the development level has been a conversion to a JVM-centric approach in order to increase efficiency of hiring and development. One aspect of this new JVM-centric approach has been the adoption of the Twitter library Finagle, a network stack that allows for the creation of asynchronous RPC clients and servers in any JVM-hosted language. According to Hoff, this choice was made over node.js because the Tumblr team believed node.js wasn't "developed enough to have standards and best practices."
At the same time, there has been a shift to non-relational data stores like HBase and Redis, although HBase has been used for "smaller less critical path projects" because the team claimed that it could not bank on HBase over MySQL. It sounds like Tumblr is adamant about the effectiveness of MySQL sharding, as they have not adopted MongoDB despite its popularity in New York (their location). Instead, Tumblr maintains that MySQL can "scale just fine." Regarding Redis, the team currently has 22 Redis servers, with hundreds of Redis instances being used in production.
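For readers unfamiliar with the term, sharding means splitting one logical data set across several independent database servers and routing each request to the server that owns the relevant rows. The sketch below shows one common way to do that routing, by hashing a key. It is only an illustration of the general idea, not a description of how Tumblr actually assigns data to shards; the host names, the `shard_for` function, and the choice of a blog ID as the sharding key are all assumptions made for the example.

```python
import hashlib

# Hypothetical shard list: these host names are illustrative only.
SHARDS = [
    "mysql-shard-0.example.internal",
    "mysql-shard-1.example.internal",
    "mysql-shard-2.example.internal",
    "mysql-shard-3.example.internal",
]


def shard_for(blog_id, shards=SHARDS):
    """Return the shard host that owns the given (assumed) blog ID.

    A stable hash is used instead of Python's built-in hash() so that
    every process, on every machine, maps the same ID to the same shard.
    """
    digest = hashlib.md5(str(blog_id).encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


if __name__ == "__main__":
    for blog_id in (1, 2, 3, 42):
        print(blog_id, "->", shard_for(blog_id))
```

A simple modulo scheme like this one remaps most keys whenever the number of shards changes, which is why production systems often rely on a lookup table or consistent hashing instead.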
For a startup that began just five years ago, Tumblr has had to deal with some big changes in their development philosophy. At the outset, says Matheny, developers were encouraged to "use any tool that they wanted," but over time and with growth they realized that this just wouldn't work. Thus, Tumblr has since standardized on a stack in order to address production issues, and implemented a lightweight, Scrum-like process. The long road to change at Tumblr has left Matheny with some lessons learned that may be applicable to other companies meeting similar challenges. Here are some of those lessons, as recorded by Hoff:
- Automation everywhere.
- MySQL (plus sharding) scales, apps don't.
- Redis is amazing.
- Scala apps perform fantastically.
- Scrap projects when you aren’t sure if they will work.
- Build around the skills of your team.
- Read papers and blog posts. Key design ideas like the cell architecture and selective materialization were taken from elsewhere.
- Wade, don’t jump into technologies. They took pains to learn HBase and Redis before putting them into production by using them in pilot projects or in roles where the damage would be limited.
You can read more details of Tumblr's evolution at the High Scalability blog.