There's so much great content from the Amazon re:Invent conference that it's almost impossible to get through it all. I'm going to share some of my favorites over the next few weeks. Here is a presentation on Seagull, Yelp's highly fault-tolerant system for distributed task execution.
Here's the session description:
"Efficiently parallelizing mutually exclusive tasks can be a challenging problem when done at scale. Yelp's recent in-house product, Seagull, demonstrates how an intelligent scheduling system can use several open-source products to provide a highly scalable and fault-tolerant distributed system. Learn how Yelp built Seagull with a variety of Amazon Web Services to concurrently execute thousands of tasks, greatly improving performance. Seagull combines open-source software like ElasticSearch, Mesos, Docker, and Jenkins with Amazon Web Services (AWS) to parallelize Yelp's testing suite.
Our current use case for Seagull is running Yelp's test suite, which has over 55,000 test cases, in a distributed fashion. Using our smart scheduling, we can run one of our largest test suites, processing 42 hours of serial work in less than 10 minutes using 200 r3.8xlarge instances from Amazon Elastic Compute Cloud (Amazon EC2). Seagull consumes and produces data at very high rates. On a typical day, Seagull writes 60 GB of data and consumes 20 TB of data. Although we are currently using Seagull to parallelize test execution, it can efficiently parallelize other types of independent tasks."
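Some quick arithmetic shows what those numbers imply. 42 hours is 2,520 minutes, so finishing in under 10 minutes means an effective speedup of more than 252× over serial execution, and that is only reachable if the tests are packed so no single worker ends up with far more work than the rest. The talk doesn't spell out how Seagull's "smart scheduling" works, so the sketch below is not Yelp's method. It illustrates one well-known approach to the same problem, the longest-processing-time-first (LPT) greedy heuristic, assuming per-test durations are known in advance (for example, from earlier runs). The function name `lpt_schedule` and the sample test names and timings are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


def lpt_schedule(durations: Dict[str, float], num_workers: int) -> Tuple[List[List[str]], float]:
    """Assign tests to workers using the longest-processing-time-first (LPT) greedy rule.

    Hypothetical illustration only; not Seagull's actual scheduler.
    Returns the per-worker test lists and the makespan (wall-clock time of the busiest worker).
    """
    if num_workers <= 0:
        raise ValueError("num_workers must be positive")
    # Min-heap of (current load in seconds, worker index): the least-loaded worker is on top.
    heap = [(0.0, w) for w in range(num_workers)]
    heapq.heapify(heap)
    assignment: List[List[str]] = [[] for _ in range(num_workers)]
    # Place the longest tests first so the short ones can even out the loads at the end.
    for name, secs in sorted(durations.items(), key=lambda kv: kv[1], reverse=True):
        load, w = heapq.heappop(heap)
        assignment[w].append(name)
        heapq.heappush(heap, (load + secs, w))
    makespan = max(load for load, _ in heap)
    return assignment, makespan


if __name__ == "__main__":
    # Hypothetical test durations in seconds.
    tests = {"t1": 7.0, "t2": 5.0, "t3": 4.0, "t4": 3.0, "t5": 3.0, "t6": 2.0}
    assignment, makespan = lpt_schedule(tests, num_workers=3)
    print(assignment, makespan)  # [['t1', 't6'], ['t2', 't5'], ['t3', 't4']] 9.0
```

Here 24 seconds of serial work finishes in 9 seconds on three workers. The speedup is bounded by the busiest worker rather than by the worker count, which is why a scheduler at Seagull's scale has to care about how tests are grouped, not just how many machines it has.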