Webscale Means Time-Based Architectures
In an interesting piece yesterday, GigaOM reported that Netflix has an architecture built around timelines. This struck home for a guy who spends a great deal of time talking to people skeptical about the need for zero-latency, real-time systems. The truth is that some things have to go at the highest speeds and others don’t. The challenge is creating systems that use resources wisely to get it right in each case.
If it were only about the hype, you’d think Hadoop was the answer to processing web data, but it isn’t that simple. Hadoop is still (despite even newer hype) a batch processing concept, one in which data can “get stale and applications probably don’t include the newest user input.” Before that sounds too negative: some data can afford to be stale without a business penalty, and it can be processed and moved in a more traditional, offline architecture.
For real-time, Netflix needs to use the absolute latest inputs and has a solution:
Netflix uses online processing for receiving information from users in real time and serving up responses right away, such as looking at a new rating or some other customer action to change the set of movies shown to the customer. Real-time processing works best when algorithms are relatively simple and when data is on the smaller side. The data feeding in to computations must also be available right away.
This takes very high levels of integration, with applications reading in-memory data at lightning speed. It requires a very particular, state-of-the-art architecture, which Gartner’s Massimo Pezzini calls “The Next Generation Architecture: In-memory computing.”
Between batch and real-time lies a middle ground that Netflix calls ‘nearline’. This is also a reality for business today. It involves NoSQL and SQL databases serving workloads that are more complex and less time-sensitive than real-time ones.
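To make the three-way split concrete, here is a minimal Python sketch of how a piece of work might be routed to an online, nearline or offline tier. It is not Netflix’s implementation: the thresholds, the `Job` fields and the example job names are all hypothetical, chosen only to illustrate the idea that the tier follows from how stale a result can afford to be and how expensive it is to compute.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    ONLINE = "online"      # answer immediately from in-memory data
    NEARLINE = "nearline"  # compute soon after the event, store the result
    OFFLINE = "offline"    # batch processing, results may be stale


@dataclass(frozen=True)
class Job:
    name: str
    max_staleness_s: float  # how old the result may be without a business penalty
    est_compute_s: float    # rough estimate of how long the computation takes


# Hypothetical thresholds, for illustration only.
ONLINE_MAX_STALENESS_S = 1.0     # results needed within about a second
ONLINE_COMPUTE_BUDGET_S = 0.2    # only simple, fast algorithms fit online
OFFLINE_MIN_STALENESS_S = 3600.0  # an hour or more of staleness is acceptable


def choose_tier(job: Job) -> Tier:
    """Pick a processing tier from a job's freshness need and compute cost."""
    if job.max_staleness_s >= OFFLINE_MIN_STALENESS_S:
        return Tier.OFFLINE
    if (job.max_staleness_s <= ONLINE_MAX_STALENESS_S
            and job.est_compute_s <= ONLINE_COMPUTE_BUDGET_S):
        return Tier.ONLINE
    # Everything else, including fresh-but-too-expensive work, lands in between.
    return Tier.NEARLINE


if __name__ == "__main__":
    jobs = [
        Job("rerank_after_new_rating", max_staleness_s=1.0, est_compute_s=0.05),
        Job("update_similar_titles", max_staleness_s=300.0, est_compute_s=5.0),
        Job("retrain_model", max_staleness_s=86400.0, est_compute_s=7200.0),
    ]
    for job in jobs:
        print(job.name, "->", choose_tier(job).value)
```

The design choice worth noting is the fallback: work that wants fresh results but is too heavy for the online budget drops to nearline rather than blocking the user-facing path.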
The Netflix story is available on their blog and is a great example of how companies at the front of the ‘time wars’ are managing their requirements in time-based ways rather than with monolithic, expensive and risky models. In a world of many shades of gray, this just makes sense.