If it were still 2012, I would eagerly have joined any conversation about big data. It was the buzzword of the day, and you had to speak the "magic" words to get people to listen to the latest and greatest in technology. But, fortunately or unfortunately, it is now 2017, and it is disappointing that most of the world has not moved beyond big data. And believe me, it is not just the CIOs and CDOs sitting in their ivory towers who are stuck on big data. It is also the energetic developers being scouted by talent firms that look for "big data" on their resumes.
At Knoldus, we build a holistic software development capability in anyone who joins us as an intern, whether they have been working in the industry for two years or for ten. The internship immerses you in software development end to end: it starts with code quality, code conventions, and the principles, practices, and patterns of software development, then moves on to reactive platforms and finally to the stack we embrace, which is the Scala ecosystem and the fast data platform.
The catalyst for this post is a conversation I had with a talented engineer who joined us three months ago. He was unhappy because he was not working on big data. When I asked what he meant by "big data," his quick answer was Hadoop and Spark. When I pointed out that he was learning Lagom and event sourcing, which would let him build better solutions, he was not quite convinced.
There's nothing wrong with these technologies; in fact, they are what made the ecosystem popular. But they are only a part (sometimes a very small part) of any product with business value. They solve one particular piece of the puzzle. And more often than not, if you base your product "just" on these technologies, you are bound to fail!
So where should we be headed if we are not talking about big data? The answer is fast data. "Big data" is a misnomer applied to all kinds of scenarios. If you talk to 10 CIOs, nine will say that they struggle with big data, and it makes no difference whether one manages 1 TB of data and another several hundred petabytes. What we need to focus on is giving customers the best experience. Customer experience (CX) is going to be king for modern-day applications. Focusing only on Spark, Hadoop, or Flink and believing that this is all it takes to do big data is a fallacy.
Let's see how this set of so-called big data technologies fits into the grand scheme of things.
If you are going to build a product with user interaction, you need a reactive front end so that you can provide an amazing customer experience.
When hundreds or thousands of user requests come in, the product has to handle them without degrading performance. It has to be resilient.
There are going to be transaction-based processes, such as a user querying for something, adding an item, or viewing their trades for the day. These could be handled by different microservices, each with its own life cycle and each able to scale independently.
If you want your system to be extensible and ready for business operations that cannot be foreseen today, you need event sourcing: every change is recorded as an immutable event, so new features can later be built by replaying that history.
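To make the event sourcing idea concrete, here is a minimal sketch in Python (not Lagom's API, which this post does not cover), assuming a hypothetical shopping-cart domain. The names `ItemAdded`, `ItemRemoved`, `EventStore`, `cart_state`, and `removal_count` are illustrative only. Instead of overwriting state, the system appends immutable events and derives state by replaying them, so a view nobody planned for can later be computed over the full history.

```python
from dataclasses import dataclass
from typing import List, Union

# Hypothetical domain events for a shopping cart (illustrative names only).
@dataclass(frozen=True)
class ItemAdded:
    item: str
    qty: int

@dataclass(frozen=True)
class ItemRemoved:
    item: str

Event = Union[ItemAdded, ItemRemoved]

class EventStore:
    """Append-only in-memory log; a real system would persist events durably."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, event: Event) -> None:
        self._events.append(event)

    def replay(self) -> List[Event]:
        # Return a copy so callers cannot rewrite history.
        return list(self._events)

def cart_state(events: List[Event]) -> dict:
    """Current cart contents, derived by folding over every recorded event."""
    cart: dict = {}
    for e in events:
        if isinstance(e, ItemAdded):
            cart[e.item] = cart.get(e.item, 0) + e.qty
        elif isinstance(e, ItemRemoved):
            cart.pop(e.item, None)
    return cart

def removal_count(events: List[Event]) -> int:
    """A view added after the fact: since history is kept, it covers past events too."""
    return sum(1 for e in events if isinstance(e, ItemRemoved))

store = EventStore()
store.append(ItemAdded("book", 2))
store.append(ItemAdded("pen", 1))
store.append(ItemRemoved("pen"))
store.append(ItemAdded("book", 1))

print(cart_state(store.replay()))     # {'book': 3}
print(removal_count(store.replay()))  # 1
```

Because `removal_count` could be written after the events were recorded, it illustrates the extensibility argument: the history was already there. A production event store would persist events durably and usually add snapshots so that replays stay fast.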
You will want to separate writes from reads in your system so that the read and write SLAs are met and so that you can scale the read side and the write side separately.
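This read/write split is commonly known as CQRS (Command Query Responsibility Segregation). The sketch below, again in Python and with hypothetical names (`TradePlaced`, `WriteSide`, `DailyTradesView`), reuses the "trades for the day" example: the write side validates commands and emits events, while the read side maintains a projection shaped for a single query. For simplicity, events are delivered through a direct callback; in a real system they would typically travel over an asynchronous messaging system, which is what lets each side scale independently.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Callable, Dict, List, Tuple

@dataclass(frozen=True)
class TradePlaced:
    user: str
    symbol: str
    qty: int
    day: date

class WriteSide:
    """Command side: validates commands and emits events; it never answers queries."""

    def __init__(self, publish: Callable[[TradePlaced], None]) -> None:
        self._publish = publish            # assumption: a plain callback stands in for a broker
        self._log: List[TradePlaced] = []  # the write side's own source of truth

    def place_trade(self, user: str, symbol: str, qty: int, day: date) -> None:
        if qty <= 0:
            raise ValueError("quantity must be positive")
        event = TradePlaced(user, symbol, qty, day)
        self._log.append(event)
        self._publish(event)

class DailyTradesView:
    """Query side: a read model built for one question, 'my trades for the day'."""

    def __init__(self) -> None:
        self._by_user_day: Dict[Tuple[str, date], List[str]] = defaultdict(list)

    def on_event(self, event: TradePlaced) -> None:
        # Keep the projection up to date as events arrive from the write side.
        self._by_user_day[(event.user, event.day)].append(f"{event.symbol} x{event.qty}")

    def trades_for(self, user: str, day: date) -> List[str]:
        # .get does not trigger the defaultdict factory, so unknown keys stay absent.
        return list(self._by_user_day.get((user, day), []))

view = DailyTradesView()
write_side = WriteSide(publish=view.on_event)
today = date(2017, 6, 1)
write_side.place_trade("alice", "ACME", 10, today)
write_side.place_trade("alice", "INIT", 5, today)
write_side.place_trade("bob", "ACME", 3, today)

print(view.trades_for("alice", today))  # ['ACME x10', 'INIT x5']
```

Because the view only consumes events, more read replicas or entirely new views can be added without touching the write path.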
To store your transaction data, you need a database, either SQL or NoSQL.
Some of your features will also need to analyze data and return the results. Depending on the SLAs, this is where big data frameworks come in.
For your product to stand out, you may also need to run machine learning or deep learning algorithms.
Of course, we are simplifying the scenario a lot, but hopefully you get the idea. Depending on a big data framework alone, or hiring consultants who know a bit about Hadoop or Spark, is not going to fly. You need to work with an entire gamut of technologies, such as:
- Reactive UI
- Microservices framework
- Asynchronous messaging system
- Big data framework (there, I said it!)
- Hosting strategy based on containers
- Monitoring and telemetry
- Machine learning and AI
And believe me, this is a partial list.
And over all of this sit the principles, patterns, and practices of effective software development. The main technology drivers, based on the principles of the Reactive Manifesto, are:
- Responsive
- Resilient
- Elastic
- Message driven
To sum it up, here is one possible scheme of technologies that can fulfill the product vision:
As you can see, big data frameworks are only one part of what you need to do. They are more than a drop in the ocean, but they are still far from the whole ocean.
The next time someone talks to you about big data and about using big data frameworks to build a product, ask them about all the ancillaries, and take what they say not with a grain of salt but with a big bag of it.