Hello, everyone! Today we'll conduct a short hypothetical interview with the SMACK stack about its architecture and uses. Let’s start with some introductions.
Interviewer: How would you describe yourself?
SMACK: I am SMACK (Spark, Mesos, Akka, Cassandra and Kafka), and I am built entirely from open source technologies. Mesosphere and Cisco collaborated to bundle these technologies into a product called Infinity, which is used to solve data pipeline challenges where speed of response is what matters, such as fraud detection.
Interviewer: Why SMACK?
SMACK: These are the main data-processing challenges today:
- Data is getting bigger, or more accurately, the number of data sources is increasing.
- Many businesses still model data that is an hour old, which is already practically obsolete.
- Data analysis has become too slow to deliver any return on investment.
- Systems need to scale horizontally at low cost.
- We live in an age where data freshness matters far more than the amount or size of data.
We face many challenges, and SMACK exists because no single technology makes an architecture. SMACK is a pipelined architecture model for data processing.
Interviewer: The Lambda Architecture is a data processing architecture that has the advantage of using both batch and stream processing methods. How is SMACK different?
SMACK: Yes, the Lambda Architecture has these features, but most Lambda solutions cannot meet two needs at the same time:
- The ability to handle a massive data stream in real time.
- The ability to handle multiple different data models from multiple data sources.
To meet these needs, Apache Spark handles real-time analysis of historical and recent data alike, drawn from massive torrents of information, and all of that information, along with the analysis results, is persisted in Apache Cassandra. So, in case of failure, we can recover the real-time data from any point in time. With the Lambda Architecture, that is not always possible.
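Point-in-time recovery depends on storing every event durably and in time order, so that processing can restart from any timestamp. The Python sketch below models only that idea. `Event`, `EventStore` and `replay_from` are hypothetical names. The in-memory dict stands in for a Cassandra table partitioned by source and ordered by timestamp, and the sketch does not use any Cassandra driver API.

```python
from bisect import bisect_left, insort
from dataclasses import dataclass, field


@dataclass(order=True)
class Event:
    # Only `ts` takes part in ordering, like a timestamp clustering column.
    ts: int
    source: str = field(compare=False)
    payload: dict = field(compare=False)


class EventStore:
    """Hypothetical in-memory stand-in for a table partitioned by source
    and ordered by timestamp; not a Cassandra client."""

    def __init__(self) -> None:
        self._partitions: dict[str, list[Event]] = {}

    def write(self, event: Event) -> None:
        # Keep each partition sorted by timestamp, even for late arrivals.
        insort(self._partitions.setdefault(event.source, []), event)

    def replay_from(self, source: str, ts: int) -> list[Event]:
        # Return every event for `source` at or after `ts`, oldest first.
        rows = self._partitions.get(source, [])
        start = bisect_left([e.ts for e in rows], ts)
        return rows[start:]


store = EventStore()
for t, amount in [(100, 20), (300, 35), (200, 5000)]:  # the t=200 event arrives late
    store.write(Event(t, "card-42", {"amount": amount}))

# After a failure at t=250, reprocess everything from t=200 onward.
recovered = store.replay_from("card-42", 200)
print([e.ts for e in recovered])  # [200, 300]
```

In a real Cassandra table, the ordering within a partition would come from the clustering columns rather than from sorting in application code.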
Interviewer: SMACK, can you briefly describe your technologies?
SMACK: Yes, sure. As we discussed, SMACK is mainly used for data pipeline architectures that process online data streams. There are plenty of books and articles on each of these technologies, but here each one serves a specific purpose:
- Apache Spark: Processing Engine.
- Akka: The Model.
- Apache Kafka: The Broker.
- Apache Cassandra: The Storage.
- Apache Mesos: The Container.
All of these are Apache projects except Akka.
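To make the division of labor concrete, here is a toy, single-process Python model of how the roles might hand data to each other. It assumes one common arrangement: actors ingest raw events and publish them to the broker, the processing engine consumes from the broker, and results land in storage. Every name is a hypothetical stand-in. `Broker` plays the Kafka topic log, `Actor` plays an Akka actor with a mailbox, `process_batch` plays a Spark job, and a dict plays the Cassandra table. Mesos schedules components across a cluster, which has no meaningful equivalent in one process, so it is left out. None of this uses the real libraries' APIs.

```python
from collections import deque
from typing import Callable


class Broker:
    """Hypothetical stand-in for a topic: an append-only log read by offset."""

    def __init__(self) -> None:
        self.log: list[dict] = []

    def publish(self, message: dict) -> None:
        self.log.append(message)

    def poll(self, offset: int) -> list[dict]:
        # Each consumer tracks its own offset into the shared log.
        return self.log[offset:]


class Actor:
    """Hypothetical stand-in for an actor: messages wait in a mailbox
    and are handled one at a time, in arrival order."""

    def __init__(self, handler: Callable[[dict], None]) -> None:
        self.mailbox: deque = deque()
        self.handler = handler

    def tell(self, message: dict) -> None:
        self.mailbox.append(message)

    def drain(self) -> None:
        while self.mailbox:
            self.handler(self.mailbox.popleft())


def process_batch(events: list[dict], threshold: float) -> list[dict]:
    """Stand-in for a processing job: flag transactions above a threshold.
    The threshold rule is an illustrative assumption, not a real fraud model."""
    return [{**e, "suspicious": e["amount"] > threshold} for e in events]


# Stand-in for a storage table keyed by card ID.
storage: dict[str, list[dict]] = {}


def persist(result: dict) -> None:
    storage.setdefault(result["card"], []).append(result)


broker = Broker()
ingest = Actor(broker.publish)  # the actor hands each raw event to the broker
for card, amount in [("card-1", 30.0), ("card-2", 4200.0), ("card-1", 12.5)]:
    ingest.tell({"card": card, "amount": amount})
ingest.drain()

# The processing step consumes from offset 0 and writes every result to storage.
for result in process_batch(broker.poll(offset=0), threshold=1000.0):
    persist(result)

print({card: [r["suspicious"] for r in rows] for card, rows in storage.items()})
# {'card-1': [False, False], 'card-2': [True]}
```

Because each stage only calls the one before it through a narrow method (`publish`, `poll`, `persist`), each stand-in could in principle be swapped for the real component without touching the others.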
Interviewer: Is SMACK the only solution?
SMACK: No. You can replace individual components to suit your requirements. For example, YARN could be used as the cluster scheduler instead of Mesos, and Apache Flink could serve as a batch and stream processing alternative to Spark. There are many alternatives to SMACK.
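Swapping one component for another is easiest when the rest of the pipeline depends only on the role a component plays, not on a specific product. The Python sketch below expresses the cluster-scheduler role as a `typing.Protocol`. `ClusterScheduler`, `RoundRobinScheduler`, `FillFirstScheduler` and `deploy` are hypothetical names. The two placement policies are arbitrary placeholders and do not model how Mesos or YARN actually place work.

```python
from typing import Protocol


class ClusterScheduler(Protocol):
    """The cluster-scheduler role: decide which node runs each task."""

    def place(self, tasks: list[str], nodes: list[str]) -> dict[str, str]: ...


class RoundRobinScheduler:
    """Placeholder policy: spread tasks across nodes in turn."""

    def place(self, tasks: list[str], nodes: list[str]) -> dict[str, str]:
        return {task: nodes[i % len(nodes)] for i, task in enumerate(tasks)}


class FillFirstScheduler:
    """Placeholder policy: fill each node up to `capacity` tasks before the next."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity

    def place(self, tasks: list[str], nodes: list[str]) -> dict[str, str]:
        if len(tasks) > self.capacity * len(nodes):
            raise ValueError("not enough capacity for all tasks")
        return {task: nodes[i // self.capacity] for i, task in enumerate(tasks)}


def deploy(scheduler: ClusterScheduler, tasks: list[str], nodes: list[str]) -> dict[str, str]:
    # The pipeline depends only on the role, so either scheduler can be swapped in.
    return scheduler.place(tasks, nodes)


tasks = ["ingest", "broker", "process", "store"]
nodes = ["node-a", "node-b"]
print(deploy(RoundRobinScheduler(), tasks, nodes))
# {'ingest': 'node-a', 'broker': 'node-b', 'process': 'node-a', 'store': 'node-b'}
print(deploy(FillFirstScheduler(capacity=2), tasks, nodes))
# {'ingest': 'node-a', 'broker': 'node-a', 'process': 'node-b', 'store': 'node-b'}
```

The same pattern applies to the other roles. If the processing stage is written against a small interface, replacing one engine with another changes only the class passed in.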
Interviewer: Can I take a picture of you?
SMACK: Yes, sure. Cheeezzzz...