Thoughts on Building a Successful Streaming Analytics Platform
Stream analytics apps must get easier to cater to business needs. Learn about distributed streaming computation engines and more from Hortonworks data analytics experts.
As part of the product management leadership team at Hortonworks, I find nothing more valuable than talking directly with customers and learning about their successes, challenges, and struggles in implementing their Big Data and analytics use cases with HDP and HDF. These conversations provide more insight than any analyst report, white paper, or market study.
In my 4+ years at Hortonworks, I have had many opportunities for face time with our more than 1,000 customers. These conversations have strongly influenced how we build enterprise software products that are easier to use.
A handful of moments with customers have left an indelible mark, reshaping how I think about a problem set. One of those moments occurred a few months ago with a customer who was using Apache NiFi as part of the Hortonworks DataFlow (HDF) platform to ingest, route/move, enrich, and transform data from edge devices like cable modems, voice over IP phones, and home security systems. HDF was transformative for this customer, and they especially appreciated how NiFi's compelling user experience greatly reduced the operational effort for data ingestion and flow management.
I posed the following question to the customer:
Where did you experience pain when implementing this use case? Where can we continue to innovate in HDF to ease those pains?
The response went something like this:
Using NiFi with its rich UI has been a refreshingly delightful experience for us as we build flow management applications. However, we desperately need the same type of experience when building streaming analytics apps. Flow management only gets us halfway there. We need a rich UI to build analytical apps that operate on the stream.
The above response has been echoed by almost every one of our customers, and it has strongly influenced the strategic direction, efforts, and investments in the Hortonworks data-in-motion platform: Hortonworks DataFlow (HDF). We have gleaned two insights from the customer’s response:
- Building end-to-end data-in-motion use cases requires both flow management and streaming analytics capabilities.
- Building streaming analytics must get easier.
Data-in-Motion Solutions Require Both Flow Management and Stream Analytics
What is the difference between flow management and streaming analytics?
- Flow management provides an easy, secure, and reliable way to get the data you need from anywhere (edge, cloud, data center) to any downstream system with intelligence (routing, transformation, filtering, bi-directional communication).
- Streaming analytics provides immediate and continuous insights using aggregations over windows, pattern matching, predictive and prescriptive analytics, and so on. Streaming analytics is part of a superset of capabilities provided by stream processing.
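To make "aggregations over windows" concrete, here is a minimal Python sketch of a tumbling (fixed, non-overlapping) window average. It is an illustration only and does not use the Storm or HDF APIs: the event layout `(timestamp, device_id, value)`, the function name `tumbling_window_averages`, and the device names are all assumptions made for this example, and it runs over an in-memory list rather than a live stream.

```python
from collections import defaultdict


def tumbling_window_averages(events, window_seconds):
    """Average values per (window start, device) over fixed-size windows.

    `events` is an iterable of (timestamp_seconds, device_id, value) tuples.
    This shape is hypothetical and chosen only for illustration.
    """
    # (window_start, device_id) -> [running sum, event count]
    accumulators = defaultdict(lambda: [0.0, 0])
    for timestamp, device_id, value in events:
        # Assign each event to the window that contains its timestamp.
        window_start = (timestamp // window_seconds) * window_seconds
        acc = accumulators[(window_start, device_id)]
        acc[0] += value
        acc[1] += 1
    return {key: total / count for key, (total, count) in accumulators.items()}


# Hypothetical readings from edge devices.
events = [
    (0, "modem-1", 10.0),
    (30, "modem-1", 20.0),
    (65, "modem-1", 40.0),
    (70, "phone-1", 5.0),
]
averages = tumbling_window_averages(events, window_seconds=60)
```

With 60-second windows, the first two `modem-1` readings fall into the window starting at 0 and the remaining readings fall into the window starting at 60. A real streaming engine would additionally have to handle late and out-of-order events, state recovery, and scale-out, which this sketch ignores.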
As the customer above noted, one needs both capabilities to be successful. This is why HDF was expanded in mid-2016, with the HDF 2.0 release, to offer stream processing with Apache Storm and Kafka. The diagram below summarizes this expansion.
Building Stream Analytics Apps Must Get Easier
Simply adding Apache Storm and Kafka to HDF does not address the second key point: building stream analytics quickly and easily. Customers often cite the following key challenges:
1. Building stream analytics apps requires specialized skill sets that most enterprise organizations do not have today.
2. Stream analytics apps require a considerable amount of low-level programming, testing, and tuning to bring to production.
3. It takes a lot of time to design, develop, test, and deploy into production.
4. Key streaming basics such as joining/splitting streams, aggregations over windows of time, and pattern matching are difficult to implement.
5. Customers do not want to code complex stream analytics apps.
6. While traditional, mature streaming vendors (IBM Streams, Tibco, SAS, SAP) solve challenges 1-5, they are cost-prohibitive, proprietary, and do not provide scale-out architectures.
7. No truly open source tool solves challenges 1-5 today.
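As a rough illustration of two of the "streaming basics" named in the list above, the Python sketch below splits a stream by a predicate and matches a simple pattern (several consecutive high readings from the same device). It is a toy over an in-memory list, not the Storm API or the author's tooling; the function names, the `(device_id, value)` event shape, and the threshold are all hypothetical.

```python
from collections import defaultdict


def split_stream(events, predicate):
    """Split one stream into two: events that satisfy the predicate, and the rest."""
    matched, unmatched = [], []
    for event in events:
        (matched if predicate(event) else unmatched).append(event)
    return matched, unmatched


def detect_consecutive_highs(events, threshold, run_length):
    """Return a device id each time it reaches `run_length` consecutive readings above `threshold`."""
    streaks = defaultdict(int)  # device_id -> current run of high readings
    alerts = []
    for device_id, value in events:
        if value > threshold:
            streaks[device_id] += 1
            if streaks[device_id] == run_length:
                alerts.append(device_id)
        else:
            # A normal reading breaks the run for that device.
            streaks[device_id] = 0
    return alerts


# Hypothetical (device_id, utilization_percent) readings.
events = [
    ("modem-1", 95),
    ("modem-2", 40),
    ("modem-1", 97),
    ("modem-1", 99),
    ("modem-2", 96),
]
high, normal = split_stream(events, lambda event: event[1] > 90)
alerts = detect_consecutive_highs(events, threshold=90, run_length=3)
```

Even this small example needs hand-written per-key state. In a distributed engine that state must also be partitioned, checkpointed, and tuned, which is part of why customers ask for higher-level tooling.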
How do we address these challenges? Over the last six months, the Hortonworks Stream Processing engineering and product management teams have been working on a brand new set of powerful components that address each of these challenges. The sections below outline some of the fundamental principles driving this initiative.
Next-Generation Streaming Analytics Solutions Need to Cater to 3 Different User Personas
Two design principles drove this effort. First, this new set of components should allow users to design, develop, deploy, and manage complex streaming analytics apps without knowing the complexities of the underlying streaming engine. The developer should be able to build complex streaming analytics apps while writing as little code as possible. Second, the toolsets need to cater to three important personas within the organization:
- App developers: Design, develop, and deploy streaming apps using a drag-and-drop visual paradigm.
- Operations team: Create abstractions over big data services for app developers, and use tooling to deploy, monitor, and manage streaming apps.
- Business analysts: Access the streaming data immediately and perform descriptive analytics on the streams using a powerful exploration platform.
Published at DZone with permission of George Vetticaden, DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.