Three Things To Know About HDF 2.0
This article covers the major release from Hortonworks including a new version of NiFi, Kafka, and Storm.
Hortonworks DataFlow (HDF) 2.0 offers a combination of Apache NiFi 1.0, Kafka 0.10, and Storm 1.0. HDF 2.0 has significant architecture and enterprise productivity features that make it faster and easier to deploy, manage, and analyze streaming data. In the next few weeks, we will go into more detail, but for now, here are the three highlights to take note of.
1. Integrated, enterprise-ready ecosystem of Apache NiFi, Kafka, and Storm with Ambari, Ranger, and ZooKeeper
HDF 2.0 offers an enterprise-ready, integrated deployment and management option for streaming analytics, from the edge into the core, with:
- Apache NiFi for dynamic, configurable data pipelines, through which all sources, systems, and destinations communicate.
- Apache Kafka 0.10 for high-throughput distributed messaging with pub-sub semantics, operating at speed on big data volumes and adapting to differing rates of data creation and delivery.
- Apache Storm 1.0 for real-time streaming analytics to create immediate insights at massive scale, with performance that is 6-10x faster than any previous Storm release.
With the new enterprise readiness features of HDF 2.0, businesses can accelerate business value from data in motion through developer productivity, operational, and architectural improvements.
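To make the pub-sub point about "differing rates of data creation and delivery" concrete, here is a minimal sketch in plain Python. It is not the Kafka client API; the `TopicLog` class and its methods are hypothetical names invented for illustration. It only shows the general idea of an append-only log where each consumer group tracks its own read position, so a slow reader does not hold back a fast one.

```python
from collections import defaultdict


class TopicLog:
    """Toy append-only log with one read offset per consumer group.

    Hypothetical illustration only; this is not Kafka's API.
    """

    def __init__(self):
        self._records = []                # records in arrival order
        self._offsets = defaultdict(int)  # group name -> next offset to read

    def publish(self, record):
        """Append a record and return the offset it was stored at."""
        self._records.append(record)
        return len(self._records) - 1

    def poll(self, group, max_records=1):
        """Return up to max_records unread records for this group."""
        start = self._offsets[group]
        batch = self._records[start:start + max_records]
        self._offsets[group] = start + len(batch)
        return batch

    def lag(self, group):
        """Number of records this group has not read yet."""
        return len(self._records) - self._offsets[group]


log = TopicLog()
for reading in ["t=1", "t=2", "t=3", "t=4"]:
    log.publish(reading)

fast = log.poll("dashboard", max_records=4)  # a fast consumer drains the log
slow = log.poll("archiver", max_records=1)   # a slow consumer reads one record
```

Because each group keeps its own offset, the producer never waits for the slower consumer; the `archiver` group simply reports a larger lag until it catches up.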
Developer productivity improvements of HDF 2.0
- Storm windowing and state management
- Improved Storm topology debugging, including dynamic worker profiling, topology event inspector, dynamic log levels, and distributed log search
- Improved Kafka SASL support and automated Kafka replica leader election
- Improved Storm scalability with the Pacemaker daemon, resource-aware scheduling, and improved Nimbus HA
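As a rough illustration of what windowing means in a streaming context, the sketch below groups events into fixed, non-overlapping (tumbling) time windows and counts them per key. It is plain Python, not the Storm windowing API, and the function name and sample sensor events are assumptions made up for the example.

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds):
    """Count events per key in fixed, non-overlapping time windows.

    events: iterable of (timestamp_seconds, key) pairs.
    Returns a dict mapping window start time -> Counter of keys.
    """
    windows = {}
    for timestamp, key in events:
        # Each event falls into exactly one window, identified by its start time.
        window_start = (timestamp // window_seconds) * window_seconds
        windows.setdefault(window_start, Counter())[key] += 1
    return windows


# Hypothetical sample data: (timestamp in seconds, sensor id).
events = [
    (1, "sensor-a"),
    (4, "sensor-b"),
    (9, "sensor-a"),
    (12, "sensor-a"),
    (19, "sensor-b"),
]
counts = tumbling_window_counts(events, window_seconds=10)
```

A real streaming engine also has to decide when a window is complete and what to do with late events; this sketch sidesteps both by processing a finished list.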
Operational visibility improvements of HDF 2.0
- Integrated and comprehensive platform-level monitoring, management, governance, and security for Apache Storm 1.0 and Kafka 0.10
- Integrated Ambari views for Storm management and monitoring
- Integrated Ambari metrics server and Grafana integration for both Storm and Kafka, providing improved metrics collection and sampling for more accurate and granular performance metrics, as well as time-series metrics visualization and configurable metrics dashboards
Architectural improvements of HDF 2.0
Many enterprises today deploy a combination of individual products for data movement, data collection, messaging bus, and real-time streaming analytics to create an integrated in-house solution. HDF accelerates the on-ramp to streaming analytics with an integrated, enterprise-ready solution.
2. Productivity gains with new visual user interface and multi-tenant authorization
Apache NiFi is a fairly mature project in the sense that it started almost exactly 10 years ago with roots in the NSA (happy 10th birthday, Apache NiFi!). As a June 2016 tweet from Domink Benz noted: “NiFi is an project in the Hadoop space with a nice GUI. And documentation.” *everybody laughs*
But now, to match a modern UI aesthetic and meet new enterprise productivity demands, the Apache NiFi visual user interface has been given a facelift, along with new UI options that make dataflow creation, management, tuning, and control of real-time data easier and faster. It also has new UI features that simplify deployment and operational scenarios, including multi-tenant authorization: the ability for multiple entities within an enterprise to securely manage different portions of the same dataflow.
This allows enterprise productivity gains unparalleled by any existing design and deploy options for data movement. Each entity has fine-grained, component-level permission control to manage access, usage, and modification of its dataflows, and yet each can still view the others' dataflows for full context and understanding of the data being transmitted and received. Much like multiple collaborators working on a single shared Google Doc, multi-tenancy in Apache NiFi gives enterprises a common infrastructure that connects disparate teams and data sets in real time and provides secure transparency into one another's projects.
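The permission model described above can be sketched as a small policy table: every team may view every component, but only the owning team may modify its own. This is a hedged, plain-Python illustration, not NiFi's authorizer or policy API; `ComponentPolicy`, `FlowAuthorizer`, the team names, and the component ids are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ComponentPolicy:
    """Hypothetical per-component policy: who may view and who may modify."""
    viewers: set = field(default_factory=set)
    modifiers: set = field(default_factory=set)


class FlowAuthorizer:
    """Toy component-level authorizer; not NiFi's actual implementation."""

    def __init__(self):
        self._policies = {}  # component id -> ComponentPolicy

    def grant(self, component, team, action):
        policy = self._policies.setdefault(component, ComponentPolicy())
        if action == "view":
            policy.viewers.add(team)
        elif action == "modify":
            policy.modifiers.add(team)
        else:
            raise ValueError(f"unknown action: {action}")

    def is_allowed(self, component, team, action):
        policy = self._policies.get(component)
        if policy is None:
            return False  # deny by default when no policy exists
        if action == "view":
            return team in policy.viewers
        if action == "modify":
            return team in policy.modifiers
        return False  # unknown actions are denied


authorizer = FlowAuthorizer()
ownership = {"ingest-group": "team-sensors", "billing-group": "team-finance"}
for component, owner in ownership.items():
    authorizer.grant(component, owner, "modify")  # owners can change their part
    for team in ownership.values():
        authorizer.grant(component, team, "view")  # everyone can see everything
```

With this setup, `team-finance` can inspect `ingest-group` for context but cannot change it, which mirrors the shared-document analogy: everyone reads the whole flow, and each team edits only its own section.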
3. Support for Apache MiNiFi
HDF 2.0 supports Apache MiNiFi, a subproject of Apache NiFi designed to solve the difficulties of managing and transmitting data feeds to and from the source of origin, enabling edge intelligence to adjust dataflow behavior with bi-directional communication, out to the last mile of digital signal.
MiNiFi is designed to have a very small and lightweight footprint*, support central management of agents (versus NiFi, where each instance has built-in management capability), generate the same level of data provenance as NiFi, which is vital to edge analytics and IoAT (Internet of Any Thing), and integrate with NiFi for follow-on dataflow management and full chain of custody of information. (MiNiFi is pronounced “minify” [min-uh-fahy], and the Java version is supported as part of HDF 2.0.)
*The MiNiFi agent is <40 MB for the Java agent version and <10 MB for the C++ agent. For more information about MiNiFi, see the Apache MiNiFi project page. For a connected car example of MiNiFi, see here.
Those are the three things to know about HDF 2.0, which we will delve into in further detail in upcoming blog posts. In the meantime, we recommend the following for further reading about how Hortonworks DataFlow is used in real-world environments.
Published at DZone with permission of Haimo Liu, DZone MVB. See the original article here.