Three Things To Know About HDF 2.0

This article covers the major release from Hortonworks, including new versions of NiFi, Kafka, and Storm.

By Haimo Liu and Kanishk Mahajan · Aug. 12, 16 · Opinion

Hortonworks DataFlow (HDF) 2.0 offers a combination of Apache NiFi 1.0, Kafka 0.10, and Storm 1.0. HDF 2.0 has significant architecture and enterprise productivity features that make it faster and easier to deploy, manage, and analyze streaming data. In the next few weeks we will go into more detail, but for now, here are the three highlights to take note of.

1. Integrated, enterprise-ready ecosystem of Apache NiFi, Kafka, and Storm with Ambari, Ranger, and ZooKeeper

HDF 2.0 offers an enterprise-ready, integrated deployment and management option for streaming analytics, from the edge into the core, with:

  • Apache NiFi for dynamic, configurable data pipelines, through which all sources, systems, and destinations communicate.
  • Apache Kafka 0.10 for high-throughput distributed messaging with pub-sub semantics, operating at speed on big data volumes and adapting to differing rates of data creation and delivery.
  • Apache Storm 1.0 for real-time streaming analytics that create immediate insights at massive scale, with performance that is 6-10x faster than any previous Storm release.

With the new enterprise readiness features of HDF 2.0, businesses can get value from data in motion sooner through improvements in developer productivity, operations, and architecture.

Developer productivity improvements of HDF 2.0

  • Storm windowing and state management
  • Improved Storm topology debugging, including dynamic worker profiling, a topology event inspector, dynamic log levels, and distributed log search
  • Improved Kafka SASL support and automated Kafka replica leader election
  • Improved Storm scalability with the Pacemaker daemon, resource-aware scheduling, and improved Nimbus HA
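
As a rough illustration of what windowing and state management mean in a stream processor, the sketch below groups a stream into tumbling (non-overlapping), count-based windows and keeps a running total as state across windows. It is plain Python, not Storm's API; the names `tumbling_windows`, `RunningTotal`, and `running_totals` are hypothetical and exist only for this example.

```python
from typing import Iterable, Iterator, List


def tumbling_windows(events: Iterable[int], size: int) -> Iterator[List[int]]:
    """Yield consecutive, non-overlapping windows of up to `size` events."""
    if size <= 0:
        raise ValueError("size must be positive")
    window: List[int] = []
    for event in events:
        window.append(event)
        if len(window) == size:
            yield window
            window = []
    if window:
        # Emit the trailing partial window so no events are dropped.
        yield window


class RunningTotal:
    """Hypothetical stand-in for operator state that survives across windows."""

    def __init__(self) -> None:
        self.total = 0

    def update(self, window: List[int]) -> int:
        self.total += sum(window)
        return self.total


def running_totals(events: Iterable[int], size: int) -> List[int]:
    """Return the running total observed after each window is processed."""
    state = RunningTotal()
    return [state.update(w) for w in tumbling_windows(events, size)]
```

A real engine would also checkpoint that state so it can be restored after a failure; the sketch leaves this out.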

Operational visibility improvements of HDF 2.0

  • Integrated and comprehensive platform-level monitoring, management, governance, and security for Apache Storm 1.0 and Kafka 0.10
  • Integrated Ambari views for Storm management and monitoring
  • Ambari Metrics Server and Grafana integration for both Storm and Kafka, providing improved metrics collection and sampling for more accurate and granular performance metrics, as well as time-series metrics visualization and configurable metrics dashboards

Architectural improvements of HDF 2.0

  • Zero-master clustering paradigm: each node in a NiFi cluster performs the same tasks on the data, but each operates on a different set of data. As a result, the dataflow manager can interact with the NiFi cluster through the user interface (UI) of any node, and any change is replicated to all nodes in the cluster, allowing for multiple entry points. This results in a different deployment architecture than previous HDF releases and eliminates the possibility of losing access to the management interface because a particular NiFi instance is unavailable.
  • Many enterprises today deploy a combination of individual products for data movement, data collection, messaging, and real-time streaming analytics to create an integrated in-house solution. HDF accelerates the on-ramp to streaming analytics with an integrated, enterprise-ready solution.
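
The zero-master idea described above can be sketched as a toy model: every node holds the same flow definition, a change accepted through any node is copied to all of them, and each node works on a different slice of the data. This is a minimal sketch, not NiFi's implementation. The `Node` and `Cluster` classes, the round-robin split, and the upper-casing "task" are all assumptions made for illustration, and coordination and failure handling are ignored.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class Node:
    name: str
    # Each node keeps its own full copy of the flow definition.
    flow: Dict[str, str] = field(default_factory=dict)
    processed: List[str] = field(default_factory=list)

    def process(self, item: str) -> None:
        # Every node runs the same task (here: upper-casing) on its own data.
        self.processed.append(item.upper())


class Cluster:
    def __init__(self, names: Iterable[str]) -> None:
        self.nodes = [Node(n) for n in names]

    def update_flow(self, entry_node: str, key: str, value: str) -> None:
        """Accept a change through any node and replicate it to every node."""
        if entry_node not in {n.name for n in self.nodes}:
            raise KeyError(entry_node)
        for node in self.nodes:
            node.flow[key] = value

    def distribute(self, items: Iterable[str]) -> None:
        """Spread items across nodes round-robin (an assumed policy)."""
        for i, item in enumerate(items):
            self.nodes[i % len(self.nodes)].process(item)
```

Because `update_flow` accepts any node name as the entry point, no single node acts as a gatekeeper for flow changes in this model.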

2. Productivity gains with a new visual user interface and multi-tenant authorization

Apache NiFi is a fairly mature project in the sense that it started almost exactly 10 years ago with roots in the NSA (happy 10th birthday, Apache NiFi!), as noted in a June 2016 tweet from Domink Benz: “NiFi is an project in the Hadoop space with a nice GUI. And documentation.” *everybody laughs*

But now, to match a modern UI aesthetic and meet new enterprise productivity demands, the Apache NiFi visual user interface has been given both a facelift and new UI options that make creating, managing, tuning, and controlling real-time dataflows easier and faster. It also has new UI features that simplify deployment and operational scenarios, including multi-tenant authorization: the ability for multiple entities within an enterprise to securely manage different portions of the same dataflow.

This allows enterprise productivity gains unparalleled by any existing design-and-deploy options for data movement. Each entity has fine-grained, component-level permission control to manage access to, usage of, and modification of its dataflows, and yet each can still view the others' dataflows for full context and understanding of the data being transmitted and received. Like multiple collaborators working on a single shared Google Doc, multi-tenancy in Apache NiFi gives enterprises a common infrastructure that connects disparate teams and data sets in real time and provides secure transparency between their projects.
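
The permission model described above, where every team can view a shared dataflow but only the owning team can modify its own components, can be sketched as a simple grant table. This is a minimal sketch under stated assumptions, not NiFi's authorizer: the `Action` and `Policy` names, the team names, and the component names are hypothetical.

```python
from enum import Enum
from typing import Set, Tuple


class Action(Enum):
    VIEW = "view"
    MODIFY = "modify"


class Policy:
    """Component-level grants stored as (team, component, action) triples."""

    def __init__(self) -> None:
        self._grants: Set[Tuple[str, str, Action]] = set()

    def grant(self, team: str, component: str, action: Action) -> None:
        self._grants.add((team, component, action))

    def is_allowed(self, team: str, component: str, action: Action) -> bool:
        # Anything not explicitly granted is denied.
        return (team, component, action) in self._grants


def example_policy() -> Policy:
    """Two hypothetical teams sharing one dataflow."""
    policy = Policy()
    # The ingest team owns the ingest component: view and modify.
    policy.grant("ingest-team", "ingest-group", Action.VIEW)
    policy.grant("ingest-team", "ingest-group", Action.MODIFY)
    # The analytics team may look at it for context, but not change it.
    policy.grant("analytics-team", "ingest-group", Action.VIEW)
    return policy
```

Denying by default keeps the example close to the text: visibility across teams is something granted explicitly, while modification stays with the owning team.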

3. Support for Apache MiNiFi

HDF 2.0 supports Apache MiNiFi, a subproject of Apache NiFi designed to solve the difficulties of managing and transmitting data feeds to and from the source of origin. It enables edge intelligence to adjust dataflow behavior with bi-directional communication, out to the last mile of digital signal.

MiNiFi is designed to have a very small, lightweight footprint*, to support central management of agents (versus NiFi, where each instance has built-in management capability), to generate the same level of data provenance as NiFi, which is vital to edge analytics and the IoAT (Internet of Anything), and to integrate with NiFi for follow-on dataflow management and a full chain of custody of information. (MiNiFi is pronounced “minify” [min-uh-fahy], and the Java version is supported as part of HDF 2.0.)

*The MiNiFi agent is < 40 MB for the Java version and < 10 MB for the C++ version. For more information about MiNiFi, see the Apache MiNiFi project page. For a connected car example of MiNiFi, see here.

Those are the three things to know about HDF 2.0, which we will explore in further detail in upcoming blog posts. In the meantime, we recommend the following for further reading about how Hortonworks DataFlow is used in real-world environments.

    Published at DZone with permission of Haimo Liu, DZone MVB. See the original article here.
