Democratizing Analytics Within Kafka With 3 New Access Patterns

The upcoming Hortonworks Data Platform (HDP) 3.1 and Hortonworks DataFlow (HDF) 3.3 releases plan to introduce three new Kafka analytics access patterns.

By George Vetticaden · Dec. 14, 2018

With the release of Hortonworks Streams Messaging Manager (SMM) this year, we have focused on helping DevOps and platform teams cure their Kafka blindness. The Hortonworks product and engineering teams continue to invest in SMM, with upcoming features such as alerting and topic lifecycle management.

In addition to the SMM investments, the team has also been focusing on the needs of the application and BI developer personas, helping them implement the different use cases where Kafka is a key component of the application architecture.

To this end, the product and engineering teams have been conducting interviews with application architects and developers within our largest enterprise Kafka customers. From these discussions, several trends and requirements are clearly emerging:

  • Trend #1: Kafka is becoming the de facto streaming event hub in the enterprise.
  • Trend #2: Customers are starting to use Kafka for longer-term storage: retention periods for Kafka topics are getting longer, and Kafka is increasingly used as a streaming event storage substrate.
  • Key Requirement: Application and BI/SQL developers need different Kafka analytics tools and access patterns for different use cases and requirements; the current tooling is limited.

3 New Kafka Analytics Access Patterns Introduced for Application and BI Developers

To address these trends and requirements, the upcoming Hortonworks Data Platform (HDP) 3.1 and Hortonworks DataFlow (HDF) 3.3 releases plan to introduce three powerful new Kafka analytics access patterns for application and BI developers.

A summary of these three new access patterns:

  • Stream Processing: Kafka Streams Support - Alongside existing support for Spark Streaming and SAM/Storm, the addition of Kafka Streams gives developers more options for their stream processing and microservice needs.
  • SQL Analytics: New Hive Kafka Storage Handler - View Kafka topics as tables and execute SQL via Hive, with full SQL support for joins, windowing, aggregations, etc.
  • OLAP Analytics: New Druid Kafka Indexing Service - View Kafka topics as cubes and perform OLAP-style analytics on streaming events in Kafka using Druid.

Application Developer Persona: Secure and Governed Microservices with HDP/HDF Kafka Streams

HDP/HDF supports two stream processing engines: Spark Structured Streaming and Streaming Analytics Manager (SAM) with Storm. Based on an application's non-functional requirements, our customers can pick the right stream processing engine for their needs. Some of the key stream processing requirements we heard from customers were the following:

  • The choice to select the right stream processing engine for their set of requirements is important. Key non-functional requirements that drive the choice of engine include batching vs. event-at-time processing, ease of use, exactly-once processing, handling late arriving data, state management support, scalability/performance, maturity, and so on.
  • The two current choices have limited capabilities for building streaming microservice apps.
  • All stream processing engines should use a centralized set of platform services providing security (authentication/authorization), audit, governance, schema management and monitoring capabilities.

To address these requirements, support for Kafka Streams is being added in the upcoming HDP 3.1 and HDF 3.3 releases, fully integrated with the security, governance, audit, and schema management platform services.

Kafka Streams, integrated with Schema Registry, Atlas, Ranger, and Streams Messaging Manager (SMM), now gives customers a comprehensive platform for building microservice apps that address complex security, governance, audit, and monitoring requirements.
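
To make the pattern concrete, here is a minimal sketch of a Kafka Streams microservice that reads one topic, filters events, and writes to another. The topic names, broker address, and security setting are illustrative assumptions, not part of the HDP/HDF releases themselves; on a secured cluster, the standard Kafka client security properties would point at your Kerberos/Ranger-enabled brokers.

    import java.util.Properties;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KStream;

    public class ClickAlertService {
        public static void main(String[] args) {
            Properties props = new Properties();
            // The application id doubles as the consumer group and state store prefix.
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "click-alert-service");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker-1:9092"); // hypothetical broker
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            // Standard Kafka client security config; exact values depend on the cluster (assumption).
            props.put("security.protocol", "SASL_SSL");

            StreamsBuilder builder = new StreamsBuilder();
            // Read the raw stream, keep only error events, publish them to an alert topic.
            KStream<String, String> clicks = builder.stream("clickstream"); // hypothetical topic
            clicks.filter((key, value) -> value != null && value.contains("\"status\":\"ERROR\""))
                  .to("alerts"); // hypothetical topic

            KafkaStreams streams = new KafkaStreams(builder.build(), props);
            streams.start();
            Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        }
    }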

BI Persona: Real SQL on Real-Time Streams

The stream processing engines discussed above provide a programmatic stream processing access pattern to Kafka. Application developers love this access pattern, but BI developers have quite different analytics requirements, focused on use cases around ad hoc analytics, data exploration, and trend discovery. The BI persona's requirements for Kafka include:

  • Treat Kafka topics/streams as tables
  • Support for ANSI SQL
  • Support complex joins (different join keys, multi-way join, join predicate to non-table keys, non-equi joins, multiple joins in the same query)
  • UDF support for extensibility
  • JDBC/ODBC support
  • Creating views for column masking
  • Rich ACL support including column level security

To address these requirements, the upcoming HDP 3.1 release will add a new Hive storage handler for Kafka that allows users to view Kafka topics as Hive tables. This new feature lets BI developers take full advantage of Hive's analytical operations and capabilities, including complex joins, aggregations, UDFs, pushdown predicate filtering, windowing, and more.
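
As an illustration of this access pattern, the sketch below creates an external Hive table over a Kafka topic and runs an ad hoc aggregate query over it through JDBC. The connection URL, topic, columns, and broker list are hypothetical; the storage handler class and the Kafka metadata columns (such as __timestamp) follow the Hive 3 Kafka handler, though exact property names may vary by release.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class KafkaSqlDemo {
        public static void main(String[] args) throws Exception {
            // HiveServer2 JDBC endpoint; URL and credentials are hypothetical.
            try (Connection conn = DriverManager.getConnection(
                     "jdbc:hive2://hiveserver:10000/default", "analyst", "");
                 Statement stmt = conn.createStatement()) {

                // Map a Kafka topic to a Hive external table (names are assumptions).
                stmt.execute(
                    "CREATE EXTERNAL TABLE IF NOT EXISTS clickstream " +
                    "(user_id STRING, page STRING, status STRING) " +
                    "STORED BY 'org.apache.hadoop.hive.kafka.KafkaStorageHandler' " +
                    "TBLPROPERTIES (" +
                    "  'kafka.topic' = 'clickstream'," +
                    "  'kafka.bootstrap.servers' = 'broker-1:9092')");

                // Ad hoc analytics over the stream: error counts per page for the
                // last hour, using the per-record Kafka timestamp metadata column.
                ResultSet rs = stmt.executeQuery(
                    "SELECT page, COUNT(*) AS errors " +
                    "FROM clickstream " +
                    "WHERE status = 'ERROR' " +
                    "  AND `__timestamp` > unix_timestamp() * 1000 - 3600000 " +
                    "GROUP BY page ORDER BY errors DESC");
                while (rs.next()) {
                    System.out.println(rs.getString("page") + ": " + rs.getLong("errors"));
                }
            }
        }
    }

Because access goes through HiveServer2, the same table also satisfies the JDBC/ODBC, views, and Ranger ACL requirements listed above.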

In addition, the new Hive Kafka storage handler is fully integrated with Ranger, which provides powerful capabilities such as column-level security. This is an exciting new feature, as column-level security on streaming events has been one of the most requested features in Kafka.

Kafka + Druid + Hive = Powerful New Access Pattern for Streaming Data in Kafka

The new Hive SQL access to Kafka will let BI developers address a whole new set of Kafka use cases around data exploration, trend discovery, and ad hoc analytics. Beyond these, customers also need high-performance, OLAP-style analytics on streaming data in Kafka: they want to use SQL and interactive dashboards to do rollups and aggregations on the stream.

To address these requirements, we will be adding a powerful new Druid Kafka indexing service, managed by Hive, in the upcoming HDP 3.1 release.

A Kafka topic can be viewed as an OLAP cube. Apache Druid (incubating) is a high-performance analytics data store for event-driven data; it combines ideas from OLAP and time-series databases with ideas from search systems to create a unified system for operational analytics. The new integration provides a Druid Kafka indexing service that indexes the streaming data in a Kafka topic into a Druid cube. The indexing service can be managed by Hive as an external table, providing a SQL interface to the Druid cube backed by the Kafka topic.
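
Here is a sketch of how that looks from Hive, again submitted over JDBC. The table definition, Kafka coordinates, and granularity are illustrative; the storage handler class and ingestion properties follow the Hive/Druid integration documented for HDP 3.1, but the exact property set may differ in your release.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class DruidKafkaCubeDemo {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection(
                     "jdbc:hive2://hiveserver:10000/default", "analyst", "");
                 Statement stmt = conn.createStatement()) {

                // Declare a Druid-backed table that continuously indexes a Kafka
                // topic. Druid requires a __time column; other names are assumptions.
                stmt.execute(
                    "CREATE EXTERNAL TABLE IF NOT EXISTS clicks_cube " +
                    "(`__time` TIMESTAMP, page STRING, user_id STRING, clicks INT) " +
                    "STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler' " +
                    "TBLPROPERTIES (" +
                    "  'kafka.bootstrap.servers' = 'broker-1:9092'," +
                    "  'kafka.topic' = 'clickstream'," +
                    "  'druid.kafka.ingestion' = 'START'," +
                    "  'druid.segment.granularity' = 'HOUR')");

                // OLAP-style rollup; Hive pushes the aggregation down to Druid.
                ResultSet rs = stmt.executeQuery(
                    "SELECT page, SUM(clicks) AS total " +
                    "FROM clicks_cube " +
                    "WHERE `__time` >= CURRENT_TIMESTAMP - INTERVAL '1' HOUR " +
                    "GROUP BY page ORDER BY total DESC LIMIT 10");
                while (rs.next()) {
                    System.out.println(rs.getString("page") + ": " + rs.getLong("total"));
                }
            }
        }
    }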

What's Next?

This blog was intended to give you a sneak peek at the three powerful new Kafka analytics access patterns that will soon be available in HDP 3.1 and HDF 3.3. It is the first installment in a Kafka analytics blog series; subsequent posts will walk through each of these access patterns in more detail. Stay tuned!


Published at DZone with permission of George Vetticaden, DZone MVB.

Opinions expressed by DZone contributors are their own.
