
A Streaming SQL Engine for Apache Kafka

Open-source, distributed, scalable, and real-time, KSQL is the newest and easiest way to express continuous, interactive queries in Kafka.

It was great speaking with Neha Narkhede, co-founder and CTO at Confluent, about the availability of KSQL, an open-source streaming SQL engine that enables continuous, interactive queries on Apache Kafka.

KSQL addresses a business need: responding to the continuous streams of data that are the source of truth for everything happening within a company. With KSQL, any developer who knows SQL can work with real-time data. Unlike other stream processing engines that require complex infrastructure or mastery of various programming languages, KSQL gives users a familiar syntax in a solution that's easy to build and manage, while benefiting from Kafka’s distributed, scalable, and reliable track record in development and production.

Common examples of stream processing include comparing two or more streams of data to detect anomalies and respond to them in real time, or transforming data as it's ingested to better suit downstream consumers. Stream processing can be used by financial services companies to identify fraud, to monitor out-of-bounds system performance metrics, and much more. Common use cases include:

  • Anomaly detection. Pattern recognition and anomaly detection are real-time, event-driven processes ideally suited to running against streams of data. They apply across many industries and use cases. For example, a financial institution can alert its customers to out-of-the-ordinary transactions while potentially detecting a security attack.
  • Monitoring. Businesses can move from batch systems to real-time notifications and catch issues as they occur, before they lead to system-wide failures. With KSQL, teams can use SQL-like queries, written in a widely known language, to investigate problems while errors are happening rather than waiting until the next day.
  • Streaming ETL. Companies often have numerous disparate applications whose data needs to be integrated, standardized, and enriched before it can be used in downstream applications or for analysis and reporting. The traditional, batch-oriented solution is called ETL (extract, transform, load). With the advent of large-scale streams, this process is moving to real time. However, those usually responsible for this function rarely come from a programming background and prefer working with SQL. KSQL enables this group to move from batch-processing ETL to real-time ETL.
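
To make the anomaly detection case concrete, here is a minimal sketch in plain Python of the kind of windowed count a streaming SQL query might express: group events into fixed, non-overlapping time windows and flag any card with too many attempts in one window. This is not KSQL and not Confluent's implementation. The function name, the 5-second window, the threshold of 3, and the sample events are all hypothetical, and the sketch processes a finished list, whereas a streaming engine would update results continuously as events arrive.

```python
from collections import Counter

WINDOW_SECONDS = 5  # hypothetical window size
MAX_ATTEMPTS = 3    # hypothetical alert threshold


def flag_suspicious(events, window_seconds=WINDOW_SECONDS, max_attempts=MAX_ATTEMPTS):
    """Return sorted (window_start, card) pairs with more than max_attempts events.

    events: iterable of (timestamp_seconds, card_number) tuples.
    """
    counts = Counter()
    for timestamp, card in events:
        # Align each event to the start of its fixed, non-overlapping window.
        window_start = int(timestamp // window_seconds) * window_seconds
        counts[(window_start, card)] += 1
    return sorted(key for key, n in counts.items() if n > max_attempts)


# Made-up sample data: (timestamp in seconds, card identifier).
events = [
    (0.5, "card-A"), (1.0, "card-A"), (2.2, "card-A"), (4.9, "card-A"),
    (1.5, "card-B"), (6.0, "card-B"),
    (5.1, "card-A"),
]

print(flag_suspicious(events))  # [(0, 'card-A')]
```

Only card-A is flagged, because its four attempts all fall inside the window starting at second 0. Its fifth attempt lands in the next window and is counted separately.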

“Until now, stream processing has required complex infrastructure, sophisticated developers, and a serious investment. Our mission is to make stream processing easily accessible so anyone can derive insights from their streams of data,” says Neha. “With KSQL, stream processing on Apache Kafka is available through a familiar SQL-like interface, rather than only to developers who are familiar with Java or Python. It is the first completely interactive, distributed streaming SQL engine for Apache Kafka.”

Topics:
big data, ksql, sql engine, streaming data, apache kafka, stream processing, queries
