
Introducing Cloudera SQL Stream Builder (SSB)

SSB is an improved release of Eventador's SQL Stream Builder with integration into Cloudera Manager, Cloudera Flink, and other streaming tools.

By Tim Spann · DZone Core · Updated Jun. 06, 21 · Tutorial

Cloudera SQL Stream Builder (SSB)

Cloudera SQL Stream Builder Screenshot

The initial release of Cloudera SQL Stream Builder, part of the CSA 1.3.0 release of Apache Flink and friends from Cloudera, delivers an environment that is well integrated into Cloudera's Data Platform. SSB is an improved release of Eventador's SQL Stream Builder, with integration into Cloudera Manager, Cloudera Flink, and other streaming tools.

In this initial release, you get a complete web UI to build, develop, test, and deploy enterprise Continuous SQL queries to Apache Flink clusters running on YARN. The initial sources are any number of Kafka clusters, and the first set of outputs are Kafka topics and webhooks. You can also build Materialized Views that act as constantly updated, fast sources of data for REST clients to consume, which makes for a great interface to Kafka data for non-Kafka consumers.

All you need to do is type SQL against tables, and the results can land in Materialized Views, Kafka topics, or webhooks. Thanks to Apache Calcite, you can write rich SQL, including joins, ORDER BY, aggregates, and more.
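
For example, an unbounded stream can be ordered as long as the leading sort key is the table's event-time attribute. Here is a minimal sketch, assuming eventTimestamp is registered as the event-time column of the stocksraw virtual table (as it is in the windowed query later in this post):

SQL

-- order the unbounded stream by its event-time attribute (required leading sort key)
SELECT symbol, `close`, eventTimestamp
FROM stocksraw
ORDER BY eventTimestamp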

CSA 1.3.0 is now available with Apache Flink 1.12 and SQL Stream Builder! Check out this white paper for some details. You can get full details on the Stream Processing and Analytics available from Cloudera here.

Stream Processing Diagram

Database Flow Diagram

This is an awesome way to query Kafka topics with continuous SQL that is deployed to scalable Flink nodes on YARN or Kubernetes. We can also easily define functions in JavaScript to enhance, enrich, and augment our data streams. There is no Java to write and no heavy deploys or build scripts; you can build, test, and deploy these advanced streaming applications entirely from a secure browser interface.
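
As a quick sketch of how that looks from the SQL side: once a JavaScript function has been registered in the console, it can be called inline like any built-in function. The function name below, F_TO_C, is just a hypothetical example of a registered UDF that converts Fahrenheit to Celsius:

SQL

-- F_TO_C is a hypothetical user-defined JavaScript function registered in SSB
SELECT location, temp_f, F_TO_C(temp_f) as temp_c
FROM weather2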

Example Queries:


SQL

SELECT location, max(temp_f) as max_temp_f, avg(temp_f) as avg_temp_f,
       min(temp_f) as min_temp_f
FROM weather2
GROUP BY location



SQL

SELECT HOP_END(eventTimestamp, INTERVAL '1' SECOND, INTERVAL '30' SECOND) as windowEnd,
       count(`close`) as closeCount,
       sum(cast(`close` as float)) as closeSum,
       avg(cast(`close` as float)) as closeAverage,
       min(`close`) as closeMin,
       max(`close`) as closeMax,
       sum(case when `close` > 14 then 1 else 0 end) as stockGreaterThan14
FROM stocksraw
WHERE symbol = 'CLDR'
GROUP BY HOP(eventTimestamp, INTERVAL '1' SECOND, INTERVAL '30' SECOND)


                                                         

SQL

SELECT scada2.uuid, scada2.systemtime, scada2.temperaturef, scada2.pressure,
       scada2.humidity, scada2.lux, scada2.proximity,
       scada2.oxidising, scada2.reducing, scada2.nh3,
       scada2.gasko, energy2.`current`,
       energy2.voltage, energy2.`power`, energy2.`total`, energy2.fanstatus
FROM energy2 JOIN scada2 ON energy2.systemtime = scada2.systemtime


                                                 

SQL

SELECT symbol, uuid, ts, dt, `open`, `close`, high, volume, `low`, `datetime`,
       'new-high' message, 'nh' alertcode,
       CAST(CURRENT_TIMESTAMP AS BIGINT) alerttime
FROM stocksraw st
WHERE symbol is not null
AND symbol <> 'null'
AND trim(symbol) <> ''
AND CAST(`close` as DOUBLE) >
    (SELECT MAX(CAST(`close` as DOUBLE))
     FROM stocksraw s
     WHERE s.symbol = st.symbol)



SQL

SELECT *
FROM statusevents
WHERE lower(description) like '%fail%'



SQL

SELECT sensor_id as device_id,
       HOP_END(sensor_ts, INTERVAL '1' SECOND, INTERVAL '30' SECOND) as windowEnd,
       count(*) as sensorCount,
       sum(sensor_6) as sensorSum,
       avg(cast(sensor_6 as float)) as sensorAverage,
       min(sensor_6) as sensorMin,
       max(sensor_6) as sensorMax,
       sum(case when sensor_6 > 70 then 1 else 0 end) as sensorGreaterThan70
FROM iot_enriched_source
GROUP BY
       sensor_id,
       HOP(sensor_ts, INTERVAL '1' SECOND, INTERVAL '30' SECOND)



SQL

SELECT title, description, pubDate, `point`, `uuid`, `ts`, eventTimestamp
FROM transcomevents



Source Code:

  • https://github.com/tspannhw/CloudDemo2021
  • https://github.com/tspannhw/StreamingSQLDemos
  • https://github.com/tspannhw/SmartTransit

Example SQL Stream Builder Run

We log in, then build our Kafka data source(s), unless they are already predefined.

Next, we build a few virtual table sources for the Kafka topics we are going to read from. If the data is JSON, we can let SSB determine the schema for us, or we can connect to the Cloudera Schema Registry to resolve the schema for Avro data.

We can then define virtual table sinks to Kafka or webhooks.

We then run a SQL query with a few easy-to-set parameters, and if we like the results, we can create a materialized view.
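
As a sketch of the kind of query I would back a materialized view with: the aggregation key (symbol here) is what you would pick as the view's primary key, so each symbol maps to one constantly updated row served over REST:

SQL

-- one continuously updated row per symbol, suitable for a materialized view keyed on symbol
SELECT symbol,
       max(cast(`close` as float)) as maxClose,
       min(cast(`close` as float)) as minClose,
       count(`close`) as tradeCount
FROM stocksraw
GROUP BY symbol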
(Screenshots: Streaming SQL Console walkthrough covering console history and logs, Kafka data source providers, virtual table sources and sinks including webhook sinks, composing and running SQL jobs, sampling results in the Data Explorer, creating a user-defined function, materialized view configuration and API keys, and job metrics and health tests.)

References:

  • https://docs.cloudera.com/csa/1.3.0/ssb-using-ssb/topics/csa-ssb-using-virtual-tables.html
  • https://docs.cloudera.com/csa/1.3.0/ssb-overview/topics/csa-ssb-intro.html
  • https://docs.cloudera.com/csa/1.3.0/ssb-overview/topics/csa-ssb-key-features.html
  • https://docs.cloudera.com/csa/1.3.0/ssb-overview/topics/csa-ssb-architecture.html
  • https://docs.cloudera.com/csa/1.3.0/ssb-quickstart/topics/csa-ssb-quickstart.html
  • https://docs.cloudera.com/csa/1.3.0/ssb-using-ssb/topics/csa-ssb-adding-kafka-data-source.html
  • https://docs.cloudera.com/csa/1.3.0/ssb-using-ssb/topics/csa-ssb-creating-virtual-kafka-source.html
  • https://docs.cloudera.com/csa/1.3.0/ssb-using-ssb/topics/csa-ssb-creating-virtual-kafka-sink.html
  • https://docs.cloudera.com/csa/1.3.0/ssb-using-ssb/topics/csa-ssb-creating-virtual-webhook-sink.html
  • https://docs.cloudera.com/csa/1.3.0/ssb-using-ssb/topics/csa-ssb-managing-time.html
  • https://docs.cloudera.com/csa/1.3.0/ssb-job-lifecycle/topics/csa-ssb-running-job-process.html
  • https://docs.cloudera.com/csa/1.3.0/ssb-job-lifecycle/topics/csa-ssb-job-management.html
  • https://docs.cloudera.com/csa/1.3.0/ssb-job-lifecycle/topics/csa-ssb-sampling-data.html
  • https://docs.cloudera.com/csa/1.3.0/ssb-job-lifecycle/topics/csa-ssb-advanced-job-management.html
  • https://docs.cloudera.com/csa/1.3.0/ssb-using-mv/topics/csa-ssb-using-mvs.html
  • https://www.cloudera.com/content/www/en-us/about/events/webinars/cloudera-sqlstream-builder.html
  • https://www.cloudera.com/about/events/webinars/demo-jam-live-expands-nifi-kafka-flink.html
  • https://www.cloudera.com/about/events/virtual-events/cloudera-emerging-technology-day.html

