Over a million developers have joined DZone.

TIBCO BusinessWorks and StreamBase for Big Data Integration and Streaming Analytics with Apache Hadoop and Impala

· Big Data Zone

Learn how you can maximize big data in the cloud with Apache Hadoop. Download this eBook now. Brought to you in partnership with Hortonworks.

Apache Hadoop is getting more and more relevant. Not just for Big Data processing (e.g. MapReduce), but also for Fast Data processing (e.g. Stream Processing). Recently, I published two blog posts on the TIBCO blog to show how you can leverage TIBCO BusinessWorks 6 and TIBCO StreamBase to realize Big Data and Fast Data Hadoop use cases.

TIBCO ActiveMatrix BusinessWorks 6 + Apache Hadoop = Big Data Integration

Apache Hadoop was built for processing complex computations on Big Data stores (that is, terabytes to petabytes) with a MapReduce distributed computation model that runs easily on cheap commodity hardware.

A Hadoop distribution from vendors such as Hortonworks, Cloudera or MapR packages different projects of the Hadoop ecosystem. This assures that all used versions work together smoothly. On top of the packaging, Hadoop vendors offer tooling for deployment, administration and monitoring of Hadoop clusters. Commercial support completes their offerings.

The key challenge is to integrate the input and results of Hadoop processing into the rest of the enterprise. Using just a Hadoop distribution requires a lot of complex coding for integration services.

Continue here for the full article: TIBCO ActiveMatrix BusinessWorks 6 + Apache Hadoop = Big Data Integration

TIBCO StreamBase + Hadoop + Impala = Fast Data Streaming Analytics

As of today, Hadoop is evolving quickly. It is not only used for batch processing anymore. YARN, Storm, Spark, and several other solutions introduce modern paradigms to Hadoop. However, some problems still remain with Hadoop:

  • No good, easy development tooling for the Hadoop ecosystem components such as Hive, Storm, Spark, etc.
  • Missing maturity (a lot of alpha/beta/0.x versions) especially in management and monitoring tools, as well as security, connectivity, and APIs
  • No “real time” (== seconds, milliseconds, microseconds), but “near real time” (still several seconds and more, much more when recovering from infrastructure faults)
  • No operational analytics (human monitoring and proactive actions)

So why not combine the great benefits of Hadoop with the Fast Data streaming analytics tool TIBCO StreamBase with its mature, mission-critical deployments in several different industries, great graphical tooling, and operational real-time analytics (via TIBCO Live Datamart on top of StreamBase)?

This post shows how to realize a Fast Data use case with TIBCO StreamBase and the Hadoop framework’s Impala analytical database quickly and easily.

Continue here for the full article: TIBCO StreamBase + Hadoop + Impala = Fast Data Streaming Analytics

For a general introduction to Stream Processing and Streaming Analytics, I recommend the InfoQ article: Real-Time Stream Processing as Game Changer in a Big Data World with Hadoop and Data Warehouse.

As always, I appreciate any feedback...

[Content from my blog: http://www.kai-waehner.de/blog/2015/04/14/tibco-businessworks-and-streambase-for-big-data-integration-and-streaming-analytics-with-apache-hadoop-and-impala/ ]

Hortonworks DataFlow is an integrated platform that makes data ingestion fast, easy, and secure. Download the white paper now.  Brought to you in partnership with Hortonworks

Topics:
mapreduce ,hadoop ,tibco ,cloudera ,big data ,hortonworks ,mapr ,stream processing ,businessworks ,integration

Opinions expressed by DZone contributors are their own.

The best of DZone straight to your inbox.

SEE AN EXAMPLE
Please provide a valid email address.

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.
Subscribe

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}