
BigData Workflows Made Easy -> Glue

Glue is a job execution engine, written in Java and Groovy. 

Workflows are written in a Groovy DSL, Jython, Clojure, or JRuby. They use pre-developed modules to interact with external resources such as databases, Hadoop, Netezza, and FTP servers.

Glue workflows are not written in XML, and Glue is not a BI tool. It is a tool that lets programmers write production workflows in any of the supported languages.

The nicest thing about Glue is its modules. They let you interact with databases, Hadoop clusters, and other resources through tested methods, and each one can be set up once and reused in every workflow. This keeps configuration out of the workflows themselves and saves a great deal of debugging time.
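To illustrate the "configure once, reuse everywhere" idea, here is a minimal sketch. It is not Glue's actual module API. The names `SqlModule`, `ModuleRegistry`, `daily_report`, `cleanup`, and the `jdbc:example://...` URL are all hypothetical. A registry holds configured modules, and each workflow only names the module it needs, so connection details never appear inside workflow code.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class SqlModule:
    """Hypothetical stand-in for a DB module: it owns the connection settings
    so that workflows never have to repeat them."""
    name: str
    url: str
    user: str
    executed: List[str] = field(default_factory=list)

    def run(self, sql: str) -> str:
        # A real module would open a connection here; this sketch only records the call.
        self.executed.append(sql)
        return f"{self.name}@{self.url}: {sql}"


class ModuleRegistry:
    """Configured once (e.g. at engine start-up), then shared by every workflow."""

    def __init__(self) -> None:
        self._modules: Dict[str, Any] = {}

    def register(self, key: str, module: Any) -> None:
        if key in self._modules:
            raise ValueError(f"module {key!r} already registered")
        self._modules[key] = module

    def get(self, key: str) -> Any:
        try:
            return self._modules[key]
        except KeyError:
            raise KeyError(f"no module registered under {key!r}") from None


# A workflow receives the registry and looks modules up by name only.
Workflow = Callable[[ModuleRegistry], str]


def daily_report(ctx: ModuleRegistry) -> str:
    return ctx.get("warehouse").run("SELECT count(*) FROM events")


def cleanup(ctx: ModuleRegistry) -> str:
    return ctx.get("warehouse").run("DELETE FROM staging")


# Configuration happens in exactly one place (hypothetical URL and user).
registry = ModuleRegistry()
registry.register("warehouse", SqlModule("warehouse", "jdbc:example://db:5432/dw", "etl"))

for wf in (daily_report, cleanup):
    print(wf(registry))
```

If the warehouse moves to a new host, only the `register` call changes. Every workflow that asks for `"warehouse"` picks up the new settings without being edited.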

Another useful feature is the ability to run data-driven workflows from Hadoop. You can register N workflows against an HDFS directory path and have those workflows run automatically as data arrives in that directory.
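The sketch below illustrates the data-driven pattern: several workflows registered against one directory, each run once per new arrival. It rests on several assumptions. A local directory stands in for HDFS. Each new file counts as an arrival event. New data is detected by polling. The article does not say how Glue itself detects new data. `DirectoryTrigger` and the workflow lambdas are hypothetical names, not Glue's API.

```python
import os
import tempfile
from collections import defaultdict
from typing import Callable, Dict, List, Set

# Hypothetical workflow signature: receives the path of the newly arrived file.
Workflow = Callable[[str], None]


class DirectoryTrigger:
    """Maps a directory path to N workflows and runs each of them once for every
    newly arrived file. A local directory stands in for an HDFS path here."""

    def __init__(self) -> None:
        self._workflows: Dict[str, List[Workflow]] = defaultdict(list)
        self._seen: Dict[str, Set[str]] = defaultdict(set)

    def register(self, directory: str, workflow: Workflow) -> None:
        self._workflows[directory].append(workflow)

    def poll(self) -> int:
        """Check every registered directory once; return the number of workflow runs triggered."""
        runs = 0
        for directory, workflows in self._workflows.items():
            for name in sorted(os.listdir(directory)):
                if name in self._seen[directory]:
                    continue  # already handled on an earlier poll
                self._seen[directory].add(name)
                path = os.path.join(directory, name)
                for workflow in workflows:
                    workflow(path)
                    runs += 1
        return runs


log: List[str] = []
landing = tempfile.mkdtemp()  # stand-in for the watched HDFS directory

trigger = DirectoryTrigger()
trigger.register(landing, lambda p: log.append("load:" + os.path.basename(p)))
trigger.register(landing, lambda p: log.append("audit:" + os.path.basename(p)))

print(trigger.poll())  # nothing has arrived yet
open(os.path.join(landing, "part-00000"), "w").close()  # simulate data arriving
print(trigger.poll())  # both registered workflows run for the new file
print(trigger.poll())  # the same file is not processed twice
print(log)
```

A production engine would more likely subscribe to filesystem events or compare modification times than rely on a simple poll loop. Tracking processed names per directory is the smallest design that still ensures each arrival triggers every registered workflow exactly once.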

Opinions expressed by DZone contributors are their own.