
Glue and Big Data: Getting Started, Part 1

Where to start?

Glue is split into three parts:
  • glue-rest - the workflow engine that executes your jobs
  • gluecron - the cron/data-driven daemon that launches workflows based on cron schedules or data in HDFS
  • glue-ui - a simple UI that gives you insight into the running workflows and their output


Initial Requirements

As with all things Hadoop-related, it's best to use Linux. Technically Glue does not require a Linux machine because it runs on the JVM, but even for trying out the examples it's best to create a Linux VM (Ubuntu or CentOS) using VirtualBox or another VM application.

Java 6+

Nothing more is required; Glue is packaged with its own libraries.
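
As a quick way to check the Java requirement before installing, here is a minimal sketch. The helper name java_major is made up for this example and is not part of Glue; it only assumes that java -version prints a quoted version string such as "1.6.0_45" or "11.0.2".

```shell
# Hypothetical helper (not part of Glue): extract the major version
# from a Java version string such as "1.6.0_45" or "11.0.2".
java_major() {
  v=$1
  case "$v" in
    1.*) v=${v#1.} ;;        # legacy scheme: 1.6.0_45 -> 6.0_45
  esac
  echo "${v%%[!0-9]*}"       # keep only the leading digits
}

# Check the local JVM, if there is one on the PATH.
if command -v java >/dev/null 2>&1; then
  # java -version writes to stderr, so redirect it into the pipe.
  found=$(java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  if [ "$(java_major "$found")" -ge 6 ] 2>/dev/null; then
    echo "Java $found is new enough"
  else
    echo "Java $found is too old or could not be parsed"
  fi
else
  echo "java not found on PATH"
fi
```

The legacy "1.x" prefix is stripped first because older JVMs report Java 6 as 1.6, while newer ones report the major version directly.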

Install Glue Rest

Ok, I could have chosen a better name, but the naming sort of stuck ever since Glue was created. This is a simple step: download the rpm from https://sourceforge.net/projects/glueworkflows/files
If you're using Ubuntu, use sudo alien 'rpm' to convert it to a deb.

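To install, type one of the following (rpm on RPM-based systems, dpkg on Ubuntu), where 'rpm' and 'deb' stand for the package file: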

sudo rpm -i 'rpm'
sudo dpkg -i 'deb'
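
The choice between the two install commands can be sketched as a small helper that picks the command from the file extension. This is only an illustration: install_cmd and the glue-example file names are placeholders made up here, not real Glue release names, and the function prints the command rather than running it.

```shell
# Hypothetical helper: print the install command for a downloaded
# package, chosen by file extension. Nothing is installed here.
install_cmd() {
  case "$1" in
    *.rpm) echo "sudo rpm -i $1" ;;
    *.deb) echo "sudo dpkg -i $1" ;;
    *)     echo "unsupported package: $1" >&2; return 1 ;;
  esac
}

# Placeholder file names, for illustration only.
install_cmd glue-example.rpm
install_cmd glue-example.deb
```

Printing the command first makes it easy to review before running it with real file names.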

The package installs to /opt/glue and you can run it using either:

service glue-server start
/etc/init.d/glue-server start

Install Glue Cron

Download the gluecron rpm from https://sourceforge.net/projects/glueworkflows/files
Again, for a deb, use sudo alien 'rpm' to convert it.

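To install, type one of the following (rpm on RPM-based systems, dpkg on Ubuntu), where 'rpm' and 'deb' stand for the package file: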

sudo rpm -i 'rpm'
sudo dpkg -i 'deb'

The package installs to /opt/gluecron and you can run it using either:

service gluecron start
/etc/init.d/gluecron start

Don't worry if gluecron or glue gives you errors on startup at the moment.
We'll need to configure them first.
That is the aim of part 2 (coming soon).

To explore more please go to: http://gerritjvv.github.io/glue/documentation.html

