
Running Concurrent Workflows in Ruby

Up until recently, I had to run literally hundreds of jobs one by one to fetch, normalize, and enhance content and refresh it on the page. Then I discovered Gush and life became much easier...



Have you ever struggled with a chain of rake tasks that needed to be run periodically and their runtime was giving you a headache? Did you ask yourself how to save time by boosting execution along with readability for huge chunks of jobs? These questions are very common when it comes to applications with a hefty amount of background processing jobs.

It is my own experience that inspired me to write this blog post. I had to run literally hundreds of jobs one by one to fetch, normalize, and enhance content and refresh it on the page. This normally lasted around 1-2 weeks and every time cost me the death of a few brain cells.

In addition, the entire process wasn’t properly documented and systematized so, in fact, sometimes I needed to run the same batch of jobs two or more times.

The solution was closer than I expected and was created by my fellow Chaps developers months ago. Namely, why not group tasks into logical structures and run them concurrently? If they don’t depend on each other, I can’t really see any obstacles to parallel execution.

And here Gush arrived. Gush is a graph-based gem devised to facilitate organizing jobs into self-describing workflows. It uses the Sidekiq API to store jobs and the dependencies among them.


To install Gush, simply add it to your Gemfile:

gem 'gush'

and run bundle install.

From now on, your classes will be able to inherit from two fundamental Gush classes: Gush::Job and Gush::Workflow. The former defines a single job (comparable to a rake task in Rails) and the latter groups all jobs into a cohesive workflow.


Firstly, let’s take a closer look at Gush::Job and how to create a simple job. It cannot get any easier than this:

class FetchPosts < Gush::Job
  def work
    Fetcher.fetch :posts
  end
end

It does remind you of something, doesn’t it? Yes, we are defining the #work method, which is the exact counterpart of the #perform method in Sidekiq. When a job is ready to go, the #work method is simply triggered. No rocket science.


Once jobs are defined, we have to build a workflow that would gather all jobs into one logical set. Every workflow must inherit from Gush::Workflow and implement the configure method.

class PostsAndTopicsProcess < Gush::Workflow
  def configure
    run FetchPosts
    run FetchTopics
    run PersistPosts
    run PersistTopics
    run NormalizeEntities
    run StoreEntities
  end
end

[Image: Gush workflow graph — all jobs independent]

Very basic stuff, but there is one obvious problem with this workflow. Since we haven’t declared any dependencies among the jobs, all of them will be triggered at the same time. To set the execution order for jobs, we can use either the before or the after param.

class PostsAndTopicsProcess < Gush::Workflow
  def configure
    run FetchPosts
    run FetchTopics
    run PersistPosts, after: FetchPosts
    run PersistTopics, after: FetchTopics
    run NormalizeEntities, after: [PersistPosts, PersistTopics]
    run StoreEntities, after: NormalizeEntities
  end
end

[Image: Gush workflow graph with dependencies]

Now it looks much better. We set dependencies on jobs, so Gush knows how to trigger consecutive tasks. We are able to define either a before or an after hook; the names are self-explanatory. I decided to stick to after and not mix it with before, because the workflow would become nasty.
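As an illustration of what this dependency graph implies (a plain-Ruby sketch of the scheduling the after: declarations allow, not Gush’s internals), we can compute which jobs are free to run in parallel at each step:

```ruby
# The after: declarations from the workflow above, as a dependency hash
deps = {
  'FetchPosts'        => [],
  'FetchTopics'       => [],
  'PersistPosts'      => ['FetchPosts'],
  'PersistTopics'     => ['FetchTopics'],
  'NormalizeEntities' => ['PersistPosts', 'PersistTopics'],
  'StoreEntities'     => ['NormalizeEntities']
}

waves = []
done  = []
until done.size == deps.size
  # A job is ready once every job it waits for has finished
  ready = deps.keys.select { |job| !done.include?(job) && (deps[job] - done).empty? }
  waves << ready
  done.concat(ready)
end

waves.each_with_index { |jobs, i| puts "step #{i + 1}: #{jobs.join(', ')}" }
# step 1: FetchPosts, FetchTopics
# step 2: PersistPosts, PersistTopics
# step 3: NormalizeEntities
# step 4: StoreEntities
```

Jobs listed in the same step have no dependency on each other, which is exactly why Gush can trigger them concurrently.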

What if our jobs or workflow needed additional arguments? Nothing simpler than passing them like this:

class PostsAndTopicsProcess < Gush::Workflow
  def configure(url)
    run FetchPosts, params: { url: url, in_batches: 200 }
    run FetchTopics, params: { url: url, in_batches: 20 }
    run PersistPosts, after: FetchPosts
    run PersistTopics, after: FetchTopics
    run NormalizeEntities, after: [PersistPosts, PersistTopics]
    run StoreEntities, after: NormalizeEntities
  end
end

This is very handy when our workflow has to run the same job two or more times with different params.
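For instance, a workflow might declare the same job class twice, each time pointed at a different source. To keep this sketch runnable without Redis or Sidekiq, a minimal stand-in replaces Gush::Workflow here; with the real gem you would inherit from Gush::Workflow instead, and the URLs and class names are purely hypothetical:

```ruby
# Minimal stand-in that records each `run` declaration, mimicking the
# workflow DSL shown above (illustration only, not Gush's implementation)
class WorkflowStandIn
  attr_reader :jobs

  def initialize
    @jobs = []
    configure
  end

  def run(klass, params: {}, after: [])
    @jobs << { klass: klass, params: params, after: Array(after) }
  end
end

class FetchPosts; end # placeholder job class

class TwoSourcesProcess < WorkflowStandIn
  def configure
    # Same job class, two declarations with different params
    run FetchPosts, params: { url: 'https://example.com/a', in_batches: 200 }
    run FetchPosts, params: { url: 'https://example.com/b', in_batches: 20 }
  end
end

TwoSourcesProcess.new.jobs.each do |job|
  puts "#{job[:klass]} from #{job[:params][:url]}"
end
# FetchPosts from https://example.com/a
# FetchPosts from https://example.com/b
```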

Summing Up

The first two jobs are independent, so they will be started at the same time. Their outcome is required to run PersistPosts and PersistTopics, so those must wait as they depend on the said outcome. Once posts and topics are persisted, they can finally be normalized and stored. Params passed to a job can be easily accessed via params in the #work method body.
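A quick sketch of reading params inside #work: the hash passed at `run ..., params: {...}` time is exposed to the job through the params method. A minimal Gush::Job stand-in keeps this snippet runnable without Redis (its initializer is an assumption, not the gem’s exact signature); with the real gem you would simply inherit from Gush::Job.

```ruby
# Stand-in so the example runs standalone; with the real gem,
# `require 'gush'` and inherit from Gush::Job instead
module Gush
  class Job
    attr_reader :params

    def initialize(params = {})
      @params = params
    end
  end
end

class FetchPosts < Gush::Job
  def work
    # params holds whatever the workflow declared for this job
    "fetching #{params[:in_batches]} posts from #{params[:url]}"
  end
end

job = FetchPosts.new(url: 'https://example.com', in_batches: 200)
puts job.work
# fetching 200 posts from https://example.com
```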

Gush uses Redis as storage, so each job has its unique id and an array of ids of the jobs it depends on. When the workflow is initialized, all of these ids are persisted, and all jobs are wrapped in Sidekiq. Cool, isn’t it? There is no need to explain in detail how Sidekiq handles the entire tasks’ logic, but over the past years it has proved to be a trustworthy partner in our applications, so I reckon we can trust it again.

Running Workflows

To build and run a new workflow, we first have to fire up the Sidekiq workers:

bundle exec gush workers

And then, create a new object of our workflow, save it, and start!

workflow = PostsAndTopicsProcess.new
workflow.save
workflow.start!

Gush workers opened in a separate window should automatically inform us about newly started jobs:

2016-01-14T11:11:31.462Z 22981 TID-10vdc8 Gush::Worker JID-625e3ee8ed9a279d88efad3e INFO: start
2016-01-14T11:11:32.812Z 22981 TID-10v8sw Gush::Worker JID-c9fb521bbb857afa104fad28 INFO: start

It's worth checking the Gush workers from time to time, just to ensure that the entire process is going smoothly and didn’t fail in the middle of the run.

Job States and Exception Handling

Like in all applications, some of our jobs might fail; that is only natural.

How does Gush handle failures? It's pretty straightforward. When an exception is raised in one of the jobs, it doesn’t mean the entire workflow stops. Only the jobs depending on the failed one halt and wait, while the rest of the graph keeps going.

So, if some jobs have failed, we can deploy fixes and resume the workflow. Gush will retry the failed jobs and proceed once they are successfully completed. To do so, we need to find our workflow and call the continue method.

workflow = Gush::Workflow.find(<workflow_id>)
workflow.continue


If you don’t like using the API directly, there’s a gush-control gem that adds a web GUI you can run on top of Gush.

Bash Commands

To simplify workflow and job management, a handful of bash commands are provided out of the box. The only requirement to run them is to create a Gushfile.rb file that loads the application files. The Gushfile obviously should require all your jobs and workflows. For example:

require_relative './lib/your_project'

Dir[Rails.root.join("app/workflows/**/*.rb")].each do |file|
  require file
end

Great. With Gushfile.rb in place, we can use a couple of very handy commands:

bundle exec gush list
|                  id                  |     name      |      status       |
| 62b2a811-dff6-47ce-a183-db596475c431 | PostsProcess  |       done        |
| 73ea0e32-6350-4d68-814b-fb70e6f5405e | TopicsProcess |      failed       |
bundle exec gush show 73ea0e32-6350-4d68-814b-fb70e6f5405e
|       ID       | 73ea0e32-6350-4d68-814b-fb70e6f5405e |
|      Name      | TopicsProcess                        |
|      Jobs      | 4                                    |
|  Failed jobs   | 1                                    |
| Succeeded jobs | 2                                    |
| Enqueued jobs  | 1                                    |
|  Running jobs  | 0                                    |
| Remaining jobs | 2                                    |
|   Started at   | 1452580414                           |
|     Status     | failed                               |
|                | TopicsNormalize failed               |

Jobs list:
[X] TopicsNormalize
[✓] TopicsFetch
[✓] TopicsPersist
[ ] TopicsStore

We might even visualize our workflow by running: bundle exec gush viz TopicsProcess

This basically builds a graphical version of the workflow shown in the previous section.

Final Comments

And that’s it. Gush does not come as a magical cure-all for job performance itself, but with relatively small overhead you may significantly scale up your background processing by parallelizing it. If you find this library interesting and useful in your environment, but some features are not implemented yet, feel free to drop us an issue or pull request at https://github.com/chaps-io/gush. We are open to improvements.



Published at DZone with permission of Maciej Nowak. See the original article here.

