
How to Schedule Pipeline Execution Using the SDC REST APIs


This tutorial explains how to schedule StreamSets Data Collector (SDC) pipeline execution using the SDC REST APIs.


A hot topic in the sdc-user group over the past few weeks has been how to schedule the start and stop of SDC pipelines. Some threads have suggested using the SDC REST APIs, but my general impression is that many readers don't have a clear idea of how they work, so I decided to write this post to clarify how to do it.

StreamSets Data Collector REST APIs

SDC provides REST APIs, which allow you to do a lot of things. The full list of APIs and detailed info on how to invoke them can be accessed from the SDC dashboard, by clicking first on the help icon and then on the RESTful API link.


The APIs are grouped into six categories:

  •  acl

  •  definitions

  •  manager

  •  preview

  •  store

  •  system

The manager section contains the APIs to start and stop a pipeline.


To start a pipeline through the REST API, send a POST request like the following (here using curl):

curl -u <username>:<password> -X POST https://<sdc_host>:<port>/rest/v1/pipeline/<pipeline_id>/start -H "X-Requested-By:sdc"

Stopping a pipeline works the same way:

curl -u <username>:<password> -X POST https://<sdc_host>:<port>/rest/v1/pipeline/<pipeline_id>/stop -H "X-Requested-By:sdc"

You have to specify the credentials of a user who has permission to start or stop the pipeline. The X-Requested-By header is mandatory for POST requests to these REST APIs.
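The two curl commands differ only in the final path segment, so they can be wrapped in a single small script. What follows is a minimal sketch, not something provided by SDC: the script name pipeline_ctl.sh, the function names, and the environment variables SDC_BASE_URL, SDC_USER and SDC_PASSWORD are all hypothetical, and the default URL assumes a local instance on port 18630.

```shell
#!/bin/sh
# pipeline_ctl.sh (hypothetical name): wraps the start/stop requests shown above.
# Usage: pipeline_ctl.sh start|stop <pipeline_id>

# Assumed configuration; these variable names are illustrative, not defined by SDC.
SDC_BASE_URL="${SDC_BASE_URL:-https://localhost:18630}"

# Build the endpoint URL for an action ("start" or "stop") on a pipeline.
sdc_pipeline_url() {
  printf '%s/rest/v1/pipeline/%s/%s\n' "${SDC_BASE_URL%/}" "$2" "$1"
}

# Send the POST request. --fail makes curl exit non-zero on an HTTP error,
# so a scheduler can detect that the call did not succeed.
sdc_pipeline_action() {
  case "$1" in
    start|stop) ;;
    *) echo "usage: pipeline_ctl.sh start|stop <pipeline_id>" >&2; return 2 ;;
  esac
  curl --fail --silent --show-error \
    -u "${SDC_USER:?set SDC_USER}:${SDC_PASSWORD:?set SDC_PASSWORD}" \
    -X POST \
    -H "X-Requested-By:sdc" \
    "$(sdc_pipeline_url "$1" "$2")"
}

# Only contact the server when an action and a pipeline id are given.
if [ "$#" -ge 2 ]; then
  sdc_pipeline_action "$1" "$2"
fi
```

With the credentials exported in the environment, `./pipeline_ctl.sh start <pipeline_id>` would issue the same request as the first curl command above. Reading the password from the environment rather than hard-coding it in the script is a design choice of this sketch, not a requirement of the API.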

Scheduling

Now that we know the commands to start and stop a pipeline, setting a schedule for them isn't hard. Save the commands above to scripts and then prepare a cron expression for each one, to be used with crontab. This is the general syntax of a crontab entry:

* * * * * command or script to be executed
- - - - -
| | | | |
| | | | ----- Day of week (0 - 7) (Sunday=0 or 7)
| | | ------- Month (1 - 12)
| | --------- Day of month (1 - 31)
| ----------- Hour (0 - 23)
------------- Minute (0 - 59)

Examples (to start a pipeline every day at 00:15 and stop it at 00:30):

15 0 * * * /home/guglielmo/scripts/start_pipeline.sh

30 0 * * * /home/guglielmo/scripts/stop_pipeline.sh
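Cron jobs run unattended, so it can help to capture the output of each run and to add the entries without opening an editor. The sketch below is one possible way to do that, reusing the two entries above: the function name sdc_cron_entries, the --install flag and the log file path are hypothetical choices, while `crontab -l` and `crontab -` are the standard ways to print and replace the current user's crontab.

```shell
#!/bin/sh
# Prints the two schedule lines from the example above, with the output of each
# run appended to a log file. The log path is a hypothetical default.
SCRIPTS_DIR="/home/guglielmo/scripts"
LOG_FILE="${LOG_FILE:-/tmp/sdc_schedule.log}"

sdc_cron_entries() {
  printf '15 0 * * * %s/start_pipeline.sh >> %s 2>&1\n' "$SCRIPTS_DIR" "$LOG_FILE"
  printf '30 0 * * * %s/stop_pipeline.sh >> %s 2>&1\n' "$SCRIPTS_DIR" "$LOG_FILE"
}

# Append the entries to the current user's crontab only when asked explicitly.
if [ "${1:-}" = "--install" ]; then
  { crontab -l 2>/dev/null; sdc_cron_entries; } | crontab -
fi
```

Running the script with --install more than once would append the same entries again, so check the result with `crontab -l` afterwards.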

Of course, you could reuse the same expressions with any other scheduler (such as Rundeck) that you may have in-house.


