How to Schedule Pipeline Execution Using the SDC REST APIs
This tutorial shows how to schedule StreamSets Data Collector (SDC) pipeline execution using the SDC REST APIs.
A hot topic in the sdc-user group over the past few weeks has been how to schedule the start and stop of SDC pipelines. Using the SDC REST APIs has been suggested in some threads, but my general impression is that the audience doesn't have a clear idea of how they work, so I decided to write this post to clarify how to do it.
StreamSets Data Collector REST APIs
SDC provides REST APIs for a wide range of operations. The full list of APIs, with detailed information on how to invoke them, can be accessed from the SDC dashboard by clicking first on the help icon and then on the RESTful API link.
They are grouped into six categories:
acl
definitions
manager
preview
store
system
The manager section contains the APIs to start and stop a pipeline.
To start a pipeline through the REST API, send a POST request as follows (curl is used here):
curl -u <username>:<password> -X POST https://<sdc_host>:<port>/rest/v1/pipeline/<pipeline_id>/start -H "X-Requested-By:sdc"
A pipeline is stopped the same way:
curl -u <username>:<password> -X POST https://<sdc_host>:<port>/rest/v1/pipeline/<pipeline_id>/stop -H "X-Requested-By:sdc"
You have to specify the credentials of a user who has permission to start or stop a pipeline. The X-Requested-By header is mandatory for POST requests to these REST APIs.
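Since both calls differ only in the last path segment, they can be wrapped in a pair of small shell functions. The following is a minimal sketch, not an official SDC client: the function names and the SDC_URL, SDC_USER, and SDC_PASSWORD environment variables are hypothetical, and the endpoint layout is simply the one used in the curl commands above.

```shell
#!/bin/sh
# Sketch only. SDC_URL, SDC_USER and SDC_PASSWORD are hypothetical
# environment variables, e.g. SDC_URL=https://<sdc_host>:<port>

# Print the REST endpoint for an action (start or stop) on a pipeline.
sdc_pipeline_url() {
  # $1 = base URL, $2 = pipeline id, $3 = action
  case "$3" in
    start|stop) ;;
    *) echo "unsupported action: $3" >&2; return 1 ;;
  esac
  # ${1%/} drops a trailing slash from the base URL, if present.
  printf '%s/rest/v1/pipeline/%s/%s\n' "${1%/}" "$2" "$3"
}

# Send the POST request. --fail makes curl exit non-zero on an HTTP
# error, so a scheduler can detect that the call did not succeed.
sdc_pipeline_action() {
  # $1 = pipeline id, $2 = action
  url=$(sdc_pipeline_url "$SDC_URL" "$1" "$2") || return 1
  curl --fail --silent --show-error \
    -u "$SDC_USER:$SDC_PASSWORD" \
    -X POST "$url" \
    -H "X-Requested-By:sdc"
}
```

With these functions sourced and the three variables exported, starting a pipeline reduces to a call such as sdc_pipeline_action <pipeline_id> start, and stopping it to the same call with stop.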
Scheduling
Now that we know the commands that start and stop a pipeline, setting a schedule for them shouldn't be hard. Save the commands above to scripts and then prepare a cron expression for each of them, to be used with crontab. This is the general syntax of a cron expression:
* * * * * command or script to be executed
- - - - -
| | | | |
| | | | ----- Day of week (0 - 7) (Sunday=0 or 7)
| | | ------- Month (1 - 12)
| | --------- Day of month (1 - 31)
| ----------- Hour (0 - 23)
------------- Minute (0 - 59)
Examples (to start a pipeline every day at 00:15 and stop it at 00:30):
15 0 * * * /home/guglielmo/scripts/start_pipeline.sh
30 0 * * * /home/guglielmo/scripts/stop_pipeline.sh
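When the scripts run from cron, nobody is watching the terminal, so it can help to append their output to a log file. The sketch below prints the two entries above with such a redirection added; the function name and the log file path are hypothetical, and the schedule is the one from the examples.

```shell
#!/bin/sh
# Print the two crontab entries, appending stdout and stderr of each
# script to a log file.
sdc_cron_entries() {
  # $1 = directory holding the scripts, $2 = log file
  printf '15 0 * * * %s/start_pipeline.sh >> %s 2>&1\n' "$1" "$2"
  printf '30 0 * * * %s/stop_pipeline.sh >> %s 2>&1\n' "$1" "$2"
}

# To append the entries to the current user's crontab without opening
# an editor (the log path is only an example):
# (crontab -l 2>/dev/null; sdc_cron_entries /home/guglielmo/scripts /tmp/sdc_schedule.log) | crontab -
```

The commented command keeps any entries already present in the crontab, since crontab - replaces the whole table with whatever it reads from standard input.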
Of course, you could reuse the same expressions with any other scheduler (such as Rundeck) you may have in-house.