Building Serverless Data Extraction API Using Kumologica
In this article, see how to build a simple API that extracts data from an external API and prepares it for processing.
Data extraction is a common task in the IT world: retrieving data from a data source and preparing it for further processing or storage. In this article, we build a simple API that demonstrates extracting data from an external API and preparing it for processing. This use case can help data science engineers build similar data extraction capabilities with minimal effort.
We will use the COVID19 API, a free API that provides the current COVID stats for each country. To make the data returned by the COVID19 API easier to process, it will be converted to CSV format, keeping only an extracted set of fields. This API flow will be developed using Kumologica.
The API flow accepts the country name as a parameter. Using the country name, the flow invokes the COVID19 API to get the current stats for that country. The JSON response from the COVID19 API is filtered down to a specific data structure, which is then converted to CSV format. The CSV content is placed in an Amazon S3 bucket for further processing.
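The filter-and-convert step described above can be sketched outside Kumologica in a few lines of Python. This is only an illustration of the idea, not the flow itself: the field names and the sample record are assumptions made up for the example, and the real COVID19 API response may use different names.

```python
import csv
import io

# Hypothetical field names; the real COVID19 API response may differ.
FIELDS = ["Country", "Confirmed", "Deaths", "Recovered", "Active", "Date"]


def extract_fields(records, fields=FIELDS):
    """Keep only the selected fields from each JSON record."""
    return [{name: record.get(name, "") for name in fields} for record in records]


def to_csv(rows, fields=FIELDS):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample standing in for the API response (not real data).
sample_response = [
    {
        "Country": "Australia",
        "Province": "",
        "Lat": "-25.27",
        "Confirmed": 100,
        "Deaths": 1,
        "Recovered": 90,
        "Active": 9,
        "Date": "2020-06-01T00:00:00Z",
    },
]

csv_text = to_csv(extract_fields(sample_response))
print(csv_text)
```

Fields that are not listed in `FIELDS` (here `Province` and `Lat`) are dropped, which mirrors the "only the extracted set of fields" behaviour of the flow.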
- Kumologica Designer installed on your machine. https://kumologica.com/download.html
- An AWS S3 bucket named covidinfostore.
The diagram below shows the different systems that our flow is responsible for orchestrating. Given that most of our dependencies are in AWS, we will use AWS Lambda as the deployment target for our flow.
- Open Kumologica Designer, click the Home button and choose Create New Kumologica Project.
- Enter a name (for example, DataExtractionFlow), select a directory for the project, and switch Source to From Existing Flow …
- Copy and paste the following flow.
- Press the Create button.
You should see the flow shown below on the designer canvas.
Understanding the flow
1. Get /covid/stats/:country is the EventListener node, configured with Amazon API Gateway as the EventSource. The node has the following configuration.
2. SetCountry is the set-property node that sets country as a variable.
3. SetUrl is the set-property node that sets url as a variable. Select JSONata in the drop-down.
4. InvokeCOVID19API is the HTTP request node with the following configuration.
5. ExtractValues is the Datamapper node with the following JSONata expression to extract the values.
6. CSV is the CSV node that transforms the JSON object to CSV format.
7. S3 is the S3 bucket node that puts the CSV object into the covidinfostore bucket.
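For readers who think in code, the SetCountry, SetUrl, and S3 steps above can be approximated as follows. This is a rough Python analogy, not what Kumologica generates: `BASE_URL` is a hypothetical placeholder for the real COVID19 API endpoint, and the `.csv` suffix on the object key is an assumption. The event shape follows the Amazon API Gateway Lambda proxy format, where path parameters arrive under `pathParameters`.

```python
from urllib.parse import quote

# Hypothetical placeholder; substitute the real COVID19 API endpoint.
BASE_URL = "https://covid-api.example.com/country/"
# Bucket name from the prerequisites.
BUCKET = "covidinfostore"


def get_country(event):
    """SetCountry: read the :country path parameter from the API Gateway event."""
    return event["pathParameters"]["country"]


def build_url(country):
    """SetUrl: append the URL-encoded country name to the base URL."""
    return BASE_URL + quote(country, safe="")


def build_put_request(country, csv_text):
    """S3: describe the object to store, keyed by country name.

    The '.csv' suffix is an assumption made for this sketch.
    """
    return {
        "Bucket": BUCKET,
        "Key": f"{country}.csv",
        "Body": csv_text.encode("utf-8"),
    }


# Minimal API Gateway proxy-style event for GET /covid/stats/new zealand.
event = {"pathParameters": {"country": "new zealand"}}

country = get_country(event)
url = build_url(country)
put_request = build_put_request(country, "Country,Confirmed\n")
print(url)
print(put_request["Key"])
```

The dictionary returned by `build_put_request` uses the same parameter names (`Bucket`, `Key`, `Body`) as the S3 `put_object` call in boto3, so it could be passed to that call with keyword unpacking if you wanted to reproduce the upload by hand.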
1. Select the CLOUD tab on the right panel of Kumologica Designer, then select your AWS Profile.
2. Go to the “Trigger” section under the CLOUD tab and select the Amazon API Gateway trigger.
3. Press the Deploy button.
- Invoke the following endpoint using any REST client of your choice.
You should see a CSV file in the S3 bucket, with the name of the country as the file name.
This article has shown how easy it is to develop a serverless data extraction API using Kumologica Designer.
Kumologica is totally free to download and use. If you give it a try, we would love to hear your feedback.