Serverless Data Processing Using Azure Tools
See how to combine a real-time data ingestion component with a Serverless processing layer, using a sample app.
One of the previous blogs covered some of the concepts behind how Azure Event Hubs supports multiple protocols for data exchange. In this blog, we will see that support in action. With the help of a sample app, you will see how to combine a real-time data ingestion component with a Serverless processing layer.
The sample application has the following components:
- Azure Event Hubs with a Kafka endpoint
- A producer application which pushes data to an Event Hubs topic
- A Serverless app built with Azure Functions which consumes from Event Hubs, enriches the data, and finally stores it in Azure Cosmos DB
To follow along and deploy this solution to Azure, you are going to need a Microsoft Azure account. You can grab one for free if you don't have one already!
Let's go through the individual components of the application.
As always, the code is available on GitHub.
The producer is pretty straightforward - it is a Go app which uses the Sarama Kafka client to send (simulated) "orders" to Azure Event Hubs (Kafka topic). It is available in the form of a Docker image for ease of use (details in the next section).
Here is the relevant code snippet:
A lot of the details have been omitted (from the above snippet) - you can go through the full code here. To summarize, an Order is created, converted (marshaled) into JSON (bytes) and sent to the Event Hubs Kafka endpoint.
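As a rough illustration of the create-and-marshal step, here is a minimal Go sketch that uses only the standard library. The Order field names and the encodeOrder helper are assumptions made for this example and may not match the real producer. The Sarama send step is left out so that the sketch runs without external dependencies.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// Order is a hypothetical payload shape; the field names are assumptions,
// not necessarily the ones used by the actual producer.
type Order struct {
	OrderID    string `json:"orderId"`
	CustomerID string `json:"customerId"`
	ProductID  string `json:"productId"`
	Amount     int    `json:"amount"`
}

// encodeOrder marshals an Order into the JSON bytes that would become
// the value of the Kafka message.
func encodeOrder(o Order) ([]byte, error) {
	return json.Marshal(o)
}

func main() {
	order := Order{
		OrderID:    "order-1",
		CustomerID: "customer-1",
		ProductID:  "product-1",
		Amount:     2,
	}

	payload, err := encodeOrder(order)
	if err != nil {
		log.Fatalf("marshal order: %v", err)
	}

	// In the real producer these bytes would be handed to the Kafka client
	// as the message value; that step is omitted in this sketch.
	fmt.Println(string(payload))
}
```

Because the payload is plain JSON, any consumer (including the Java function described next) can deserialize it without knowing that the producer was written in Go.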
The Serverless part is a Java Azure Function. It leverages the following capabilities:
- Azure Event Hubs Trigger
- Azure Cosmos DB Output Binding
The Trigger allows the Azure Functions logic to get invoked whenever an order event is sent to Azure Event Hubs. The Output Binding takes care of all the heavy lifting, such as establishing the database connection, scaling, and concurrency. All that's left for us to build is the business logic, which in this case has been kept pretty simple: on receiving the order data from Azure Event Hubs, the function enriches it with additional info (customer and product name in this case) and persists it in an Azure Cosmos DB container.
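The enrichment step itself is just a data transformation, so it can be illustrated outside of Azure Functions. The actual function is written in Java. The Go sketch below only mirrors the idea, and its type fields, lookup tables, and the nameFor and enrich helpers are all hypothetical.

```go
package main

import "fmt"

// OrderEvent mirrors the raw event read from Event Hubs.
// The field names are assumptions made for this sketch.
type OrderEvent struct {
	OrderID    string
	CustomerID string
	ProductID  string
	Amount     int
}

// Order is the enriched record that would be persisted to Cosmos DB.
type Order struct {
	OrderID      string
	CustomerID   string
	CustomerName string
	ProductID    string
	ProductName  string
	Amount       int
}

// Hypothetical in-memory lookup tables standing in for whatever source
// the real function uses to resolve names.
var (
	customerNames = map[string]string{"customer-1": "Alice"}
	productNames  = map[string]string{"product-1": "Keyboard"}
)

// nameFor returns the display name for id, or "unknown" when no entry exists.
func nameFor(names map[string]string, id string) string {
	if name, ok := names[id]; ok {
		return name
	}
	return "unknown"
}

// enrich copies the event fields and adds the customer and product names.
func enrich(e OrderEvent) Order {
	return Order{
		OrderID:      e.OrderID,
		CustomerID:   e.CustomerID,
		CustomerName: nameFor(customerNames, e.CustomerID),
		ProductID:    e.ProductID,
		ProductName:  nameFor(productNames, e.ProductID),
		Amount:       e.Amount,
	}
}

func main() {
	event := OrderEvent{OrderID: "order-1", CustomerID: "customer-1", ProductID: "product-1", Amount: 2}
	fmt.Printf("%+v\n", enrich(event))
}
```

Keeping the enrichment in a pure function like this makes it easy to test on its own, while the trigger and output binding handle the plumbing on either side.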
You can check the OrderProcessor code on GitHub, but here is the gist:
The storeOrders method is annotated with @FunctionName and it receives data from Event Hubs in the form of an OrderEvent object. Thanks to the @EventHubTrigger annotation, the platform takes care of converting the Event Hub payload to a Java POJO (of the type OrderEvent) and routing it correctly. The connection = "EventHubConnectionString" part specifies that the Event Hubs connection string is available in the function configuration/settings named EventHubConnectionString.
The @CosmosDBOutput annotation is used to persist data in Azure Cosmos DB. It contains the Cosmos DB database and container name, along with the connection string which will be picked up from the CosmosDBConnectionString configuration parameter in the function. The POJO (Order in this case) is persisted to Cosmos DB with a single setValue method call on the OutputBinding object - the platform makes it really easy, but there is a lot going on behind the scenes!
Let's switch gears and learn how to deploy the solution to Azure.
- Ideally, all the components (Event Hubs, Cosmos DB, Storage, and Azure Function) should be in the same region
- It is recommended to create a new resource group for these services so that they are easy to locate and delete
- Microsoft Azure account (as mentioned in the beginning)
- Create a Kafka-enabled Event Hubs namespace
- Create Azure Cosmos DB components: account, database and container (please make sure that the name of the Cosmos DB database is AppStore and the container is named orders, since this is what the Azure Functions logic uses)
- Create an Azure Storage account - this will be used by Azure Functions
This example makes use of the Azure Functions Maven plugin for deployment. First, update the pom.xml to add the required configuration.
In the <appSettings> section, replace the values for the AzureWebJobsStorage, EventHubConnectionString, and CosmosDBConnectionString properties. Use the Azure CLI to easily fetch the required details:
- AzureWebJobsStorage: Get the Azure Storage connection string
- EventHubConnectionString: Get the Event Hubs connection string
- CosmosDBConnectionString: Get the Cosmos DB connection string
In the configuration section, update the following:
- resourceGroup: the resource group to which you want to deploy the function
- region: the Azure region to which you want to deploy the function (get the list of locations)
To deploy, you need two commands:
- mvn clean package - prepare the deployment artifact
- mvn azure-functions:deploy - deploy to Azure
You can confirm using the Azure CLI (az functionapp list --query "[?name=='orders-processor']") or the portal.
Set the EVENTHUBS_BROKER, EVENTHUBS_TOPIC, and EVENTHUBS_CONNECTION_STRING environment variables.
Run the Docker image:
docker run -e EVENTHUBS_BROKER=$EVENTHUBS_BROKER -e EVENTHUBS_TOPIC=$EVENTHUBS_TOPIC -e EVENTHUBS_CONNECTION_STRING=$EVENTHUBS_CONNECTION_STRING
Press ctrl+c to stop producing events.
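To make the role of the three environment variables more concrete, here is a hedged Go sketch of how a producer might read and validate them. The producerConfig type and loadConfig helper are hypothetical and not taken from the actual producer. The sketch relies on how the Event Hubs Kafka endpoint authenticates: SASL PLAIN over TLS, with the literal username $ConnectionString and the connection string as the password, on a broker address of the form <namespace>.servicebus.windows.net:9093.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// producerConfig is a hypothetical holder for the settings a Kafka client
// needs in order to talk to the Event Hubs Kafka endpoint.
type producerConfig struct {
	Broker       string // e.g. <namespace>.servicebus.windows.net:9093
	Topic        string // the Event Hub name, used as the Kafka topic
	SASLUsername string // always the literal "$ConnectionString"
	SASLPassword string // the Event Hubs connection string
}

// loadConfig reads the settings through getenv, so that callers can pass
// os.Getenv in production and a stub function in tests.
func loadConfig(getenv func(string) string) (producerConfig, error) {
	cfg := producerConfig{
		Broker:       getenv("EVENTHUBS_BROKER"),
		Topic:        getenv("EVENTHUBS_TOPIC"),
		SASLUsername: "$ConnectionString",
		SASLPassword: getenv("EVENTHUBS_CONNECTION_STRING"),
	}
	if cfg.Broker == "" || cfg.Topic == "" || cfg.SASLPassword == "" {
		return producerConfig{}, errors.New(
			"EVENTHUBS_BROKER, EVENTHUBS_TOPIC and EVENTHUBS_CONNECTION_STRING must all be set")
	}
	return cfg, nil
}

func main() {
	cfg, err := loadConfig(os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// The connection string is a secret, so it is deliberately not printed.
	fmt.Printf("would produce to topic %q via broker %s\n", cfg.Topic, cfg.Broker)
}
```

Failing fast on a missing variable is a small design choice that tends to save time with containers, since a misconfigured docker run exits immediately instead of retrying a connection that can never succeed.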
You can use the Azure Cosmos DB data explorer (web interface) to check the items in the container. You should see results similar to this:
Assuming you placed all the services in the same resource group, you can delete them using a single command:
export RESOURCE_GROUP_NAME=<enter the name>
az group delete --name $RESOURCE_GROUP_NAME --no-wait
Thanks for reading!
Happy to get your feedback via Twitter or just drop a comment. Stay tuned for more!