Build AWS Serverless Application for Refinitiv’s Research API

Using EDP Research API to provide the buy-side with their entitled sell-side research reports on a real-time basis.

Keeping or moving data sets in the cloud brings various benefits, and serverless computing is one of them. Once the APIs and tools are available in the cloud, there’s no reason to confine the computation to your own app or solution. In this article, we will explore a cloud-ready API from Refinitiv and combine it with AWS services to create a ready-to-use serverless application.

We will be using the EDP Research API, Refinitiv’s aggregate delivery system that provides the buy-side with their entitled sell-side research reports on a real-time basis. This system delivers asynchronous updates (alerts) via Amazon’s Simple Queue Service (SQS). We will create a serverless application that receives and processes messages from the queue without provisioning or managing servers, using the set of services Amazon provides for building serverless applications.

About the Research API

Refinitiv’s Research API is built on the Elektron Data Platform (EDP), which allows you to easily discover, integrate, analyze, enrich, and consume the content you need through a single, consistent interface.

The EDP API enables streamlined access to real-time research, as well as customized historical research extracts.

Key Features

  • One API access point to all entitled contributor content for programmatic access.
  • EDP API supports Linkback research distribution and will provide URLs to research instead of documents for Linkback enabled contributors.
  • A set of pre-approved contributors is available for API access requests, with more contributors being added over time.
  • Standardized schema and meta-data formats across >1000 Contributors.
  • Enriched contributor-provided meta-data with Intelligent tagging NLP (Natural Language Processing) meta-data.
  • Research Alerts — for new incoming and updated contributor content.
  • Request-response is available for programmatic polls and downloads of available entitled contributor content.
  • You and the contributor control research entitlements.

EDP Research API Overview

The EDP Research API uses an Alerts mechanism to deliver updates. An application first needs to log in to the Elektron Data Platform and get an access token, which is used in every request to the Research API. The application can then use the API to subscribe to Research documents. After that, new updates (alerts) are put in an AWS SQS queue. It is the application’s responsibility to keep polling the queue for new messages.

Amazon Web Services Overview

The application in this article utilizes various Amazon Web Services. Below are some descriptions and resources.

AWS Lambda Function

AWS Lambda lets you run code without provisioning or managing servers. AWS Lambda supports multiple languages through the use of runtimes. In this article, we use the Python runtime to execute our application’s code. You can also use AWS Lambda to run your code in response to events, such as changes to data in an Amazon S3 bucket or an Amazon DynamoDB table.

According to Using Lambda with Amazon SQS, Amazon SQS can also be an event source for AWS Lambda, invoking a Lambda function with an event that contains queue messages. However, the SQS queue created by the Research API currently doesn’t support this functionality. In this article, we will implement a Lambda function to poll the SQS queue manually.
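
As a rough illustration of manual polling, the sketch below receives one batch of messages and deletes each one after reading it. It is not the article's actual Lambda code: `poll_queue` is a hypothetical helper, the default batch size and wait time are assumptions, and deleting messages after reading is an assumed design choice. The client is passed in so the function can be exercised without AWS access; `receive_message` and `delete_message` are the standard boto3 SQS client calls.

```python
def poll_queue(sqs_client, queue_url, max_messages=10, wait_seconds=5):
    """Receive one batch of messages and delete each one after reading it."""
    response = sqs_client.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=max_messages,  # SQS accepts 1 to 10 per call
        WaitTimeSeconds=wait_seconds,      # long polling; keep it below the Lambda timeout
    )
    bodies = []
    # The 'Messages' key is absent when nothing was received
    for message in response.get('Messages', []):
        bodies.append(message['Body'])
        # Assumed design choice: delete after reading so it is not delivered again
        sqs_client.delete_message(
            QueueUrl=queue_url,
            ReceiptHandle=message['ReceiptHandle'],
        )
    return bodies
```

Inside a Lambda function, the client would typically be created with `boto3.client('sqs', region_name='us-east-1', aws_access_key_id=..., aws_secret_access_key=..., aws_session_token=...)`, using the cloud credentials obtained from EDP.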

AWS Systems Manager Parameter Store (SSM Parameter Store)

AWS Systems Manager Parameter Store provides secure, hierarchical storage for configuration data management and secrets management. You can store data such as passwords, database strings, and license codes as parameter values. The value can be stored as plain text or encrypted data.

In this article, we use the SSM Parameter Store to store username, password, UUID and access token used by the application. SSM Parameter Store also stores the Last Modified Date of the parameter, so we can use this timestamp information to verify whether the Access Token of EDP is expired or not.
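
The expiry check described above can be sketched as follows. This is an illustration under assumptions, not the application's code: the 300-second lifetime and 30-second safety margin are placeholder values to replace with the lifetime reported for your token, and both function names are hypothetical. `get_parameter` is the standard boto3 SSM call, and its response carries a timezone-aware `LastModifiedDate`.

```python
from datetime import datetime, timedelta, timezone

# Assumed token lifetime in seconds; replace with the lifetime reported for your token
TOKEN_LIFETIME_SECONDS = 300


def is_token_expired(last_modified, now=None, lifetime=TOKEN_LIFETIME_SECONDS, margin=30):
    """Return True when the token stored at last_modified should be refreshed."""
    now = now or datetime.now(timezone.utc)
    # Refresh slightly early (margin) so a request never carries a stale token
    return now >= last_modified + timedelta(seconds=lifetime - margin)


def stored_token_expired(ssm_client, name='EDPAccessToken'):
    """Check the parameter's Last Modified Date in SSM Parameter Store."""
    parameter = ssm_client.get_parameter(Name=name)['Parameter']
    return is_token_expired(parameter['LastModifiedDate'])
```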

Amazon Simple Storage Service (Amazon S3)

Amazon S3 is an object storage service. It has the concept of “bucket” which is a container for objects stored in Amazon S3. Every object is contained in a bucket.

AWS Step Functions

AWS Step Functions is a service that lets users coordinate multiple AWS services into a serverless workflow. Workflows are made up of a series of steps, with the output of one step acting as input into the next. It translates an application’s workflow into a state diagram that is easy to understand and monitor.

In this article, we use AWS Step Functions to chain the Lambda functions together following the Research API workflow: get an access token, subscribe to Research, and poll SQS. Below is the state diagram generated by Step Functions for the application. At run time, it displays the step currently executing and logs the status and result of each step.

Application’s Workflow

The application is implemented to repeatedly poll SQS for new updates. As Lambda is intended for small, simple functions, I split the implementation into multiple Lambda functions and integrate them with AWS Step Functions.

1. Step Functions invokes a Lambda function, “getEDPToken”, to get the EDP username and password from AWS Systems Manager Parameter Store (SSM Parameter Store), and then uses that information to get an EDP access token from the Elektron Data API service. The access token is stored back in the Parameter Store. Below is a code snippet from the “getEDPToken” Lambda function that gets the parameters’ values from the SSM Parameter Store.

```python
import boto3

# Get parameters' values from SSM Parameter Store
client = boto3.client('ssm')
response = client.get_parameters(
    Names=['EDPUsername', 'EDPPassword', 'EDPClientId'],
    WithDecryption=False  # set to True if the values are stored as SecureString
)
# Index the returned parameters by name for direct lookup
params = {p['Name']: p['Value'] for p in response['Parameters']}
username = params['EDPUsername']
password = params['EDPPassword']
clientId = params['EDPClientId']
```


2. A Lambda function, “subscribeResearch”, is invoked to subscribe to Research Alerts and then passes the encryption key and endpoint to the next function. The function first removes any remaining subscription to prevent duplicate-subscription errors.

3. With the endpoint, a Lambda function, “getCloudCredential”, requests Cloud Credentials from the EDP service.

4. The application repeatedly polls the SQS queue to see whether there is a new Alert message, and then gets the document ID of the Research document.

5. A Lambda function, “getAlertMessage”, verifies whether messages containing document IDs are available in the queue. If so, the function passes the list of document IDs to the next function to download the documents. If not, a Wait X Seconds state is invoked to wait for the next interval.

6. A Lambda function, “refreshToken”, is invoked before the Download Documents state to refresh the token if the stored access token has expired. The “downloadDocuments” function is then invoked for each document ID to download data from EDP and store the file in Amazon S3.

7. The Research API supports two types of results: text or PDF. The application in this article is implemented to get Research documents in text format. You can change the type to PDF in the “downloadDocuments” Lambda function. Below is the sample code for the PDF format.

```python
import boto3
import requests

def downloadDocument(id, docUrl, outputBucket):
    # Stream the PDF from the document URL straight into the S3 bucket
    s3 = boto3.client('s3')
    response = requests.get(docUrl, stream=True)
    print(response.raw)
    s3.upload_fileobj(response.raw, outputBucket, id + ".pdf")

def getDocumentUrl(token, docID, uId):
    # Request the PDF rendition instead of text
    document_type = "/pdf"
    p = {'uuid': uId}
    # document_URL is defined elsewhere in the Lambda function (not shown here)
    RESOURCE_ENDPOINT = document_URL + docID + document_type
    # (snippet ends here; the remainder of the function is not shown)
```
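
Steps 4 and 5 hand a status and a list of document IDs to the rest of the workflow. The sketch below shows one way such a result could be shaped. The status strings (“Doc Available”, “Queue Failed”, “ExpiredToken”, “None”) and the `docIds` field are taken from the state machine definition later in this article; the helper `build_queue_status` itself is hypothetical, since the real “getAlertMessage” code is not shown here.

```python
def build_queue_status(doc_ids=None, error=None):
    """Build the result that the Check Status state inspects (hypothetical helper)."""
    if error == 'ExpiredToken':
        # Cloud credentials expired: the workflow requests new ones
        return {'status': 'ExpiredToken', 'docIds': []}
    if error is not None:
        # Any other SQS error stops the workflow in the Queue Failed state
        return {'status': 'Queue Failed', 'docIds': []}
    if doc_ids:
        # New alerts arrived: the Map state downloads one document per ID
        return {'status': 'Doc Available', 'docIds': list(doc_ids)}
    # Nothing new: the workflow waits and polls again
    return {'status': 'None', 'docIds': []}
```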



Below is the connectivity diagram describing how the application integrates with other Amazon Web Services and Elektron Data API.

Connectivity diagram

Environment Setup

1. To use Lambda and other AWS services, you need an AWS account and IAM User first. Below is the information regarding the setup from the Get Started with Lambda page. Please follow the instructions if you do not have an AWS account.

To use Lambda and other AWS services, you need an AWS account. If you don’t have an account, visit aws.amazon.com and choose Create an AWS Account. For detailed instructions, see Create and Activate an AWS Account.

As a best practice, you should also create an AWS Identity and Access Management (IAM) user with administrator permissions and use that for all work that does not require root credentials. Create a password for console access, and access keys to use command-line tools. See Creating Your First IAM Admin User and Group in the IAM User Guide for instructions.

2. Download and install AWS Command Line Interface (CLI) which is used to deploy Lambda Functions in this article.

3. Set up your AWS credentials in the AWS CLI. First, you need to get your access key ID and secret access key. You can follow the instructions in this guide to get your credential information. Then run the configure command to set the region, access key ID, and secret access key.

Concerning the default region, all resources are created in “us-east-1” because this is the region the Research API uses to create the SQS queue. To avoid cross-region data transfer costs, we use this region for all Amazon Web Services.

```text
> aws configure
AWS Access Key ID [None]: accesskey
AWS Secret Access Key [None]: secretkey
Default region name [None]: us-east-1
Default output format [None]:
```

4. The application gets EDP Username, Password, Access Token, and Refresh Token from SSM Parameter Store, so you need to create the following parameters on the AWS Console.

  • EDPUsername
  • EDPPassword
  • EDPClientId
  • UUID
  • EDPAccessToken
  • EDPRefreshToken
  • BucketStorage

Once you have created all the parameters, the console looks like the screen below.

Console Screen

5. The application downloads Research document information and then stores it as a file object in an AWS S3 bucket. You need to create a bucket on the AWS Console.
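
If you prefer scripting steps 4 and 5 over using the Console, the following sketch shows a possible Python equivalent. It is an assumption-laden illustration: the helper names and the 'placeholder' default are hypothetical, and parameters are created as plain `String` values to match the snippet above that reads them without decryption. `put_parameter` and `create_bucket` are standard boto3 calls; clients are passed in so the code can be tried without AWS access.

```python
PARAMETER_NAMES = [
    'EDPUsername', 'EDPPassword', 'EDPClientId', 'UUID',
    'EDPAccessToken', 'EDPRefreshToken', 'BucketStorage',
]


def create_parameters(ssm_client, values):
    """Create (or overwrite) every parameter the application expects."""
    for name in PARAMETER_NAMES:
        ssm_client.put_parameter(
            Name=name,
            # SSM rejects empty values, so unset parameters get a placeholder
            Value=values.get(name, 'placeholder'),
            Type='String',
            Overwrite=True,
        )


def create_bucket(s3_client, bucket_name):
    """Create the output bucket; us-east-1 needs no CreateBucketConfiguration."""
    s3_client.create_bucket(Bucket=bucket_name)
```

With real clients, this would be called as `create_parameters(boto3.client('ssm'), {...})` and `create_bucket(boto3.client('s3'), 'your-bucket-name')`, where the bucket name is your own choice.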

Create Lambda Functions

Next, we will create Lambda functions from deployment packages using the AWS Command Line Interface (CLI). The deployment packages and installation scripts can be downloaded from GitHub. After extracting the file, you will see the following structure.

Deployment Package files

The “install.ps1” file is a PowerShell script that can be used for installation. You can run the script to set up all Lambda functions. Otherwise, follow the instructions below.

First, open a command line and change the current directory to the extracted files’ location.

1. IAM Role for AWS service access

An IAM Role can be created on a user account to define permission policies. As you already know, the application accesses various services, so we need to create an IAM Role that contains permission policies for those services.

  • Create an IAM Role

This step returns the ARN of the created Role. You will need the Role’s ARN to create the Lambda functions in the next step.

Below is a sample of the ARN in the returned message.

Sample ARN
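
For readers who want to script the role creation, here is a hedged Python sketch of what this step might look like with boto3. It is not the article's original command: the trust policy is the standard one that lets the Lambda service assume a role, the helper names are hypothetical, and the role name `lambda-sqs-ssm` is taken from the attach-role-policy commands in this section.

```python
import json


def lambda_trust_policy():
    """Trust policy that lets the AWS Lambda service assume the role."""
    return {
        'Version': '2012-10-17',
        'Statement': [{
            'Effect': 'Allow',
            'Principal': {'Service': 'lambda.amazonaws.com'},
            'Action': 'sts:AssumeRole',
        }],
    }


def create_lambda_role(iam_client, role_name='lambda-sqs-ssm'):
    """Create the role and return its ARN for use when creating the functions."""
    response = iam_client.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(lambda_trust_policy()),
    )
    return response['Role']['Arn']
```

With a real client (`boto3.client('iam')`), the returned ARN is the value to substitute for `$arn_info` when creating the Lambda functions.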

  • Attach Role Policy

```shell
aws iam attach-role-policy --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess --role-name lambda-sqs-ssm
aws iam attach-role-policy --policy-arn arn:aws:iam::aws:policy/AmazonSSMFullAccess --role-name lambda-sqs-ssm
aws iam attach-role-policy --policy-arn arn:aws:iam::aws:policy/AWSLambdaExecute --role-name lambda-sqs-ssm
```


2. Create Lambda Functions
You need to replace the $arn_info with the ARN of the IAM Role created in the previous step.

```shell
aws lambda create-function --function-name getEDPToken --runtime python3.7 --role $arn_info --handler lambda_function.lambda_handler --timeout 20 --zip-file fileb://getEDPToken.zip --region us-east-1
aws lambda create-function --function-name subscribeResearch --runtime python3.7 --role $arn_info --handler lambda_function.lambda_handler --timeout 20 --zip-file fileb://subscribeResearch.zip --region us-east-1
aws lambda create-function --function-name getCloudCredential --runtime python3.7 --role $arn_info --handler lambda_function.lambda_handler --timeout 20 --zip-file fileb://getCloudCredential.zip --region us-east-1
aws lambda create-function --function-name getAlertMessage --runtime python3.7 --role $arn_info --handler lambda_function.lambda_handler --timeout 10 --zip-file fileb://getAlertMessage.zip --region us-east-1
aws lambda create-function --function-name refreshToken --runtime python3.7 --role $arn_info --handler lambda_function.lambda_handler --timeout 5 --zip-file fileb://refreshToken.zip --region us-east-1
aws lambda create-function --function-name downloadDocuments --runtime python3.7 --role $arn_info --handler lambda_function.lambda_handler --timeout 10 --zip-file fileb://downloadDocuments.zip --region us-east-1
```


After this step, you will see the list of Lambda functions in the AWS Console GUI.

List of Lambda functions

Custom Python Library in Lambda Function

A Lambda function is generally executed in a dedicated environment. If your function depends on libraries other than the SDK for Python (Boto 3), you need to create a deployment package that includes those libraries. In this article, the Lambda functions depend on the requests and pycryptodome libraries for REST API calls and decryption. For more information, please refer to the Updating a Function with Additional Dependencies section in AWS Lambda Deployment Package in Python.

At this point, you should have created the Lambda functions, the parameters in SSM Parameter Store, and the AWS S3 bucket used by the application. Next, we will describe the basics of implementing and creating a serverless application that coordinates all functions and other services with Step Functions.
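
As a minimal sketch of building such a deployment package, the function below zips a directory so that `lambda_function.py` and the bundled libraries sit at the root of the archive. It assumes the dependencies have already been installed into that directory (for example with `pip install --target`); the function name and paths are hypothetical.

```python
import os
import zipfile


def build_deployment_package(source_dir, zip_path):
    """Zip source_dir so its contents sit at the root of the archive.

    Write zip_path outside source_dir so the archive does not include itself.
    """
    with zipfile.ZipFile(zip_path, 'w', zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(source_dir):
            for file_name in files:
                full_path = os.path.join(root, file_name)
                # Store paths relative to source_dir, not absolute paths
                archive.write(full_path, os.path.relpath(full_path, source_dir))
    return zip_path
```

The resulting file can then be passed to `aws lambda create-function` through the `--zip-file fileb://...` option.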

Step Functions Implementation Overview

Step Functions are based on the concepts of tasks and state machines. You define state machines using the JSON-based Amazon States Language. States can perform a variety of functions in your state machine:

  • Do some work in your state machine (a Task state).
  • Choose between branches of execution (a Choice state).
  • Stop execution with failure or success (a Fail or Succeed state).
  • Simply pass its input to its output or inject some fixed data (a Pass state).
  • Provide a delay for a certain amount of time or until a specified time/date (a Wait state).
  • Begin parallel branches of execution (a Parallel state).
  • Dynamically iterate steps using a Map state.

The application in this article uses several of these state types. The following diagram shows the type defined for each state.

Defined state types

  • Most states are Task states, which execute Lambda functions to request the EDP access token (getEDPToken), subscribe to Research Alerts (subscribeResearch), poll the SQS queue (getAlertMessage), download documents (downloadDocuments), and so on.
  • Check Status is a Choice state that invokes the download documents function when a new message is received from SQS.
  • Download Documents is defined as a Map state, which iterates through the list of document IDs and invokes a Lambda function to download/upload a document for each ID in parallel. This should improve the performance of the application when multiple Alert messages are received from the queue.
  • Wait X Seconds is defined as a Wait state, which delays the state machine from continuing for a specified time. It makes the application wait for a specific interval before polling the queue for a new message. For this application, the interval is 10 seconds.
  • Queue Failed is defined as a Fail state, which stops execution when the application fails to receive a message from SQS.
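
Because the states refer to each other by name, a typo in a `Next` or `Default` field is easy to introduce. The sketch below is a small, hypothetical checker (not part of the application or of any AWS tool) that walks a definition written in Amazon States Language, including the `Iterator` of a Map state, and reports transitions that point to undefined states.

```python
def check_transitions(definition):
    """Return (state, target) pairs whose target state is not defined."""
    states = definition['States']
    problems = []
    if definition['StartAt'] not in states:
        problems.append(('StartAt', definition['StartAt']))
    for name, state in states.items():
        # Collect every state name this state can transition to
        targets = [state[key] for key in ('Next', 'Default') if key in state]
        for choice in state.get('Choices', []):
            if 'Next' in choice:
                targets.append(choice['Next'])
        for target in targets:
            if target not in states:
                problems.append((name, target))
        # A Map state carries its own nested state machine
        if state.get('Type') == 'Map' and 'Iterator' in state:
            problems.extend(check_transitions(state['Iterator']))
    return problems
```

Loading the state machine definition with `json.loads` and passing it to `check_transitions` should return an empty list when every transition resolves.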

Create Step Functions

Please follow the instructions below to create a state machine that coordinates the Lambda functions created in the previous step.

  • Open the AWS Step Functions Console and click the “Create state machine” button.
  • In Step 1: Define state machine, select “Author with code snippets” and fill in the name of the state machine.
  • Copy the following code into the State machine definition. You need to correct the ARN of each Lambda function in the “Resource” fields of the code.

```json
{
  "Comment": "An example of the Amazon States Language that integrates with Refinitiv EDP Research API",
  "StartAt": "Get EDP Token",
  "States": {
    "Get EDP Token": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:<user>:function:getEDPToken",
      "Next": "Subscribe Research"
    },
    "Subscribe Research": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:<user>:function:subscribeResearch",
      "ResultPath": "$.subscriptionInfo",
      "Next": "Get Cloud Credential"
    },
    "Get Cloud Credential": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:<user>:function:getCloudCredential",
      "InputPath": "$.subscriptionInfo",
      "ResultPath": "$.cloudCredentialInfo",
      "Next": "Wait X Seconds"
    },
    "Wait X Seconds": {
      "Type": "Wait",
      "Seconds": 10,
      "Next": "Get Alert Message"
    },
    "Get Alert Message": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:<user>:function:getAlertMessage",
      "Next": "Check Status",
      "Parameters": {
        "endpoint.$": "$.subscriptionInfo.endpoint",
        "cryptographyKey.$": "$.subscriptionInfo.cryptographyKey",
        "accessKeyId.$": "$.cloudCredentialInfo.accessKeyId",
        "secretKey.$": "$.cloudCredentialInfo.secretKey",
        "sessionToken.$": "$.cloudCredentialInfo.sessionToken"
      },
      "ResultPath": "$.queueStatus"
    },
    "Check Status": {
      "Type": "Choice",
      "Choices": [
        {
          "Variable": "$.queueStatus['status']",
          "StringEquals": "Doc Available",
          "Next": "Refresh Token"
        },
        {
          "Variable": "$.queueStatus['status']",
          "StringEquals": "Queue Failed",
          "Next": "Queue Failed"
        },
        {
          "Variable": "$.queueStatus['status']",
          "StringEquals": "ExpiredToken",
          "Next": "Get Cloud Credential"
        },
        {
          "Variable": "$.queueStatus['status']",
          "StringEquals": "None",
          "Next": "Wait X Seconds"
        }
      ],
      "Default": "Wait X Seconds"
    },
    "Queue Failed": {
      "Type": "Fail",
      "Cause": "SQS returned error",
      "Error": "SQS FAILED"
    },
    "Download Documents": {
      "Type": "Map",
      "InputPath": "$.queueStatus",
      "ItemsPath": "$.docIds",
      "MaxConcurrency": 10,
      "Iterator": {
        "StartAt": "Download Document",
        "States": {
          "Download Document": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:<user>:function:downloadDocuments",
            "End": true
          }
        }
      },
      "ResultPath": "$.status",
      "Next": "Wait X Seconds"
    },
    "Refresh Token": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:<user>:function:refreshToken",
      "Next": "Download Documents",
      "ResultPath": "$.status"
    }
  }
}
```


  • Click the “Next” button to go to the next step.
  • In Step 2: Configure settings, select “Create an IAM role for me”, and then fill in an IAM role name.
  • Finally, click “Create state machine.”

Create State machine

The code will generate a workflow similar to the following.

Generated Workflow
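
The Console steps above could also be scripted. The sketch below is a hedged illustration: `fill_account_id` and `create_state_machine` are hypothetical helpers, the `<user>` placeholder is the one used in the definition above, and `role_arn` must be an existing role that Step Functions can assume (the Console's “Create an IAM role for me” option has no direct counterpart here). The boto3 call used is the Step Functions client's `create_state_machine`.

```python
import json


def fill_account_id(definition_text, account_id):
    """Replace the <user> placeholder in every Lambda ARN with the account ID."""
    return definition_text.replace('<user>', account_id)


def create_state_machine(sfn_client, name, definition_text, role_arn):
    """Create the state machine and return its ARN."""
    json.loads(definition_text)  # fail early if the definition is not valid JSON
    response = sfn_client.create_state_machine(
        name=name,
        definition=definition_text,
        roleArn=role_arn,
    )
    return response['stateMachineArn']
```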

Steps to Run the Serverless Application

1. Configure information in the following parameters under the SSM parameter store. If you do not have the EDP Username, Password, and UUID, please contact your local Refinitiv representative. Regarding the Client ID, please follow the steps in this tutorial to create Client ID (App Key). BucketStorage is the name of the AWS S3 bucket created in the setup steps.

  • EDPUsername
  • EDPPassword
  • EDPClientId
  • UUID
  • BucketStorage

2. Open the Step Functions Console and select the state machine you created in the setup steps.

3. Select the “Start Execution” button to start the application. A new pop-up window for entering comments appears; just click the “Start Execution” button again. You will see the application running through each Lambda function step, with its status, on the Visual Workflow panel.

Visual Workflow
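
Starting the application does not have to go through the Console. As a sketch, assuming you know the ARN of the state machine you created, the execution could be started from Python with the Step Functions client's `start_execution` call; `start_application` is a hypothetical helper name.

```python
import json


def start_application(sfn_client, state_machine_arn, execution_input=None):
    """Start one execution of the state machine and return the execution ARN."""
    response = sfn_client.start_execution(
        stateMachineArn=state_machine_arn,
        # The workflow reads its settings from Parameter Store, so an empty input is enough
        input=json.dumps(execution_input or {}),
    )
    return response['executionArn']
```

Since the workflow loops back to the wait state, the execution keeps running until it fails or is stopped, for instance with the client's `stop_execution` call.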

4. Once a new Alert is available, the Research information will be stored in the S3 bucket defined in the BucketStorage parameter.

BucketStorage

New research documents will be uploaded continually as long as the application is executing. You can manually download or open a file via the Amazon S3 Bucket Console, or integrate Amazon S3 with other AWS services as needed, for example to trigger a new Lambda function.
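
To illustrate the last point, here is a minimal sketch of a Lambda handler that could be attached to the bucket's object-created notifications. It is an example only and not part of the application: it simply extracts the bucket name and object key from the standard S3 event structure. Object keys arrive URL-encoded in these events, hence the `unquote_plus` call.

```python
from urllib.parse import unquote_plus


def lambda_handler(event, context):
    """List the objects reported by an S3 notification event."""
    uploaded = []
    for record in event.get('Records', []):
        bucket = record['s3']['bucket']['name']
        # Keys are URL-encoded in S3 events (spaces arrive as '+')
        key = unquote_plus(record['s3']['object']['key'])
        print(f'New research document: s3://{bucket}/{key}')
        uploaded.append((bucket, key))
    return uploaded
```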

Diagnostic and Troubleshooting

In the Console, Step Functions provides an Execution Event History, which can help you get more information when an execution fails.

For example:

1) For the TaskStateEntered event type, the event displays the input data passed to the function when it was executed.

Task State Entered Input

2) The event can also link to CloudWatch Logs. The log displays all logger messages for a specific Lambda function. You can add debug code or verify the sequence of events here.

CloudWatch Logs

3) If the Step Functions execution fails with the message “The cause could not be determined because Lambda did not return an error type.”, the likely cause is a timeout in the Lambda function. This indicates that the function cannot complete within the defined timeout. You may need to extend the timeout configured for the function.
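
Extending a timeout can be done in the Console or scripted. As a small sketch, assuming a boto3 Lambda client is passed in, the helper below (a hypothetical name) raises the timeout with `update_function_configuration`; Lambda accepts values from 1 to 900 seconds.

```python
def extend_timeout(lambda_client, function_name, timeout_seconds):
    """Set a new timeout for a Lambda function and return the applied value."""
    if not 1 <= timeout_seconds <= 900:
        raise ValueError('Lambda timeouts must be between 1 and 900 seconds')
    response = lambda_client.update_function_configuration(
        FunctionName=function_name,
        Timeout=timeout_seconds,
    )
    return response['Timeout']
```

For example, `extend_timeout(boto3.client('lambda'), 'downloadDocuments', 30)` would raise that function's timeout from the 10 seconds set at creation; the value 30 is only an illustration.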

Conclusion

In this article, we described the basics of Refinitiv’s Research API and AWS, and demonstrated how to create a serverless application that continually retrieves EDP Research information and stores it in cloud storage.

Opinions expressed by DZone contributors are their own.
