Build a Serverless Application for Entity Detection on AWS

In this article, learn how to build a Serverless solution for entity detection using Amazon Comprehend, AWS Lambda, and the Go programming language.

By Abhishek Gupta · Oct. 02, 23 · Tutorial

In this blog post, you will learn how to build a Serverless solution for entity detection using Amazon Comprehend, AWS Lambda, and the Go programming language.

Text files uploaded to Amazon Simple Storage Service (S3) will trigger a Lambda function that analyzes them, extracts entity metadata (name, type, etc.) using the AWS Go SDK, and persists it to an Amazon DynamoDB table. You will use the Go bindings for AWS CDK to implement "Infrastructure-as-code" for the entire solution and deploy it with the AWS Cloud Development Kit (CDK) CLI.

The code is available on GitHub.

Amazon Comprehend

Introduction

Amazon Comprehend leverages NLP to extract insights from documents, including entities, key phrases, language, sentiments, and other elements. It utilizes a pre-trained model that is continuously updated with a large body of text, eliminating the need for training data. Additionally, users can build their own custom models for classification and entity recognition with the help of Flywheels. The platform also offers built-in topic modeling to organize documents based on similar keywords. For document processing, there is the synchronous mode for a single document or a batch of up to 25 documents, while asynchronous jobs are recommended for processing large numbers of documents.
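To make the 25-document limit of the synchronous batch mode concrete, here is a minimal sketch that splits a larger set of documents into groups of at most 25. The chunk helper and the maxBatchSize constant are hypothetical names used for illustration; they are not part of the AWS SDK. Each resulting group could then be sent as one batch request (for example, with Comprehend's BatchDetectEntities operation).

```go
package main

import "fmt"

// maxBatchSize is the per-request document limit for the synchronous
// batch mode, as described in the paragraph above.
const maxBatchSize = 25

// chunk splits docs into consecutive groups of at most size elements.
// It is a hypothetical helper, not part of the AWS SDK.
func chunk(docs []string, size int) [][]string {
	if size <= 0 {
		return nil
	}
	var batches [][]string
	for start := 0; start < len(docs); start += size {
		end := start + size
		if end > len(docs) {
			end = len(docs)
		}
		batches = append(batches, docs[start:end])
	}
	return batches
}

func main() {
	// 60 placeholder documents, purely for demonstration.
	docs := make([]string, 60)
	for i := range docs {
		docs[i] = fmt.Sprintf("document %d", i)
	}

	for i, batch := range chunk(docs, maxBatchSize) {
		// Each batch is small enough for one synchronous batch request.
		fmt.Printf("batch %d: %d documents\n", i, len(batch))
	}
}
```

For very large document sets, the asynchronous jobs mentioned above are likely a better fit than looping over many synchronous batches.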

Let's learn Amazon Comprehend with a hands-on tutorial. We will use the entity detection feature, in which Comprehend analyzes the text and identifies the entities present, along with their corresponding entity types (e.g., person, organization, location). Comprehend can also identify relationships between entities, such as identifying that a particular person works for a specific company. Automatically identifying entities within large amounts of text data can help businesses save time and resources that would otherwise be spent manually analyzing and categorizing text data.

Prerequisites

Before you proceed, make sure you have the following installed:

  • Go programming language (v1.18 or higher)
  • AWS CDK
  • AWS CLI

Clone the project and change to the right directory:

git clone https://github.com/abhirockzz/ai-ml-golang-comprehend-entity-detection

cd ai-ml-golang-comprehend-entity-detection


Use AWS CDK To Deploy the Solution

The AWS Cloud Development Kit (AWS CDK) is a framework that lets you define your cloud infrastructure as code in one of its supported programming languages and provision it through AWS CloudFormation.

To start the deployment, simply invoke cdk deploy and wait for a bit. You will see a list of resources that will be created and will need to provide your confirmation to proceed.

cd cdk

cdk deploy

# output

Bundling asset ComprehendEntityDetectionGolangStack/comprehend-entity-detection-function/Code/Stage...

✨  Synthesis time: 4.32

//.... omitted

Do you wish to deploy these changes (y/n)? y


Enter y to start creating the AWS resources required for the application.

If you want to see the AWS CloudFormation template which will be used behind the scenes, run cdk synth and check the cdk.out folder.

You can keep track of the stack creation progress in the terminal or navigate to the AWS console: CloudFormation > Stacks > ComprehendEntityDetectionGolangStack.

Once the stack creation is complete, you should have:

  • An S3 bucket - the source bucket to which you upload text files
  • A Lambda function to execute entity detection on the file contents using Amazon Comprehend
  • A DynamoDB table to store the entity detection result for each file
  • A few other components (like IAM roles, etc.)

You will also see the following output in the terminal (resource names will differ in your case). In this case, these are the names of the S3 bucket and the DynamoDB table created by CDK:

 ✅  ComprehendEntityDetectionGolangStack

✨  Deployment time: 139.02s

Outputs:
ComprehendEntityDetectionGolangStack.entityoutputtablename = comprehendentitydetection-textinputbucket293fcab7-8suwpesuz1oc_entity_output
ComprehendEntityDetectionGolangStack.textfileinputbucketname = comprehendentitydetection-textinputbucket293fcab7-8suwpesuz1oc
.....


You can now try out the end-to-end solution!

Detect Entities in Text File

To try the solution, you can either use a text file of your own or the sample files provided in the GitHub repository.

I will be using the AWS CLI to upload the files, but you can use the AWS console as well.

export SOURCE_BUCKET=<enter source S3 bucket name - check the CDK output>

aws s3 cp ./file_1.txt s3://$SOURCE_BUCKET
aws s3 cp ./file_2.txt s3://$SOURCE_BUCKET

# verify that the files were uploaded
aws s3 ls s3://$SOURCE_BUCKET


The Lambda function will detect entities in each uploaded file and store the result (entity name, type, and confidence score) in a DynamoDB table.

Check the DynamoDB table in the AWS console:

You can also use the CLI to scan the table:

aws dynamodb scan --table-name <enter table name - check the CDK output>


Don’t Forget To Clean Up

Once you're done, delete all the resources created by the stack with:

cdk destroy

#output prompt (choose 'y' to continue)

Are you sure you want to delete: ComprehendEntityDetectionGolangStack (y/n)?


You were able to set up and try the complete solution. Before we wrap up, let's quickly walk through some of the important parts of the code to get a better understanding of what's going on behind the scenes.

Code Walkthrough

We will only focus on the important parts - some code has been omitted for brevity.

CDK

You can refer to the complete CDK code here.

bucket := awss3.NewBucket(stack, jsii.String("text-input-bucket"), &awss3.BucketProps{
        BlockPublicAccess: awss3.BlockPublicAccess_BLOCK_ALL(),
        RemovalPolicy:     awscdk.RemovalPolicy_DESTROY,
        AutoDeleteObjects: jsii.Bool(true),
})


We start by creating the source S3 bucket.

    table := awsdynamodb.NewTable(stack, jsii.String("entites-output-table"),
        &awsdynamodb.TableProps{
            PartitionKey: &awsdynamodb.Attribute{
                Name: jsii.String("entity_type"),
                Type: awsdynamodb.AttributeType_STRING},

            TableName: jsii.String(*bucket.BucketName() + "_entity_output"),

            SortKey: &awsdynamodb.Attribute{
                Name: jsii.String("entity_name"),
                Type: awsdynamodb.AttributeType_STRING},
        })


Then, we create a DynamoDB table to store entity detection results for each file.

    function := awscdklambdagoalpha.NewGoFunction(stack, jsii.String("comprehend-entity-detection-function"),
        &awscdklambdagoalpha.GoFunctionProps{
            Runtime:     awslambda.Runtime_GO_1_X(),
            Environment: &map[string]*string{"TABLE_NAME": table.TableName()},
            Entry:       jsii.String(functionDir),
        })

    table.GrantWriteData(function)
    bucket.GrantRead(function, "*")
    function.Role().AddManagedPolicy(awsiam.ManagedPolicy_FromAwsManagedPolicyName(jsii.String("ComprehendReadOnly")))


Next, we create the Lambda function, passing the DynamoDB table name as an environment variable to the function. We grant the function write access to the DynamoDB table and read access to the S3 bucket, and attach the ComprehendReadOnly managed policy to its role.

function.AddEventSource(awslambdaeventsources.NewS3EventSource(bucket, &awslambdaeventsources.S3EventSourceProps{
        Events: &[]awss3.EventType{awss3.EventType_OBJECT_CREATED},
    }))


We add an event source to the Lambda function to trigger it when a text file is uploaded to the source bucket.
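The OBJECT_CREATED event type fires for any object created in the bucket, not only for text files. As a minimal sketch, assuming you want the handler to skip anything that does not end in .txt, a guard like the following could be called before processing each record. The isTextFile helper is a hypothetical name and is not part of the original code.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// isTextFile reports whether an object key has a .txt extension,
// ignoring case. Hypothetical helper for illustration only.
func isTextFile(key string) bool {
	return strings.EqualFold(path.Ext(key), ".txt")
}

func main() {
	keys := []string{"file_1.txt", "reports/NOTES.TXT", "scan.png"}
	for _, key := range keys {
		fmt.Printf("%s -> %t\n", key, isTextFile(key))
	}
}
```

An alternative is to filter at the event source itself, so the function is never invoked for other object types; the in-handler check is shown here only because it needs nothing beyond the standard library.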

    awscdk.NewCfnOutput(stack, jsii.String("text-file-input-bucket-name"),
        &awscdk.CfnOutputProps{
            ExportName: jsii.String("text-file-input-bucket-name"),
            Value:      bucket.BucketName()})

    awscdk.NewCfnOutput(stack, jsii.String("entity-output-table-name"),
        &awscdk.CfnOutputProps{
            ExportName: jsii.String("entity-output-table-name"),
            Value:      table.TableName()})


Finally, we export the S3 bucket and DynamoDB table names as CloudFormation output.

Lambda Function

You can refer to the complete Lambda Function code here.

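func handler(ctx context.Context, s3Event events.S3Event) {
    for _, record := range s3Event.Records {

        sourceBucketName := record.S3.Bucket.Name
        fileName := record.S3.Object.Key

        // log the failure and continue with the remaining records
        if err := detectEntities(sourceBucketName, fileName); err != nil {
            log.Printf("entity detection failed for %s/%s: %v", sourceBucketName, fileName, err)
        }
    }
}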


The Lambda function is triggered when a text file is uploaded to the source bucket. For each record in the S3 event, the handler reads the bucket name and object key and invokes the detectEntities function.
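One detail to keep in mind when reading the object key from the event: S3 event notifications deliver the key URL-encoded, so a file named "my notes(1).txt" arrives with the space and parentheses escaped. The sample files in this tutorial are not affected, but here is a minimal sketch of decoding the key before using it, assuming a hypothetical decodeObjectKey helper that is not part of the original code.

```go
package main

import (
	"fmt"
	"net/url"
)

// decodeObjectKey converts the URL-encoded object key found in an S3
// event record back into the actual object key.
// Hypothetical helper for illustration only.
func decodeObjectKey(encoded string) (string, error) {
	return url.QueryUnescape(encoded)
}

func main() {
	key, err := decodeObjectKey("my+notes%281%29.txt")
	if err != nil {
		fmt.Println("invalid key:", err)
		return
	}
	fmt.Println(key) // prints "my notes(1).txt"
}
```

In the handler, the decoded value would be passed on in place of record.S3.Object.Key.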

Let's go through the detectEntities function.

func detectEntities(sourceBucketName, fileName string) error {

    // read the text file from the source bucket
    result, err := s3Client.GetObject(context.Background(), &s3.GetObjectInput{
        Bucket: aws.String(sourceBucketName),
        Key:    aws.String(fileName),
    })
    if err != nil {
        return err
    }
    defer result.Body.Close()

    buffer := new(bytes.Buffer)
    if _, err := buffer.ReadFrom(result.Body); err != nil {
        return err
    }
    text := buffer.String()

    // detect entities in the text using Amazon Comprehend
    resp, err := comprehendClient.DetectEntities(context.Background(), &comprehend.DetectEntitiesInput{
        Text:         aws.String(text),
        LanguageCode: types.LanguageCodeEn,
    })
    if err != nil {
        return err
    }

    for _, entity := range resp.Entities {

        item := make(map[string]ddbTypes.AttributeValue)

        // the partition key combines the file name and the entity type
        item["entity_type"] = &ddbTypes.AttributeValueMemberS{Value: fmt.Sprintf("%s#%v", fileName, entity.Type)}
        item["entity_name"] = &ddbTypes.AttributeValueMemberS{Value: *entity.Text}
        item["confidence_score"] = &ddbTypes.AttributeValueMemberS{Value: fmt.Sprintf("%v", *entity.Score)}

        // persist one item per detected entity
        _, err = dynamodbClient.PutItem(context.Background(), &dynamodb.PutItemInput{
            TableName: aws.String(table),
            Item:      item,
        })
        if err != nil {
            return err
        }
    }

    return nil
}


  • The detectEntities function first reads the text file from the source bucket.
  • It then invokes the DetectEntities API of the Amazon Comprehend service.
  • The response contains the detected entities. The function then stores the entity type, name, and confidence score in the DynamoDB table.
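Note that the value stored in the entity_type attribute is a composite of the file name and the entity type, joined with a "#" (for example, file_1.txt#PERSON). The following is a minimal sketch of building that key and splitting it back apart when reading items. The partitionKey and splitPartitionKey helpers are hypothetical names for illustration, and the split assumes that the entity type itself never contains a "#".

```go
package main

import (
	"fmt"
	"strings"
)

// partitionKey builds the composite value written to the entity_type
// attribute, in the same "<file>#<type>" form the Lambda function uses.
func partitionKey(fileName, entityType string) string {
	return fmt.Sprintf("%s#%s", fileName, entityType)
}

// splitPartitionKey reverses partitionKey. It splits on the last "#"
// so that file names containing "#" still round-trip correctly.
func splitPartitionKey(key string) (fileName, entityType string, ok bool) {
	i := strings.LastIndex(key, "#")
	if i < 0 {
		return "", "", false
	}
	return key[:i], key[i+1:], true
}

func main() {
	key := partitionKey("file_1.txt", "PERSON")
	fmt.Println(key)

	fileName, entityType, ok := splitPartitionKey(key)
	fmt.Println(fileName, entityType, ok)
}
```

Because the table's primary key is entity_type plus entity_name, and PutItem replaces an existing item that has the same primary key, an entity that is mentioned several times in one file ends up as a single item.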

Conclusion and Next Steps

In this post, you saw how to create a serverless solution using Amazon Comprehend. The entire infrastructure life-cycle was automated using AWS CDK. All this was done using the Go programming language, which is well-supported in AWS Lambda and AWS CDK.

Here are a few things you can try out to extend this solution:

  • Try experimenting with other Comprehend features, such as detecting PII entities.
  • The entity detection used a pre-trained model. You can also train a custom model using the Comprehend Custom Entity Recognition feature that allows you to use images, scanned files, etc. as inputs (rather than just text files).

Happy building!

Published at DZone with permission of Abhishek Gupta, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
