Storing Tweets With AWS Lambda and Scheduled Events

See how Scheduled Events, the Serverless Application Model, and AWS Lambda can be used to continually funnel new tweets to Couchbase.

Earlier entries in this blog series explained a few serverless concepts with code samples.

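This entry shows how to use AWS Lambda to store tweets in Couchbase. At a high level, a scheduled event triggers a Lambda function every three hours, the function fetches new tweets using Twitter4J, and each tweet is stored as a JSON document in Couchbase.
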
Complete sample code for this blog is available at github.com/arun-gupta/twitter-n1ql.

Serverless Application Model

The Serverless Application Model, or SAM, defines a simplified syntax for expressing serverless resources. SAM extends AWS CloudFormation to add support for API Gateway, AWS Lambda, and Amazon DynamoDB. Read more details in Microservice using AWS Serverless Application Model and Couchbase.

For our application, the SAM template is available at github.com/arun-gupta/twitter-n1ql/blob/master/template-example.yml and shown below:

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: Twitter Feed Analysis using Couchbase/N1QL
Resources:
  # Logical resource name is a placeholder
  TwitterFeedFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: org.sample.twitter.TwitterRequestHandler
      Runtime: java8
      CodeUri: s3://arungupta.me/twitter-feed-1.0-SNAPSHOT.jar
      Timeout: 30
      MemorySize: 1024
      Environment:
        Variables:
          COUCHBASE_HOST: <value>
          COUCHBASE_BUCKET_PASSWORD: <value>
      Role: arn:aws:iam::598307997273:role/microserviceRole
      Events:
        # Event name is a placeholder
        Timer:
          Type: Schedule
          Properties:
            Schedule: rate(3 hours)

What do we see here?

  • The handler is the org.sample.twitter.TwitterRequestHandler class:

    import com.amazonaws.services.lambda.runtime.Context;
    import com.amazonaws.services.lambda.runtime.RequestHandler;

    public class TwitterRequestHandler implements RequestHandler<Request, String> {
        public String handleRequest(Request request, Context context) {
            // Fall back to the default handle; setName is assumed to mirror getName
            if (request.getName() == null) {
                request.setName("realDonaldTrump");
            }
            int tweets = new TwitterFeed().readFeed(request.getName());
            return "Updated " + tweets + " tweets for " + request.getName() + "!";
        }
    }

    By default, this class reads the Twitter handle of Donald Trump. More fun on that is coming in a subsequent blog.
  • COUCHBASE_HOST and COUCHBASE_BUCKET_PASSWORD are environment variables that provide the EC2 host where the Couchbase database is running and the password of the bucket.
  • The function can be triggered by different events. In our case, it is triggered every three hours. See more details about the expression used here at Schedule Expressions Using Rate or Cron.

Fetching Tweets using Twitter4J

Tweets are read using Twitter4J, an unofficial Java library that provides an abstraction over the Twitter REST API. Here is a simple example:

Twitter twitter = getTwitter();
Paging paging = new Paging(page, count, sinceId);
List<Status> list = twitter.getUserTimeline(user, paging);

Fortunately, the Twitter4J Docs and Javadocs are pretty comprehensive.

A single timeline request to the Twitter API returns at most 200 tweets. The Lambda function is invoked every three hours. That works because @realDonaldTrump does not tweet 200 times every three hours (at least not yet). If it does reach that dangerous level, we can adjust the rate to trigger the Lambda function more frequently.
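
The reasoning above is simple arithmetic: the schedule is safe as long as no more than one page of tweets arrives between two invocations. Here is a minimal sketch of that check. The class and method names are hypothetical and are not part of the sample repository.

```java
/** Hypothetical helper: checks whether a polling schedule can keep up with an account. */
final class PollingBudget {
    private PollingBudget() {}

    /** Expected number of new tweets between two invocations. */
    static double tweetsPerInterval(double tweetsPerHour, double intervalHours) {
        return tweetsPerHour * intervalHours;
    }

    /** True when one page of pageSize tweets covers everything posted in one interval. */
    static boolean keepsUp(double tweetsPerHour, double intervalHours, int pageSize) {
        return tweetsPerInterval(tweetsPerHour, intervalHours) <= pageSize;
    }

    /** Longest interval, in hours, for which one page is still enough. */
    static double maxIntervalHours(double tweetsPerHour, int pageSize) {
        if (tweetsPerHour <= 0) {
            return Double.POSITIVE_INFINITY; // an idle account never overflows a page
        }
        return pageSize / tweetsPerHour;
    }
}
```

For instance, with a page size of 200, an account posting 100 tweets an hour would need the rate lowered to two hours or less.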

A JSON representation of each tweet is stored in Couchbase Server using the Couchbase Java SDK. AWS Lambda also supports Node, Python, and C#, so you can use the Couchbase Node SDK, Couchbase Python SDK, or Couchbase .NET SDK to write these functions as well.

Twitter4J allows us to fetch only the tweets posted after a particular tweet, identified by its id (the sinceId passed to Paging above). This ensures that duplicate tweets are not fetched. It requires us to sort all stored tweets by id and then pick the id of the most recent one. This was solved using a simple N1QL query.


The syntax is very SQL-like. More on this in a subsequent blog.
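
The since-id bookkeeping can also be sketched in plain Java, which may help show what the query is for. This is an illustration only: the class and method names are hypothetical, and the floor value of 1 assumes the paging API expects a positive id. On the N1QL side, a query of roughly the shape SELECT id FROM <bucket> ORDER BY id DESC LIMIT 1 would return the same value; that exact text is an assumption, not the query from the sample repository.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper illustrating since-id bookkeeping; not taken from the sample repository. */
final class SinceIdTracker {
    private SinceIdTracker() {}

    /** Returns the largest stored tweet id, or 1 when nothing has been stored yet. */
    static long latestId(List<Long> storedIds) {
        long latest = 1L; // assumed floor for an empty bucket
        for (long id : storedIds) {
            latest = Math.max(latest, id);
        }
        return latest;
    }

    /** Keeps only the ids strictly greater than sinceId, preserving their order. */
    static List<Long> newerThan(List<Long> fetchedIds, long sinceId) {
        List<Long> fresh = new ArrayList<>();
        for (long id : fetchedIds) {
            if (id > sinceId) {
                fresh.add(id);
            }
        }
        return fresh;
    }
}
```

Because only ids greater than the stored maximum are kept, running the function twice over the same timeline stores nothing new the second time.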

Store Tweets in Couchbase

The final item is to store the retrieved tweets in Couchbase.

The value of the COUCHBASE_HOST environment variable is used to connect to the Couchbase instance. The value of the COUCHBASE_BUCKET_PASSWORD environment variable is used to connect to the secure bucket where all JSON documents are stored. It's critical that the bucket be password-protected and that the password not be specified directly in the source code. More on this in a subsequent blog.
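
One way to read these two settings is to fail fast when either is missing, so a misconfigured function stops before it tries to connect. The sketch below assumes a small holder class; CouchbaseConfig and its methods are hypothetical names, and the sample repository may structure this differently.

```java
import java.util.Map;

/** Hypothetical configuration holder for the two environment variables. */
final class CouchbaseConfig {
    final String host;
    final String bucketPassword;

    private CouchbaseConfig(String host, String bucketPassword) {
        this.host = host;
        this.bucketPassword = bucketPassword;
    }

    /** Reads both settings from the given environment, failing fast when one is missing. */
    static CouchbaseConfig fromEnvironment(Map<String, String> env) {
        return new CouchbaseConfig(
                required(env, "COUCHBASE_HOST"),
                required(env, "COUCHBASE_BUCKET_PASSWORD"));
    }

    private static String required(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException("Missing environment variable: " + name);
        }
        return value;
    }
}
```

Inside the Lambda function this would be called as CouchbaseConfig.fromEnvironment(System.getenv()). Taking the map as a parameter keeps the lookup easy to exercise without touching the real environment.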

The JSON document is upserted (inserted, or updated if it already exists) in Couchbase using the Couchbase Java API.
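
To show why an upsert suits this job, here is an in-memory stand-in keyed by tweet id. It is not the Couchbase SDK: the class, its methods, and the choice of the tweet id as the document key are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for a bucket, used only to illustrate upsert semantics. */
final class InMemoryTweetStore {
    private final Map<String, String> documents = new LinkedHashMap<>();

    /**
     * Stores the JSON under the tweet id, replacing any existing document.
     * Returns true when the document was newly inserted, false when it was updated.
     */
    boolean upsert(long tweetId, String json) {
        return documents.put(Long.toString(tweetId), json) == null;
    }

    /** Returns the stored JSON for the tweet id, or null when there is none. */
    String get(long tweetId) {
        return documents.get(Long.toString(tweetId));
    }

    /** Number of distinct documents currently stored. */
    int size() {
        return documents.size();
    }
}
```

Storing the same tweet twice leaves a single document behind, so a retried invocation does not create duplicates.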


This Lambda function has been running for a few days now and has captured 258 tweets from @realDonaldTrump.


An interesting analysis of his tweets is coming shortly!

The complete sample code for this blog is available at github.com/arun-gupta/twitter-n1ql.

Published at DZone with permission of Arun Gupta, DZone MVB. See the original article here.
