
Running Java on AWS Infrastructure: How To Put the Bricks Together

Create and maintain applications that are ''alive'' by integrating and optimizing your current infrastructure to run on AWS.


Startups tend to grow and expand, and successful startups keep that momentum going. I look at this growth mainly from an engineer's perspective: the drive to create applications that are technically reliable, automated, simple to use, and well supported, the kind of applications that can be called "alive." I call them "alive" because I see these apps as living beings with their own lifecycles. Now let me elaborate on how this rather abstract idea can be applied to the real world.

I joined my current team at the moment when one crucial question had to be addressed: how were we going to make our product easier to develop and use? Originally, the company had agreed to deliver an integration with a large third-party system, but there was one obstacle. It is hard, or even impossible, to integrate a product into a larger system (and I am talking here about medical, financial, and similar domains) when the product is a mere web app: without a backend (AWS Lambdas do not count) to provide an easy way to scale and customize the app, without a CI/CD flow, without a proper data store (which DynamoDB is not) to effectively aggregate and analyze data (which is becoming a crucial requirement), and, moreover, without a clear and easy way to administer the application itself (I mostly mean customer onboarding here). Considering that this integration was decisive for both our clients' success and the company's, all these impediments were framed as a "dragon" that had to be defeated to make our product "alive."

Now I am going to tell you how version 2.0 of the product was created, from the perspective of the backend tech stack and infrastructure.

Application

Today's web development world is moving towards responsive, resilient, elastic, message-driven software architecture, and I tend to see that as a blessing. All four of these key traits should be familiar to you (and I hope they are) from that wonderful paper, the Reactive Manifesto, which states the following:

"We believe that a coherent approach to systems architecture is needed, and we believe that all necessary aspects are already recognised individually: we want systems that are Responsive, Resilient, Elastic and Message-Driven. We call these Reactive Systems."

So, eventually, for me, coming from the Java world, it became clear that we needed to move towards asynchronous stream processing with non-blocking backpressure, as described in the manifesto. The most important thing here is that I see this step as a common-sense decision in light of our community's experience. Moreover, this approach is not brand new; it can, and in my view should, be perceived as the default way to develop modern web apps.
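
To make "non-blocking backpressure" slightly more concrete, here is a tiny Project Reactor sketch (not code from the product, just an illustration) in which the consumer requests elements in bounded batches instead of letting the producer flood it:

// BackpressureDemo.java: illustrative only
import reactor.core.publisher.Flux;

import java.time.Duration;

public class BackpressureDemo {

    public static void main(String[] args) {
        Flux.range(1, 1_000)                      // fast producer of 1,000 elements
            .onBackpressureBuffer(100)            // hold at most 100 unconsumed elements
            .limitRate(10)                        // downstream requests 10 elements at a time
            .delayElements(Duration.ofMillis(5))  // simulate a slow consumer
            .doOnNext(i -> System.out.println("processed " + i))
            .blockLast();                         // block only here, for the demo's sake
    }
}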

The first thing to do was to choose which Reactive Streams implementation we were going to use. As I had experience with Spring Boot and the Spring Framework itself, I was particularly interested in trying its new 2.0.x version, which is built on Project Reactor (exactly the kind of implementation we were looking for). Still, I had doubts about which Reactive Streams implementation to pick from the available options: RxJava, Akka Streams, and some others. I Googled around and found an article by David Karnok, who made a really profound dive into the classification of existing Reactive Streams implementations. Briefly, he separates them into six generations (0th, 1st, 2nd, 3rd, 4th, 5+) with corresponding descriptions, where Project Reactor belongs to the 4th generation of reactive libraries, while RxJava 2.x and Akka Streams belong to the 3rd. Those numbers do mean something to me. So, eventually, our backend stack for the application itself would be the following (a short sketch of this stack in action follows the list):

  • Java 8
  • Spring Boot 2.0.x
  • Project Reactor
  • Netty under the hood
  • Maven
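
To give a flavor of what this stack looks like in practice, here is a minimal Spring WebFlux sketch; the Patient domain, repository, and endpoints are hypothetical and only illustrate how Reactor's Flux and Mono types surface in a REST controller:

// Illustrative sketch, not the product's actual code
import org.springframework.data.repository.reactive.ReactiveCrudRepository;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;

// Hypothetical domain type and reactive repository
class Patient {
    public String id;
    public String name;
}

interface PatientRepository extends ReactiveCrudRepository<Patient, String> {
}

@RestController
public class PatientController {

    private final PatientRepository repository;

    public PatientController(PatientRepository repository) {
        this.repository = repository;
    }

    // Flux: a stream of zero..N results, emitted as they arrive
    @GetMapping("/patients")
    public Flux<Patient> findAll() {
        return repository.findAll();
    }

    // Mono: a single asynchronous result (or empty)
    @GetMapping("/patients/{id}")
    public Mono<Patient> findById(@PathVariable String id) {
        return repository.findById(id);
    }
}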

Database

If we take a look at today's trends in data stores and compare them with what was happening a few years ago, one crucial shift stands out. Data store tools are moving towards not only keeping data effectively but also providing comprehensive ways to analyze all the gathered data, with instruments for different kinds of statistics and calculations. In addition, I would add that they should keep up with the changes happening in software development itself. In our case, that means a tool that supports the Reactive Streams specification for working with the data.

I will be honest with you: I am a total fan of NoSQL databases. They provide an easy way to extend and modify your data domain while letting you keep your model's declaration in only one place: the code itself. It is crucial to keep your application data consistent and coherent, and I see only one way to achieve that: the code with the implemented logic should provide all the guarantees for that purpose. Compared with popular SQL databases such as MySQL, with their strict schema definitions, triggers, and functions, I believe all that machinery should be part of the application that interacts with the database. I treat the database as just another application, and I do not find it reasonable to delegate such an important part of the domain logic to the store and keep it separate from the main application.

As I had been playing around with MongoDB for a while and had some experience building data analytics on top of its aggregation framework (which is designed around the concept of data processing pipelines: documents enter a multi-stage pipeline that transforms them into an aggregated result), it was quite straightforward for me to dig in this direction.
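
As a rough illustration of what such a pipeline looks like from the Java side (the collection and field names below are made up, and I am assuming the reactive Spring Data MongoDB support that ships with Spring Boot 2), an aggregation can be declared as stages and consumed as a Flux:

// Illustrative sketch of a reactive aggregation query
import org.bson.Document;
import org.springframework.data.domain.Sort;
import org.springframework.data.mongodb.core.ReactiveMongoTemplate;
import org.springframework.data.mongodb.core.aggregation.Aggregation;
import org.springframework.data.mongodb.core.query.Criteria;
import reactor.core.publisher.Flux;

import static org.springframework.data.mongodb.core.aggregation.Aggregation.*;

public class VisitStatistics {

    private final ReactiveMongoTemplate template;

    public VisitStatistics(ReactiveMongoTemplate template) {
        this.template = template;
    }

    // Counts completed "visit" documents per customer, busiest customers first
    public Flux<Document> visitsPerCustomer() {
        Aggregation pipeline = newAggregation(
                match(Criteria.where("status").is("COMPLETED")),  // stage 1: filter documents
                group("customerId").count().as("visits"),         // stage 2: group and count
                sort(Sort.Direction.DESC, "visits")               // stage 3: order by the count
        );
        return template.aggregate(pipeline, "visits", Document.class);
    }
}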

Infrastructure

Here we are. This is actually the most important section of the article, as all the other parts are just bricks that will be put together here to make them work as a whole.

The very first thing to start with is the technical requirements our application ought to meet. Here they are:

  • As users' data contains sensitive information from third-party sources, it should be stored behind a network with restricted access.
  • The system should be easy to develop, which in turn keeps delivery from becoming a drama for the team.
  • As integration with third parties is a cornerstone of the product, we ought to have a coherent and painless approach to extending the system.

I would like to mention that I find these requirements pretty common among plenty of web apps. Hence, I think the information that follows will be useful for engineers who want a clear picture of how to build simple systems end to end. Writing the code is a very important part, but how you maintain and run the code is no less crucial for any system, and it has a decisive impact on the quality of the product itself.

Now we can begin to design. As a sandbox cloud platform, so to speak, we will use AWS. Here is the scheme itself:


[Image: AWS infrastructure scheme]


Let me add some clarification. I am sure most of us have heard about "microservices" or "microservice architecture," and since no one explains fundamental ideas in software development better than Martin Fowler, here is his article on the technique. Do not be shy; go ahead and refresh your knowledge even if you think you already know what microservice architecture is. In our case, the key point I want to underline is "decentralization": keep the application's functionality as separated as possible, and try not to overwhelm your solution with an enormous number of tools or provided services.

When a system has some kind of core functionality that should be integrated with other systems, it is clearly beneficial to build it as two separate services with, so to speak, a mono-directional dependency between them: the third-party integration service should not impact the core service, and code changes to either service should not degrade the stability of the other. The same rule is just as critical for the other infrastructure components: the load balancer, the DB servers, and so on.
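
To illustrate the direction of that dependency in a reactive codebase, here is a sketch (the service URL, path, and DTO are invented for the example) of the integration service calling the core service's REST API through Spring's WebClient, while the core service stays completely unaware of the integration service:

// Illustrative sketch: the integration service depends on the core service, never the reverse
import org.springframework.web.reactive.function.client.WebClient;
import reactor.core.publisher.Mono;

public class CoreServiceClient {

    // Base URL of the core service inside the private subnet (hypothetical)
    private final WebClient webClient = WebClient.create("http://core-service:8080");

    // Hypothetical DTO mirroring the core service's response
    public static class PatientDto {
        public String id;
        public String name;
    }

    public Mono<PatientDto> fetchPatient(String id) {
        return webClient.get()
                .uri("/patients/{id}", id)
                .retrieve()
                .bodyToMono(PatientDto.class);
    }
}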

Following these prescriptions, we have two application servers for our services, located in their own private network that forbids public access. The database also has its own server in a separate private network. The NAT Gateway and the Internet Gateway sit in the public subnets so that the system can communicate with the outside world via statically attached inbound/outbound Elastic IPs. All of this lives within an on-demand, configurable pool of shared computing resources allocated inside AWS, called a VPC (Virtual Private Cloud). Moreover, every infrastructure unit has its own security group, which regulates inbound and outbound network flows based on IPs and port ranges (for example, the database can be reached only from the private app subnet or over a direct VPN connection). These groups can be seen as a firewall controlling traffic at the instance level. Together, these AWS abstractions help you efficiently build your applications on cloud infrastructure as a service.

Oh, I also forgot to clarify how we are going to communicate with the Internet itself. As the single outbound point, AWS provides the NAT Gateway (NAT stands for network address translation), a managed service that enables instances in a private subnet to connect to the Internet through the gateway's attached static public IP. For inbound connections to those instances, we turn to our load balancer. As our two services expose their own REST APIs, which should be publicly accessible, all such traffic is processed through the LB (with preconfigured routing rules and the like). Also, since the integration with the third-party system requires TLS mutual authentication, TLS-MA (let's consider it a somewhat more advanced security restriction), we decided to use HAProxy, as it is one of the most lightweight and reliable solutions offering high availability, load balancing, and proxying for TCP- and HTTP-based applications. From the network and communication perspective, that is pretty much it.

So, we have a clear picture of what we are going to build, and I suppose you think now is the best time to do so. I would agree with you, but one major thing is still missing: it would be extremely beneficial to figure out whether we can automate the creation of the infrastructure itself before we start implementing it. The main goal behind this automation is to make it easy to transfer the solution from the sandbox to the production environment and to reduce the human factor.

Being curious and Googling related questions, I found Terraform, a tool that enables you to safely and predictably create, change, and improve infrastructure. It is an open-source tool that codifies APIs into declarative configuration files that can be shared among team members (using any VCS you prefer), treated as code, edited, reviewed, and versioned. Moreover, as you can conclude, it embodies the infrastructure-as-code approach, which removes the risk associated with human error and turns your configuration into a single source of truth in terms of documentation. Generally, the process can be described in three simple stages:

  • Write code (describe your infrastructure)
  • Plan (Terraform compares the state of the currently running infrastructure with your changes)
  • Apply (if changes were detected, apply and deploy them)

It is obvious that such an approach increases reliability and makes the product more convenient and mature in terms of DevOps, maintenance, and delivery.

CI/CD

So, now it is time to talk about our CI/CD flow and the way it was implemented. Here is a scheme of how we designed and automated our build and deploy process:


[Image: CI/CD build and deploy scheme]


I assume nothing here is a brand-new discovery for you (I really hope so), except that it may be interesting to check out the Watchtower project and Spotify's dockerfile-maven plugin (which we use to build our Java services' images). They are indeed extremely helpful in making the whole process work.

You can easily find information on the Internet about how to build Docker images from a Dockerfile and run them, but I am still willing to share our Dockerfile, our Maven plugin configuration, and how they work together.

FROM openjdk:8-jre-alpine
ENTRYPOINT ["/opt/workdir/entrypoint.sh"]
WORKDIR /opt/workdir/

ARG JAR_FILE

COPY entrypoint.sh /opt/workdir/entrypoint.sh
RUN chmod +x /opt/workdir/entrypoint.sh

COPY target/${JAR_FILE} /opt/workdir/


#!/usr/bin/env sh
#entrypoint.sh
# Check if DOCKER_HOST_IP is not set, needed to make JMX work
if [ -z ${DOCKER_HOST_IP+x} ]; then
    # EC2 scenario, trying to get AWS instance private IP
    DOCKER_HOST_IP="`wget -qO- -T 5 http://169.254.169.254/latest/meta-data/local-ipv4`"
    if [ $? -ne 0 ]; then
        DOCKER_HOST_IP="127.0.0.1"
    fi
fi

# Check if PORT is not set
if [ -z ${PORT+x} ]; then
    # Set default port 8080
    PORT="8080"
fi

set -e

JMX_OPTS="-Dcom.sun.management.jmxremote=true\
          -Dcom.sun.management.jmxremote.local.only=false\
          -Dcom.sun.management.jmxremote.authenticate=false\
          -Dcom.sun.management.jmxremote.ssl=false\
          -Djava.rmi.server.hostname=$DOCKER_HOST_IP\
          -DDOCKER_HOST_IP=$DOCKER_HOST_IP\
          -Dserver.port=$PORT\
          -Dcom.sun.management.jmxremote.port=9090 \
          -Dcom.sun.management.jmxremote.rmi.port=9090"

JAVA_OPTS="-Djava.net.preferIPv4Stack=true"

# Check if DISABLE_JMX is not set
if [ -z ${DISABLE_JMX+x} ]; then
    JAVA_OPTS="$JAVA_OPTS $JMX_OPTS"
else
    echo "JMX disabled"
fi

java ${JAVA_OPTS} -jar /opt/workdir/*.jar


<!-- Some parts from pom.xml file -->
...

<properties>
   <docker.account>your_docker_hub_account</docker.account>
   <docker.tag>${project.version}</docker.tag>
</properties>

...

<profiles>
   <profile>
      <id>prod</id>
      <properties>
         <docker.tag>latest</docker.tag>
      </properties>
   </profile>
   <profile>
      <id>dev</id>
      <properties>
         <docker.tag>latest-dev</docker.tag>
      </properties>
   </profile>
</profiles>

...

<plugin>
   <groupId>com.spotify</groupId>
   <artifactId>dockerfile-maven-plugin</artifactId>
   <version>${dockerfile-maven-plugin.version}</version>
   <executions>
      <execution>
         <id>default</id>
         <goals>
            <goal>build</goal>
            <goal>push</goal>
         </goals>
      </execution>
   </executions>
   <configuration>
      <repository>${docker.account}/${project.artifactId}</repository>
      <tag>${docker.tag}</tag>
      <buildArgs>
         <JAR_FILE>${project.build.finalName}.jar</JAR_FILE>
      </buildArgs>
   </configuration>
</plugin>

...
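
With the plugin's default phase bindings (build on package, push on deploy), running mvn -Pdev -DskipTests deploy builds the image and pushes it to Docker Hub as your_docker_hub_account/<artifactId>:latest-dev, while the prod profile pushes the latest tag; these are exactly the commands the CircleCI job below runs.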


Speaking of CircleCI, here is a config which is pretty general and can be used widely:

#CircleCI config.yml
version: 2
jobs:
  build:
  #Setup required images to build and test the service
    docker:
      - image: circleci/openjdk:8-jdk
      - image: mongo:4.0

    working_directory: ~/build

    environment:
      # Customize the JVM maximum heap limit
      MAVEN_OPTS: -Xmx3200m

    steps:
      - checkout
      #Download and cache dependencies
      - restore_cache:
          keys:
          - v1-dependencies-{{ checksum "pom.xml" }}
          #Fallback to using the latest cache if no exact match is found
          - v1-dependencies-

      - run:
          name: Validate
          command: mvn validate

      - run:
          name: Resolve Dependencies
          command: mvn dependency:go-offline

      - save_cache:
          paths:
            - ~/.m2
          key: v1-dependencies-{{ checksum "pom.xml" }}

      - run:
          name: Build & Test
          #Perform tests with skipping of image building
          command: mvn -Ddockerfile.skip=true verify

      - setup_remote_docker
      - run:
          name: Deployment
          #Build image with tests' skipping and push it to DockerHub
          command: |
            docker login -u $DOCKER_USER -p $DOCKER_PASS

            if [[ ${CIRCLE_BRANCH} == dev* ]]; then

              mvn -Pdev -DskipTests deploy

            elif [[ "${CIRCLE_BRANCH}" == "master" ]]; then

              mvn -Pprod -DskipTests deploy

            fi
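
Watchtower, mentioned earlier, closes the loop here: running on the Docker host, it watches the deployed containers, pulls newly pushed latest or latest-dev images from the registry, and restarts the containers with the fresh image, so a merged commit ends up as an updated running container without any manual steps.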

Additionally, as you can see, the DOCKER_USER and DOCKER_PASS environment variables have to be specified in the build's configuration. As a result, after each pushed commit, when you SSH into the service's server, you will see the following picture (or something similar), with the updated container running on it:

[Image: updated Docker container running on the server]

To conclude, I have described here how your backend infrastructure can be created and configured with one main purpose: to make the daily routine of product development more joyful and productive. If you have any ideas, comments on any of these topics, or simply questions for me, just drop a comment below. I also have an idea for a second part of this article, in which I would describe how to create the designed infrastructure with Terraform. If you think that would be useful, just let me know. Thanks for reading and for your time; talk to you later.

