
An Enterprise Application Journey: From Monolithic to Distributed


In this post I would like to share my experience of taking an enterprise application from a single monolithic application to a truly distributed one. I will also share the tools and techniques used for diagnostics, troubleshooting and development along the way.

I was given responsibility for production troubleshooting when we started hearing field escalations that the system (our enterprise application) was crashing and had to be restarted frequently (twice a week) to keep the business functioning. Over a period of about six months we made a series of changes, improvements and re-architecture steps to turn the system into a distributed, highly available, scalable and cost-effective solution for the business.

Enterprise Application Before

Environment

Solaris (SunOS 5.9), Java 6 32-bit JVM, Weblogic 10.0 Application Server, Oracle 10g, a hardware load balancer

Application Node

A single EAR consisting of:

  • A Servlet to accept all incoming requests
  • A set of JMS queues and MDBs to serve various requests and intra-component communication
  • Business logic in Stateless Beans and DAO layer for DB interaction

Deployment

Application Nodes in Weblogic Cluster

Firewall => Load Balancer => 3 Application Nodes in Cluster


Journey Report

The application worked fine for a few years while the incoming load was light. As the load increased over time, the system started showing the following symptoms (in sequence):

Observation

  • Frequent JVM crashes (once a week) with an OOM hs_err dump
  • Business impact due to downtime caused by the server crashes

Fixes done

Moved to 64-bit JVM and increased the heap size allocation

Root Cause

The allocated memory was not enough to handle the load, which had grown over the last few years

Tools / Techniques Used

  • Analysis of hs_err dump file

http://www.oracle.com/technetwork/java/javase/felog-138657.html

  • Analysis of Weblogic server logs

grep -i "critical" server.out/.log files
grep -i "deadlock" server.out/.log files
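
The two searches above can be folded into a small helper that reports how many log lines mention a keyword across a set of files. This is only a sketch: the function name `count_log_hits` is illustrative, and the file names in the usage comment are placeholders rather than actual Weblogic file names.

```shell
# count_log_hits KEYWORD FILE...
# Prints the number of lines, across all given files, that contain KEYWORD
# (case-insensitive, fixed-string match).
count_log_hits() {
    keyword=$1
    shift
    # cat merges the files so grep prints one total instead of one count per file.
    # grep exits non-zero when nothing matches; the count (0) is still printed,
    # so the exit status is deliberately ignored.
    cat "$@" | grep -icF -e "$keyword" || true
}

# Example (file names are placeholders):
#   for kw in critical deadlock; do
#       echo "$kw: $(count_log_hits "$kw" server.out server.log)"
#   done
```

Tracking these counts over time makes it easier to see whether a fix actually reduced the number of critical or deadlock messages.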


Observation

Business impact due to stuck and deadlocked threads

Fixes done

  • Application Bug Fixes
  • Weblogic IO Performance Pack patch applied for improved socket handling

Root Cause

Application bugs started surfacing as the load on the application increased

Tools / Techniques Used

  • Weblogic Console to identify stuck threads
  • Weblogic logs to identify deadlock condition and threads
  • Solaris java thread analysis: 

http://javaeesupportpatterns.blogspot.in/2011/12/prstat-solaris-pinpoint-high-cpu-java.html

  • JDK’s jmap, jstack commands for heap and thread dumps

http://www.oracle.com/technetwork/java/javase/tsg-vm-149989.pdf

  • Eclipse MAT and IBM TDA for heap and thread dump analysis

http://www.eclipse.org/mat/
https://java.net/projects/tda
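
One way to tie the prstat output to a thread dump, in the spirit of the prstat article linked above, is to convert the decimal LWP id of the busy thread to hexadecimal and look for the matching `nid=0x...` entry in the jstack output. A minimal sketch, assuming a HotSpot-style thread dump in which each thread header line carries an `nid=0x...` field; the function names are illustrative.

```shell
# lwp_to_nid LWPID
# Converts a decimal LWP id (as shown by `prstat -L`) to the hex form used
# in the nid= field of a HotSpot thread dump.
lwp_to_nid() {
    printf '0x%x\n' "$1"
}

# find_thread LWPID DUMPFILE
# Prints the thread header line(s) in DUMPFILE whose nid matches LWPID.
find_thread() {
    nid=$(lwp_to_nid "$1")
    # Require a non-hex character (or end of line) after the nid so that
    # 0x1f does not also match 0x1f3.
    grep -E "nid=${nid}([^0-9a-fA-F]|\$)" "$2"
}
```

A typical session would be `prstat -L -p <pid>` to find the LWP id using the most CPU, `jstack <pid> > dump.txt` to capture the threads, and then `find_thread <lwpid> dump.txt` to locate the thread in question.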


Observation

Business impact due to the file descriptor (FD) limit being reached

Fixes done

OS File Descriptor limit increased/tuned

Root Cause

The increased load on the application required more threads and socket connections

Tools / Techniques Used

  • Weblogic logs and Console to identify the FD limit issue
  • OS commands to confirm that the FD limit was being reached
  • To determine the number of open files (including socket connections):

lsof -p <pid> | wc -l
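
The raw count is more useful when compared against the limit. The sketch below computes what percentage of the descriptor limit is in use. The helper names are illustrative, and `lsof` also lists entries such as the working directory and memory-mapped files, so the count is an approximation rather than an exact descriptor count.

```shell
# open_file_count PID
# Counts the entries lsof reports for PID, excluding the header line.
open_file_count() {
    lsof -p "$1" | tail -n +2 | wc -l | tr -d ' '
}

# fd_usage_percent USED LIMIT
# Prints USED as an integer percentage of LIMIT (rounded down).
fd_usage_percent() {
    echo $(( $1 * 100 / $2 ))
}
```

For example, `fd_usage_percent "$(open_file_count <pid>)" "$(ulimit -n)"` gives a rough usage figure, assuming the shell runs with the same numeric descriptor limit as the server process.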


Observation

Business impact due to the DB connection limit being reached

Fixes done

DB connection pool limit increased/tuned

Root Cause

The increased load on the application required more DB connections

Tools / Techniques Used

Weblogic logs and Console to identify DB connection limit issue


Observation

Business impact due to heavy unwanted incoming load

Fixes done

  • Input validation was tightened to reject unwanted incoming requests and prevent DoS-like attacks
  • Moved persistent JMS queues to non-persistent wherever applicable

Root Cause

Input validation was not strict enough to restrict unwanted incoming load

Tools / Techniques Used

Weblogic Access logs and Console to identify the incoming load


Observation

Business impact due to slow-running SQL queries and high DB CPU usage

Fixes done

  • SQL queries tuned and the necessary DB indexes created
  • DB purge logic written to move old data from the main tables to historical tables

Root Cause

Slow-running SQL queries against tables holding large data sets

Tools / Techniques Used

  • Application logs to identify slow processing
  • Oracle DB AWR report to identify the heaviest and most frequently executed SQL queries


Observation

Business impact due to increased load (customer base increased)

Fixes done

  • Additional Application Nodes deployed behind the Load Balancer
  • Hyperactive clients blocked

Root Cause

Customer base increased

Tools / Techniques Used

Weblogic Access logs and Console to identify the incoming load
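
Hyperactive clients can be spotted by counting requests per client address in the access log. A rough sketch, assuming the access log uses the common log format with the client address as the first whitespace-separated field; `top_clients` and the log file name in the usage note are illustrative.

```shell
# top_clients LOGFILE [N]
# Prints the N busiest clients (default 10) as "<requests> <client>",
# busiest first.
top_clients() {
    awk '{ hits[$1]++ } END { for (c in hits) print hits[c], c }' "$1" |
        sort -rn |
        head -n "${2:-10}"
}
```

Running `top_clients access.log 5` lists the five clients that sent the most requests, which is a reasonable starting point for deciding which ones to block or throttle.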


Observation

Business impact due to slow responses from the Enterprise Application

Fixes done

SSL offloading done at the Load Balancer, so that traffic between the Load Balancer and the Application Nodes no longer uses HTTPS

Root Cause

Under heavy load, the HTTPS communication between the Load Balancer and the Application Nodes was slowing down responses to clients

Tools / Techniques Used

  • Weblogic and application logs to identify slow responses
  • Software load balancer ‘pound’, used by the development team to simulate production

http://www.apsis.ch/pound


Observation

  • The business forecast called for more Application Nodes as the incoming load increased, but the cost of additional Solaris + Weblogic servers was high
  • Also, each node addition or bug fix required downtime for the entire application
  • The business requested a low-cost, scalable solution with less maintenance downtime

Fixes done

  • Weblogic to JBoss EAP porting done
  • Solaris to RHEL move done
  • Single Application Node broken into distributed web-tier, JMS-tier and business-tier nodes, so that critical functional nodes can be scaled without impacting other functional nodes

Root Cause

The business asked for a low-cost solution

Tools / Techniques Used

Best practices for distributed applications adopted


More Steps Taken

  • Improved automated monitoring of the health of critical systems and components
  • Ensured that the Development, Testing and Production environments are in sync for quick and effective troubleshooting of production issues
  • Improved application logging for better traceability of application behavior


Enterprise Application Now

Environment

RHEL OS 6.x, Java 7 64-bit JVM, JBoss EAP 6.x Application Server, Oracle 11g, A Hardware Load Balancer

Web-tier and Application Functional Nodes


Distributed Application

  • A WAR for Web-tier application
    • Two nodes for load balancing
  • Separate EARs for different functional nodes
    • Each functional node can be deployed and scaled separately

Deployment

Multi-tier distributed application

Firewall => Load Balancer => Web-Tier (2 Nodes) => JMS Cluster (3 Nodes) => Application Nodes (Function specific Nodes)

Automated Monitoring

Automated network monitoring and notification applications are in place, monitoring the health of the applications in each tier


Key Learnings

Capacity Testing & Forecasting

Performance and capacity testing with forecasted load profiles is a must, as the system may behave differently under normal and heavy load

Monitoring & Diagnostics

  • Good monitoring and diagnostics tools are a must for timely and easy identification, notification and diagnosis of production issues
  • Open source software is a cost-effective solution, but in certain situations it cannot match the monitoring, diagnostics and support capabilities of commercial paid software (Weblogic vs JBoss EAP).

Logging

Application logs are important for diagnosing and tracing system behavior, so pay attention to what is getting logged
