An Enterprise Application Journey: From Monolithic to Distributed
In this blog I would like to share my experience of an enterprise application's journey from a single monolithic application to a truly distributed one. I will also share the tools and techniques used for diagnostics, troubleshooting, and development along the way.
I was given responsibility for production troubleshooting when we started hearing field escalations that the System (our enterprise application) was crashing and had to be restarted frequently (twice a week) to keep the business functioning. Over a period of about 6 months we made a series of changes, improvements, and re-architecture steps to turn the system into a distributed, highly available, scalable, and cost-effective solution for the business.
Enterprise Application Before
Environment |
Solaris Sun OS 5.9, Java 6 32-bit JVM, Weblogic 10.0 Application Server, Oracle 10g, A Hardware Load Balancer |
Application Node |
A single EAR consisting of:
|
Deployment |
Application Nodes in Weblogic Cluster Firewall => Load Balancer => 3 Application Nodes in Cluster |
Journey Report
The application worked fine for a few years while the incoming load was low; as the load increased over time, the system started showing the following symptoms (in sequence):
Observation |
|
Fixes done |
Moved to 64-bit JVM and increased the heap size allocation |
Root Cause |
The allocated memory was not enough to handle the load, which had increased over the last few years |
Tools / Techniques Used |
http://www.oracle.com/technetwork/java/javase/felog-138657.html
grep -i "critical" server.out/.log files |
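To illustrate the heap headroom that motivated the move to a 64-bit JVM, here is a minimal Java sketch that reports how much of the maximum heap is in use. It relies only on the standard `java.lang.management` API; the class name `HeapCheck` and the helper `usedFraction` are hypothetical and not part of the original application.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper (not from the original application): reports how much
// of the configured maximum heap is currently in use.
final class HeapCheck {

    // Fraction of the maximum heap in use, between 0.0 and 1.0.
    static double usedFraction(long usedBytes, long maxBytes) {
        if (maxBytes <= 0) {
            throw new IllegalArgumentException("maxBytes must be positive");
        }
        return (double) usedBytes / (double) maxBytes;
    }

    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long max = heap.getMax(); // -1 if the maximum is undefined
        if (max <= 0) {
            System.out.println("Maximum heap size is undefined on this JVM");
            return;
        }
        long mb = 1024L * 1024L;
        System.out.printf("Heap used: %d MB of %d MB (%.1f%%)%n",
                heap.getUsed() / mb, max / mb,
                100.0 * usedFraction(heap.getUsed(), max));
    }
}
```

The maximum heap itself is set with the standard `-Xmx` option. A 32-bit JVM cannot address more than 4 GB in total, and the usable heap is smaller in practice, which is why larger heaps need a 64-bit JVM.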
Observation |
Business impact due to stuck and deadlocked threads |
Fixes done |
|
Root Cause |
Application bugs started surfacing as the load on the application increased |
Tools / Techniques Used |
http://javaeesupportpatterns.blogspot.in/2011/12/prstat-solaris-pinpoint-high-cpu-java.html
http://www.oracle.com/technetwork/java/javase/tsg-vm-149989.pdf
|
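A deadlock of the kind described above can also be detected from inside the JVM. The following is a minimal sketch using the standard `ThreadMXBean.findDeadlockedThreads()` call; the class name `DeadlockProbe` is hypothetical, and the sketch assumes the JVM supports monitoring of ownable synchronizers (otherwise that call throws `UnsupportedOperationException`).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical probe (not from the original application): lists the names of
// threads that the JVM currently reports as deadlocked.
final class DeadlockProbe {

    static List<String> deadlockedThreadNames() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] ids = threads.findDeadlockedThreads(); // null when there is no deadlock
        List<String> names = new ArrayList<String>();
        if (ids == null) {
            return names;
        }
        for (ThreadInfo info : threads.getThreadInfo(ids)) {
            if (info != null) { // a thread may have exited in the meantime
                names.add(info.getThreadName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> names = deadlockedThreadNames();
        if (names.isEmpty()) {
            System.out.println("No deadlocked threads detected");
        } else {
            System.out.println("Deadlocked threads: " + names);
        }
    }
}
```

This only covers true deadlocks. Threads that are stuck on slow I/O or long-running calls do not show up here and still need thread dumps taken a few seconds apart to compare.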
Observation |
Business impact due to File Descriptor limit reach |
Fixes done |
OS File Descriptor limit increased/tuned |
Root Cause |
Increased load on the application required more threads and socket connections |
Tools / Techniques Used |
lsof -p <pid> | wc -l |
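The `lsof` count above can be approximated from inside the JVM on Unix-like systems. The sketch below assumes a HotSpot or OpenJDK runtime that exposes `com.sun.management.UnixOperatingSystemMXBean`; the class name `FdCheck`, the helper `nearLimit`, and the 80% warning threshold are hypothetical choices for illustration.

```java
import com.sun.management.UnixOperatingSystemMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical check (not from the original application): compares the
// process's open file descriptors with its limit.
final class FdCheck {

    // True when open/max has reached the given threshold (for example 0.8).
    static boolean nearLimit(long open, long max, double threshold) {
        return max > 0 && (double) open / (double) max >= threshold;
    }

    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (!(os instanceof UnixOperatingSystemMXBean)) {
            System.out.println("File descriptor counts are not available on this platform");
            return;
        }
        UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
        long open = unix.getOpenFileDescriptorCount();
        long max = unix.getMaxFileDescriptorCount();
        System.out.println("Open file descriptors: " + open + " of " + max);
        if (nearLimit(open, max, 0.8)) { // 0.8 is an assumed warning threshold
            System.out.println("WARNING: close to the file descriptor limit");
        }
    }
}
```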
Observation |
Business impact due to DB connection limit reach |
Fixes done |
DB connection pool limit increased/tuned |
Root Cause |
Increased load on the application required more DB connections |
Tools / Techniques Used |
Weblogic logs and Console to identify DB connection limit issue |
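To show why a fixed pool limit becomes a bottleneck under load, here is a toy model of a bounded pool built on `java.util.concurrent.Semaphore`. It hands out permits rather than real connections and is not how WebLogic implements its data sources; the class `BoundedPool` and its method names are hypothetical.

```java
import java.util.concurrent.Semaphore;

// Toy model of a bounded connection pool (an assumption for illustration,
// not WebLogic's implementation): each permit stands for one connection.
final class BoundedPool {

    private final Semaphore permits;

    BoundedPool(int maxConnections) {
        this.permits = new Semaphore(maxConnections);
    }

    // Returns true if a connection slot was free, false if the limit is reached.
    boolean tryBorrow() {
        return permits.tryAcquire();
    }

    // Returns a previously borrowed slot to the pool.
    void giveBack() {
        permits.release();
    }

    int available() {
        return permits.availablePermits();
    }

    public static void main(String[] args) {
        BoundedPool pool = new BoundedPool(2);
        System.out.println("first borrow:  " + pool.tryBorrow());
        System.out.println("second borrow: " + pool.tryBorrow());
        System.out.println("third borrow:  " + pool.tryBorrow()); // limit reached
    }
}
```

Once every slot is taken, further requests fail or wait, which is the situation a larger or better tuned pool limit is meant to relieve.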
Observation |
Business impact due to heavy unwanted incoming load |
Fixes done |
|
Root Cause |
Input validation was not strict enough to reject unwanted incoming load |
Tools / Techniques Used |
Weblogic Access logs and Console to identify the incoming load |
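One way to identify unwanted incoming load from access logs is to count requests per client. The sketch below assumes the access log is in common log format, where the first whitespace-separated field is the client host; the class name `AccessLogCounter` and the sample lines are made up for illustration and are not taken from the original system.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper (not from the original application): counts requests
// per client host, assuming the host is the first field of each log line.
final class AccessLogCounter {

    static Map<String, Integer> requestsPerClient(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue; // skip blank lines and header/comment lines
            }
            String client = trimmed.split("\\s+", 2)[0];
            Integer seen = counts.get(client);
            counts.put(client, seen == null ? 1 : seen + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample lines in common log format.
        List<String> sample = Arrays.asList(
                "10.0.0.1 - - [01/Jan/2014:10:00:00 +0000] \"GET /app HTTP/1.1\" 200 512",
                "10.0.0.2 - - [01/Jan/2014:10:00:01 +0000] \"GET /app HTTP/1.1\" 200 512",
                "10.0.0.1 - - [01/Jan/2014:10:00:02 +0000] \"POST /app HTTP/1.1\" 400 64");
        System.out.println(requestsPerClient(sample));
    }
}
```

Clients with unusually high counts are candidates for stricter input validation or throttling.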
Observation |
Business impact due to slow-running SQL queries and high DB CPU usage |
Fixes done |
|
Root Cause |
Slow-running SQL queries against tables with large data sets |
Tools / Techniques Used |
|
Observation |
Business impact due to increased load (customer base increased) |
Fixes done |
|
Root Cause |
Customer base increased |
Tools / Techniques Used |
Weblogic Access logs and Console to identify the incoming load |
Observation |
Business impact due to slow response from Enterprise Application |
Fixes done |
SSL Offloading done between Load Balancer and Application Nodes |
Root Cause |
Under heavy load, the HTTPS communication between the Load Balancer and the Application Nodes was causing slow responses to clients |
Tools / Techniques Used |
|
Observation |
|
Fixes done |
|
Root Cause |
The business asked for a low-cost solution |
Tools / Techniques Used |
Best practices for distributed applications adopted |
More Steps Taken |
|
Enterprise Application Now
Environment |
RHEL OS 6.x, Java 7 64-bit JVM, JBoss EAP 6.x Application Server, Oracle 11g, A Hardware Load Balancer |
Web-tier and Application Functional Nodes |
|
Deployment |
Multi-tier distributed application Firewall => Load Balancer => Web-Tier (2 Nodes) => JMS Cluster (3 Nodes) => Application Nodes (Function specific Nodes) |
Automated Monitoring |
Automated network monitoring and notification applications are in place, monitoring the health of the applications in each tier |
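The deployment above places a JMS cluster between the web tier and the function-specific application nodes. The following is a minimal in-memory sketch of that decoupling, using `java.util.concurrent.BlockingQueue` as a stand-in for a JMS queue. It does not use the JMS API or JBoss; the class name `QueueDecoupling`, the queue capacity of 16, and the message names are assumptions for illustration.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// In-memory stand-in for "web tier -> queue -> application nodes".
// A real deployment would use a JMS provider; this sketch does not.
final class QueueDecoupling {

    private static final String STOP = "__STOP__"; // sentinel telling a worker to exit

    // Sends `messages` requests through a bounded queue to `workers` consumer
    // threads and returns how many requests the workers processed.
    static int run(int messages, int workers) {
        if (workers < 1) {
            throw new IllegalArgumentException("at least one worker is required");
        }
        final BlockingQueue<String> queue = new ArrayBlockingQueue<String>(16);
        final AtomicInteger processed = new AtomicInteger();

        Thread[] pool = new Thread[workers];
        for (int i = 0; i < workers; i++) {
            pool[i] = new Thread(new Runnable() {
                @Override
                public void run() {
                    try {
                        while (true) {
                            String msg = queue.take(); // blocks until work arrives
                            if (STOP.equals(msg)) {
                                return;
                            }
                            processed.incrementAndGet(); // placeholder for real work
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            });
            pool[i].start();
        }

        try {
            for (int i = 0; i < messages; i++) {
                queue.put("request-" + i); // blocks while the queue is full
            }
            for (int i = 0; i < workers; i++) {
                queue.put(STOP); // one sentinel per worker
            }
            for (Thread t : pool) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while dispatching", e);
        }
        return processed.get();
    }

    public static void main(String[] args) {
        System.out.println("processed " + run(100, 3) + " requests");
    }
}
```

The bounded queue is the point of the design: the producer slows down when consumers fall behind instead of overwhelming them, and consumers can be added per function without changing the producer.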
Key Learnings
Capacity Testing & Forecasting |
Performance and capacity testing with forecasted load profiles is a must, as the system may behave differently under normal and heavy load |
Monitoring & Diagnostics |
|
Logging |
Application logs are important for diagnosing and tracing system behavior, so pay attention to what is getting logged |