AWS Timeouts: A Detective Story

Follow Jack Che, troubleshooting detective, as he attempts to find out why HTTP 504 and 502 errors are cropping up in an Elastic Beanstalk environment.

There are a lot of similarities between detective stories (like Sherlock Holmes and James Bond) and troubleshooting production problems. Detective stories need to have a very complex/burning problem. If your application is experiencing issues in production, it automatically becomes a burning problem in the enterprise and gets attention from senior management. A detective uses very basic clues, extrapolates them, rules out the odd possibilities, puts in a lot of hard work, and identifies the villain. A detective fights against all odds, takes risks, and eradicates the evil. A lot of heroism is involved. This is no different from debugging/troubleshooting complex production problems.

So I am going to introduce a fictional troubleshooting character: Jack Che. Through this character, I will narrate how complex, real-world production problems faced by major enterprises are solved. Feel free to share your comments and let me know whether you like it. If not, I can always revert to my regular writing style.

While Twitter, Google, and others are talking about 10-20 millisecond response times, there are significant enterprises whose response times run to several seconds. One such enterprise had ‘search’ transactions that took several seconds to respond. Recently, this enterprise ported its application to an AWS Elastic Beanstalk environment running Java 8/Tomcat 8.

When a customer performs a ‘search’ operation on this application, a progress bar is displayed in the browser. Once the search completes, the progress bar vanishes and the search results are displayed. After the port to AWS Elastic Beanstalk, for certain data conditions, the customer saw the progress bar on the screen forever. Management didn’t know what was causing this issue or how to go about solving it. Thus they engaged Tier1app LLC to solve the problem. Tier1app LLC sent out their top-notch troubleshooting detective, ‘Jack Che’, to solve the problem.

HTTP 504 Gateway Time-Out Error Code

As always, Jack Che was excited to solve this problem. He assessed the situation quickly. He wanted to understand the interaction between the browser and the server. Thus, he launched the developer console in the Chrome browser and triggered the search transaction. After a wait, he saw an HTTP 504 error code returned from the server side. (HTTP 504 is a gateway timeout: a proxy or gateway gave up waiting for a response from the server behind it.) Ah, his first clue.

Next, Jack Che started to review the Ajax JavaScript that made the backend server-side call. Unfortunately, the JavaScript didn’t have any error-handling code in place. Thus, when the error code was returned, it wasn’t handled and the screen kept displaying the progress bar forever. Wow, an initial breakthrough for Jack Che within a few minutes of starting the job!

Seeing the Smoke, Where Is the Fire?

Now Jack Che was curious to figure out where this HTTP 504 error code was coming from. He found a second clue shortly after: the HTTP 504 error code was thrown at exactly the 60th second of the search transaction. Jack Che believed there was some sort of timeout kicking in, but he wasn’t sure where this timeout value was configured. He searched throughout the application source code to see whether any 60-second timeout was set up. He checked with the application development team, but there was no such timeout configured anywhere within the application source code.
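The "always at the 60th second" clue can be checked with a small script instead of a stopwatch. Below is a minimal sketch in Python, not the tooling used in the story: `time_request` and `suspected_timeout` are hypothetical helper names, and the one-second tolerance is an arbitrary assumption.

```python
import statistics
import time
import urllib.error
import urllib.request


def time_request(url, client_timeout=300.0):
    """Issue one GET and return (status_code, elapsed_seconds).

    client_timeout is deliberately generous so that the client is not the
    component that gives up first.
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=client_timeout) as response:
            status = response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses (including 502 and 504) arrive as HTTPError.
        status = err.code
    return status, time.monotonic() - start


def suspected_timeout(samples, tolerance=1.0):
    """Return the common duration of failed requests in seconds, or None.

    samples is a list of (status_code, elapsed_seconds) pairs. If every 5xx
    sample took about the same time (within `tolerance` seconds of the
    median), that duration is probably a configured timeout somewhere.
    """
    failures = [elapsed for status, elapsed in samples if status >= 500]
    if len(failures) < 2:
        return None  # not enough evidence to call it a pattern
    median = statistics.median(failures)
    if all(abs(elapsed - median) <= tolerance for elapsed in failures):
        return round(median)
    return None
```

Collecting a few samples with `time_request` against the slow search URL and passing them to `suspected_timeout` gives a number to look for in the configuration of each layer.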

Elastic Beanstalk Architecture

Now he came to the conclusion that the timeout was triggered by some component outside of the source code. Thus, he started to examine each layer in the technology stack. Below is a very quick overview of the Elastic Beanstalk architecture.

Fig: High-level Elastic Beanstalk Architecture

There is an Elastic Load Balancer in the forefront. It receives requests from the customers and distributes the traffic to backend Apache servers. Each Apache server has a dedicated Tomcat server. An Apache server relays the request to the Tomcat server, then the application running on the Tomcat server processes the request and sends back the response.

Timeout in Elastic Load Balancer

As a first step, Jack Che started to look at the AWS Elastic Load Balancer’s settings. His research revealed that the AWS Elastic Load Balancer has an idle timeout value set at 60 seconds by default. If there is no activity on a connection for 60 seconds, the connection is torn down and an HTTP 504 error code is returned to the customer. Jack followed the steps below to change the timeout value in the AWS Elastic Load Balancer:

  1. Sign into AWS Console.
  2. Go to EC2 Services.
  3. On the left panel, click on Load Balancing > Load Balancers.
  4. In the top panel, select the Load Balancer for which you want to change the idle timeout.
  5. Now in the bottom panel, under the ‘Attributes’ section, click on the ‘Edit idle timeout’ button. The default value would be 60 seconds. Change it to the value that you would like (say, 180 seconds).
  6. Click on the ‘Save’ button.

Fig: Editing Idle Timeout in AWS Elastic Load Balancer
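The same change can also be scripted rather than clicked through. The sketch below is an assumption-laden alternative to the console steps, not what Jack did: it assumes the environment uses a Classic Load Balancer (the `elb` API in boto3), that boto3 is installed, and that AWS credentials are configured. The function names and the load balancer name passed in are hypothetical.

```python
def idle_timeout_request(load_balancer_name, seconds):
    """Build the arguments for the Classic ELB ModifyLoadBalancerAttributes call."""
    if not isinstance(seconds, int) or seconds < 1:
        raise ValueError("idle timeout must be a positive whole number of seconds")
    return {
        "LoadBalancerName": load_balancer_name,
        "LoadBalancerAttributes": {
            "ConnectionSettings": {"IdleTimeout": seconds},
        },
    }


def set_idle_timeout(load_balancer_name, seconds):
    """Apply the new idle timeout. Needs boto3 and valid AWS credentials."""
    import boto3  # third-party; imported lazily so the helper above works without it

    client = boto3.client("elb")  # "elb" is the Classic Load Balancer API
    return client.modify_load_balancer_attributes(
        **idle_timeout_request(load_balancer_name, seconds)
    )
```

Calling `set_idle_timeout("my-load-balancer", 180)` would mirror steps 4 to 6 above for a load balancer with that (made-up) name. An Application Load Balancer uses a different API and attribute name, so this sketch would not apply to it as written.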

After changing the timeout setting in AWS Elastic Load Balancer, Jack Che got good news and bad news.

  • Good news: The HTTP error code 504 stopped coming.

  • Bad news: A new HTTP error code 502 was thrown.

Timeout in Apache Server

The interesting part is that this new HTTP error code 502 was also thrown at exactly the 60th second. This confirmed that some other timeout value was kicking in. The next layer in the technology stack is the Apache web server. So, Jack Che started to examine the server’s settings. He figured out that in an AWS Elastic Beanstalk environment, the Apache server had a 60-second timeout value set. He then followed the steps below to increase this value to 180 seconds.

Note: Below are the steps to update the Apache web server settings on the Java 8/Tomcat 8 platform. If you are using a different platform, the steps may differ:

  1. In your application's Web Archive (WAR) file, create a folder: “.ebextensions/httpd/conf”
  2. Under this folder, create the file “httpd.conf” with the following:
# Managed by Elastic Beanstalk
PidFile run/httpd.pid

# Enable TCP keepalive
Timeout 180
KeepAlive On
MaxKeepAliveRequests 100
KeepAliveTimeout 180

<IfModule worker.c>
StartServers        10
MinSpareThreads     250
MaxSpareThreads     250
ServerLimit         10
MaxClients          250
MaxRequestsPerChild 1000000
</IfModule>

Listen 80

Include conf.d/*.conf
Include conf.d/elasticbeanstalk/*.conf

User apache
Group apache

CustomLog logs/access_log "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\""
TraceEnable off

LoadModule alias_module modules/mod_alias.so
LoadModule authz_host_module modules/mod_authz_host.so
LoadModule log_config_module modules/mod_log_config.so
LoadModule deflate_module modules/mod_deflate.so
LoadModule headers_module modules/mod_headers.so
LoadModule proxy_module modules/mod_proxy.so
LoadModule proxy_balancer_module modules/mod_proxy_balancer.so
LoadModule proxy_ftp_module modules/mod_proxy_ftp.so
LoadModule proxy_http_module modules/mod_proxy_http.so
LoadModule proxy_ajp_module modules/mod_proxy_ajp.so
LoadModule proxy_connect_module modules/mod_proxy_connect.so
LoadModule cache_module modules/mod_cache.so


NOTE: Here, only two changes have been made from the default:

  1. Timeout is set to 180. (Default value is 60)
  2. KeepAliveTimeout is set to 180. (Default value is 60)
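Because the whole file is replaced just to change two values, it is easy to ship a copy in which one of them is still 60. The following is a rough sketch of a pre-deployment check, assuming the file content is available as a string. `read_directives` and `check_timeouts` are hypothetical helpers, and the parsing is deliberately simplistic (it treats directive names as case-sensitive and ignores everything inside section containers such as `<IfModule>`).

```python
def read_directives(conf_text):
    """Collect top-level `Name value` directives from Apache config text.

    Comments, blank lines, and anything inside <Section> ... </Section>
    containers are skipped; a later directive overrides an earlier one.
    """
    directives = {}
    depth = 0
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("</"):
            depth -= 1  # leaving a section container
            continue
        if line.startswith("<"):
            depth += 1  # entering a section container
            continue
        if depth == 0:
            parts = line.split(None, 1)
            directives[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return directives


def check_timeouts(conf_text, expected_seconds):
    """Return a list of problems; an empty list means both values match."""
    directives = read_directives(conf_text)
    problems = []
    for name in ("Timeout", "KeepAliveTimeout"):
        actual = directives.get(name)
        if actual != str(expected_seconds):
            problems.append(f"{name} is {actual!r}, expected {expected_seconds}")
    return problems
```

Running `check_timeouts(open("httpd.conf").read(), 180)` before building the WAR would flag a forgotten value; an empty list means both directives carry the intended number.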

After making the above change, Jack Che deployed the new WAR file to the Elastic Beanstalk environment. To everyone’s relief, the HTTP 502 errors stopped. Search transactions completed successfully. The business was back on its feet.
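The order in which the symptoms appeared (504 first, then 502, then success) follows from stacked timeouts: whichever layer runs out of patience first decides what the customer sees. Here is a toy model of that idea, a sketch only. The 90-second search duration is a made-up figure, the status codes are simply the ones observed in the story, and breaking ties in favor of the outer layer is an assumption chosen to match the story.

```python
from collections import namedtuple

# status_seen is the status code the customer observed when this layer's
# timeout fired, as reported in the story.
Layer = namedtuple("Layer", "name timeout_seconds status_seen")


def first_failure(request_seconds, layers):
    """Return the layer whose timeout expires first, or None if none does.

    `layers` is ordered from the outermost layer inwards.
    """
    expired = [layer for layer in layers if layer.timeout_seconds < request_seconds]
    if not expired:
        return None
    # min() returns the first of equal candidates, so ties go to the outer layer.
    return min(expired, key=lambda layer: layer.timeout_seconds)


SEARCH_SECONDS = 90  # hypothetical duration of the slow search

before = [Layer("ELB", 60, 504), Layer("Apache", 60, 502)]
after_elb_fix = [Layer("ELB", 180, 504), Layer("Apache", 60, 502)]
after_both_fixes = [Layer("ELB", 180, 504), Layer("Apache", 180, 502)]

stages = [("before", before), ("ELB raised", after_elb_fix), ("both raised", after_both_fixes)]
for label, stack in stages:
    failure = first_failure(SEARCH_SECONDS, stack)
    if failure:
        outcome = f"HTTP {failure.status_seen} ({failure.name} timeout)"
    else:
        outcome = "search completes"
    print(f"{label}: {outcome}")
```

The practical lesson from the model is that raising one timeout only helps if every layer behind it is at least as patient.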

Wow! Senior management couldn’t believe that troubleshooting detective Jack Che was able to solve this problem within a few hours. The excitement and celebrations continued at the happy hour party as well.
