The Importance of Access Logs in Performance Issue Analysis
Learn how performance issue analysis using access logs of a web server and load balancer can help you find the root cause for various HTTP Error codes.
Join the DZone community and get the full member experience.Join For Free
An access log is generated by the web server to log the details about the request that it has processed. It logs status code, response time, URL, protocol, size, client IP address, etc., about the request. Load Balancers will have similar log files created to log the request details. While doing any performance analysis, these logs play an important role. It is being neglected most of the time due to lack of awareness and the usage of APM tools abstracts it from the users by allowing them to focus visually on coarse-grained data instead of fine-grained data. Most people are aware of the application server log but many of them are not aware of the web server/load balancer access log. Most of the time, this log will not be striking in the mind of a person who investigates the problem.
Access logs are available for the below servers while the format for each of them varies:
- Web servers such as Apache, Nginx, etc.,
- Load Balancers such as AWS ELB, AWS ALB, etc.,
- API Gateways such as AWS API Gateway, APIGEE, etc.,
- Gateway & Proxy servers
In this blog I am going to cover access log format of apache and AWS ELB along with a real performance issue identified using access log. Apart from that I also have listed few problems that can be investigated using access log.
Apache Access Log Format
Below is a sample entry taken from the apache access log:
220.127.116.11 - test [22/Sep/2015:08:46:01 +0400] "GET /test HTTP/1.1" 404 288 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.99 Safari/537.36"
- 18.104.22.168 - IP address from where the request was made to the web server
- test - It is the username of the user who is requesting the information from the web server
- [22/Sep/2015:08:46:01 +0400] - Timestamp when the request was received
- "GET /test HTTP/1.1" – ‘GET’ is the HTTP method used to request the object. ‘/test’ is the resource requested. ‘HTTP/1.1’ is the protocol used.
- 404 - It is the response code returned to the client
- 288 - It is the size of the object/resource returned by the server
- "-" - It is referrer URL and it is absent in this request
- "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.99 Safari/537.36" - It is the browser agent from which the request was made
ELB Log Access Format
Here is a sample taken from an ELB log:
2019-11-30T15:00:33.293541Z www-test-com 22.214.171.124:46765 126.96.36.199:80 0.000064 0.64636 0.000047 200 200 0 557178 "GET https://www.test.com:443/ HTTP/1.1" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
- 2019-11-30T15:00:33.293541Z - Timestamp when the load balancer generated the response to the client
- www-test-com - Resource ID of the ELB used
- 188.8.131.52:46765 - IP address and port of the client
- 184.108.40.206:80 - IP address and port of the target server where the request is processed
- 0.000064 – It is the elapsed time from when the load balancer received the request until it sends the request to a target server
- 0.64636 – It is the elapsed time from when the load balancer sent the request to the target server until the target server starts sending the response headers
- 0.000047 - The elapsed time from when the load balancer received the response from the target server until it started sending response to the client
- 200 - Response code sent by the load balancer
- 200 - Response code received from the target server
- 0 - Size of the bytes received from the client
- 557178 - Size of the bytes sent as response to the client
- "GET https://www.test.com:443/ HTTP/1.1" – It contains the HTTP Method, URL and protocol used in the request
- "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36” - It contains the user agent details from where the request was invoked
- ECDHE-RSA-AES128-GCM-SHA256 - It is the SSL cipher suite used in this request
- TLSv1.2 - It is the SSL protocol version used
Access Log Analysis
In a multi-tiered application, the entry point would be the web server or load balancer for most of the layers. Hence it is important to analyze these log files to check whether the request reached to the server or not. If there is no entry, then it clearly indicates the request has not reached from the previous layer to this current layer. In this case, it could be a network communication issue or a timeout issue or any other issue in the previous layer.
In one of the production incidents, it was identified that the CDN server was returning 503 error even though it is configured to server stale content if it receives such kind of error from the downstream. CDN vendor was reporting that they received the response code as 0 from the AWS ELB layer. Hence, they returned 503 error codes to the end users. On analyzing the ELB logs it was found that there was no response returned with 0 as the status code. Since the traffic was huge on the application at that point of time, the requests were getting timed out at the load balancer itself without waiting for the response from the target server. But this has nothing to do with the zero-response code as the ELB was returning 504 as the status code. Below is the line which shows the entry made during the problem period in the ELB access log:
2019-12-11T12:58:32.768181Z www-test-com 220.127.116.11:37280 - -1 -1 -1 504 0 0 0 "GET https://www.test.com:443/ HTTP/1.1" "Apache-HttpClient/2.5.5 (Java/1.7.0_122)" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
In the above line,
- ELB returned status code 504
- -1 observed in the elapsed time related fields as ELB did not receive any response within the time out configured
- 0 for the response code from target as well as 0 bytes for the response as there is no response received from the target
Later it was found that there was some network issue between the CDN to the AWS ELB layer which resulted in 0 response code returned to the CDN server. This kind of analysis is possible only when we analyze the access log. It can't be done with any APM tool as it may not be able to show the data to this granular level in a straightforward manner.
Investigation of Other Problems
Problems that can be diagnosed using access log:
- An HTTP Status code 404 indicates the requested object/resource is not found in the server. It can be analyzed and fixed by having the resource in web server or by deleting the reference to the web resource from the web page. This helps in saving the unnecessary roundtrips made to the server.
- HTTP Status codes 500, 502, 503, and 504 are received from the target servers which can be further investigated for issues. 504 is caused by a longer time taken by the target server to respond and load balancer returns this error code if it exceeds the timeout configured.
- User-agent string helps in analyzing the types of devices/browsers from where the request is made. Bot attacks can be identified using user-agent and IP address fields which will have same set of values repeated. This is not possible by any real user to hit the server at that frequency.
- SSL cipher and SSL protocol versions can be used to analyze if there are any problems with SSL connectivity.
Access logs plays an important role in tracking the request passing through the layers and see where it has not reached. It should not be neglected while investigating any performance issues related to the network and HTTP status/error codes. It gives more meaningful information which will be helpful for different analysis.
Opinions expressed by DZone contributors are their own.