TCP: Out of Memory — Consider Tuning TCP_Mem

What happens when you're out of memory?


Recently, we ran into an interesting production problem. The application was running on multiple AWS EC2 instances behind an Elastic Load Balancer, on GNU/Linux with Java 8 and the Tomcat 8 application server. All of a sudden, one of the application instances became unresponsive, while all the other instances kept handling traffic properly. Whenever an HTTP request was sent to this instance from the browser, the browser displayed the following response:

Proxy Error

The proxy server received an invalid response from an upstream server.

The proxy server could not handle the request GET /.

Reason: Error reading from remote server


We used our APM (Application Performance Monitoring) tool to examine the problem. According to the APM tool, CPU and memory utilization looked perfectly normal. On the other hand, the tool also showed that traffic wasn’t coming into this particular application instance. It was really puzzling. Why wasn’t traffic coming in?

We logged in to the problematic AWS EC2 instance and ran the vmstat, iostat, netstat, top, and df commands to see whether we could uncover any anomaly. To our surprise, none of these great tools reported any issue.

As the next step, we restarted the Tomcat application server in which this application was running. It didn’t make any difference: the application instance still wasn’t responding at all.

DMESG Command

Then we issued the ‘dmesg’ command on this EC2 instance. This command prints the kernel’s message buffer, which typically contains messages produced by the kernel and its device drivers. In the output, we noticed the following interesting message printed repeatedly:

[4486500.513856] TCP: out of memory -- consider tuning tcp_mem
[4487211.020449] TCP: out of memory -- consider tuning tcp_mem
[4487369.441522] TCP: out of memory -- consider tuning tcp_mem
[4487535.908607] TCP: out of memory -- consider tuning tcp_mem
[4487639.802123] TCP: out of memory -- consider tuning tcp_mem
[4487717.564383] TCP: out of memory -- consider tuning tcp_mem
[4487784.382403] TCP: out of memory -- consider tuning tcp_mem
[4487816.378638] TCP: out of memory -- consider tuning tcp_mem
[4487855.352405] TCP: out of memory -- consider tuning tcp_mem
[4487862.816227] TCP: out of memory -- consider tuning tcp_mem
[4487928.859785] TCP: out of memory -- consider tuning tcp_mem
[4488215.969409] TCP: out of memory -- consider tuning tcp_mem
[4488642.426484] TCP: out of memory -- consider tuning tcp_mem
[4489347.800558] TCP: out of memory -- consider tuning tcp_mem
[4490054.414047] TCP: out of memory -- consider tuning tcp_mem
[4490763.997344] TCP: out of memory -- consider tuning tcp_mem
[4491474.743039] TCP: out of memory -- consider tuning tcp_mem
[4491859.749745] TCP: out of memory -- consider tuning tcp_mem
[4492182.082423] TCP: out of memory -- consider tuning tcp_mem
[4496318.377316] TCP: out of memory -- consider tuning tcp_mem
[4505666.858267] TCP: out of memory -- consider tuning tcp_mem
[4521592.915616] TCP: out of memory -- consider tuning tcp_mem
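
When the message buffer is long, it can help to count these lines and see how far apart they are instead of eyeballing them. The following is a minimal sketch, assuming the bracketed number is the usual dmesg timestamp in seconds since boot; the function names `tcp_oom_timestamps` and `gaps` are our own, not part of any tool.

```python
import re

# Matches kernel log lines such as:
# [4486500.513856] TCP: out of memory -- consider tuning tcp_mem
TCP_OOM = re.compile(r"^\[\s*(\d+\.\d+)\]\s+TCP: out of memory")


def tcp_oom_timestamps(dmesg_text):
    """Return the timestamps (seconds) of every TCP out-of-memory line."""
    stamps = []
    for line in dmesg_text.splitlines():
        match = TCP_OOM.match(line.strip())
        if match:
            stamps.append(float(match.group(1)))
    return stamps


def gaps(stamps):
    """Seconds elapsed between consecutive occurrences."""
    return [b - a for a, b in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    import subprocess

    # Assumes `dmesg` is on PATH and readable by the current user.
    out = subprocess.run(["dmesg"], capture_output=True, text=True).stdout
    stamps = tcp_oom_timestamps(out)
    print(f"{len(stamps)} TCP out-of-memory messages")
    if len(stamps) > 1:
        print(f"shortest gap: {min(gaps(stamps)):.1f} s")
```

Feeding it the output above would give one timestamp per line, so a shrinking gap between messages would show up directly in the numbers.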


We were intrigued to see this error message: “TCP: out of memory -- consider tuning tcp_mem”. It means an out of memory error is happening at the TCP level. We had always thought out of memory errors happen only at the application level and never at the TCP level.

The problem was intriguing because we breathe this OutOfMemoryError problem day in and day out. We have built troubleshooting tools like GCeasy and HeapHero to help engineers debug OutOfMemoryError at the application level (Java, Android, Scala, Jython applications), and we have written several blogs on the topic. But we were stumped to see an out of memory error happening at the kernel’s TCP level. We never thought there would be a problem at that level, that too in a stable Linux operating system. Being stumped, we weren’t sure how to proceed further.

Thus, we resorted to the Google god’s help. Googling for the search term “TCP: out of memory -- consider tuning tcp_mem” showed only 12 search results. Except for one article, none of them had much content, and even that one article was written in a language that we couldn’t understand. So we still weren’t sure how to troubleshoot this problem.

Left with no other options, we went ahead and applied the universal solution, i.e. “restart”. We restarted the EC2 instance to put out the immediate fire. Hurray! Restarting the server cleared the problem immediately. Apparently, this server hadn’t been restarted for a long time (more than 70 days); maybe because of that, the application had saturated the TCP memory limits.
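
One way to check the saturation theory before restarting would be to compare the kernel's current TCP memory usage with the tcp_mem limits that the message names. The sketch below is ours and was not part of the original investigation. It assumes the standard Linux layout: net.ipv4.tcp_mem holds three page counts (min, pressure, max), and the TCP line of /proc/net/sockstat reports `mem` in pages as well. The function names are hypothetical.

```python
def parse_tcp_mem(text):
    """Parse net.ipv4.tcp_mem: three page counts (min, pressure, max)."""
    low, pressure, high = (int(v) for v in text.split())
    return low, pressure, high


def parse_sockstat_tcp_pages(text):
    """Return the `mem` field (pages) from the TCP line of /proc/net/sockstat."""
    for line in text.splitlines():
        if line.startswith("TCP:"):
            fields = line.split()[1:]  # e.g. ['inuse', '5', ..., 'mem', '3']
            stats = dict(zip(fields[0::2], fields[1::2]))
            return int(stats["mem"])
    raise ValueError("no TCP line found")


def tcp_mem_usage(sockstat_text, tcp_mem_text):
    """Summarize how close TCP memory usage is to the tcp_mem maximum."""
    used = parse_sockstat_tcp_pages(sockstat_text)
    _, pressure, high = parse_tcp_mem(tcp_mem_text)
    return {
        "used_pages": used,
        "max_pages": high,
        "fraction_of_max": used / high,
        "above_pressure_threshold": used > pressure,
    }


if __name__ == "__main__":
    # Linux only: both files are exposed by the kernel under /proc.
    with open("/proc/net/sockstat") as f:
        sockstat = f.read()
    with open("/proc/sys/net/ipv4/tcp_mem") as f:
        limits = f.read()
    print(tcp_mem_usage(sockstat, limits))
```

A `fraction_of_max` close to 1.0 on the unhealthy instance would have supported the theory; we did not capture these numbers at the time, so this remains an assumption.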

We reached out for help to one of our intelligent friends, who works for a world-class technology company. This friend asked us what values we were setting for the kernel properties below:

  • net.core.netdev_max_backlog
  • net.core.rmem_max
  • net.core.wmem_max
  • net.ipv4.tcp_max_syn_backlog
  • net.ipv4.tcp_rmem
  • net.ipv4.tcp_wmem

Honestly, this was the first time we had heard about these properties. We found that the following values were set for them on the server:

net.core.netdev_max_backlog = 1000
net.core.rmem_max = 212992
net.core.wmem_max = 212992
net.ipv4.tcp_max_syn_backlog = 256
net.ipv4.tcp_rmem = 4096        87380   6291456
net.ipv4.tcp_wmem = 4096        20480   4194304
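
For reference, values like these can also be read programmatically. The sketch below assumes the usual Linux convention that a sysctl key maps to a file under /proc/sys with the dots replaced by slashes; the helper names are our own, and the naive mapping would not handle keys whose components themselves contain dots.

```python
KEYS = [
    "net.core.netdev_max_backlog",
    "net.core.rmem_max",
    "net.core.wmem_max",
    "net.ipv4.tcp_max_syn_backlog",
    "net.ipv4.tcp_rmem",
    "net.ipv4.tcp_wmem",
]


def sysctl_path(key, root="/proc/sys"):
    """Map a dotted sysctl key to its file path under /proc/sys."""
    return "/".join([root, *key.split(".")])


def read_sysctl(key, root="/proc/sys"):
    """Return the whitespace-separated values stored for a sysctl key."""
    with open(sysctl_path(key, root)) as f:
        return f.read().split()


if __name__ == "__main__":
    # Linux only: prints each property in the same style as the listing above.
    for key in KEYS:
        print(key, "=", " ".join(read_sysctl(key)))
```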


Our friend suggested changing the values as given below:

net.core.netdev_max_backlog=30000
net.core.rmem_max=134217728
net.core.wmem_max=134217728
net.ipv4.tcp_max_syn_backlog=8192
net.ipv4.tcp_rmem=4096 87380 67108864
net.ipv4.tcp_wmem=4096 87380 67108864
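
To see how far apart the two sets of values are, they can be parsed and compared side by side. This is a small sketch using only the numbers shown in this article; `parse_sysctl` and `growth` are hypothetical helper names.

```python
CURRENT = """\
net.core.netdev_max_backlog = 1000
net.core.rmem_max = 212992
net.core.wmem_max = 212992
net.ipv4.tcp_max_syn_backlog = 256
net.ipv4.tcp_rmem = 4096        87380   6291456
net.ipv4.tcp_wmem = 4096        20480   4194304
"""

SUGGESTED = """\
net.core.netdev_max_backlog=30000
net.core.rmem_max=134217728
net.core.wmem_max=134217728
net.ipv4.tcp_max_syn_backlog=8192
net.ipv4.tcp_rmem=4096 87380 67108864
net.ipv4.tcp_wmem=4096 87380 67108864
"""


def parse_sysctl(text):
    """Parse `key = v1 v2 ...` lines into {key: [int, ...]}."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = [int(v) for v in value.split()]
    return settings


def growth(current, suggested):
    """Per key, how many times larger each suggested value is than the current one."""
    return {
        key: [new / old for old, new in zip(current[key], suggested[key])]
        for key in current
        if key in suggested
    }


if __name__ == "__main__":
    factors = growth(parse_sysctl(CURRENT), parse_sysctl(SUGGESTED))
    for key, values in factors.items():
        print(key, ["x%.1f" % v for v in values])
```

The backlog settings grow by a factor of about 30, while the maximum buffer sizes grow by several hundred times for rmem_max and wmem_max.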


He mentioned that setting these values would eliminate the problem we had faced. We are sharing the values here, as they may be of help to you. Apparently, our values were very low compared to the values he provided.

Conclusion

Here are a few conclusions that we would like to draw:

  • Even modern, industry-standard APM (Application Performance Monitoring) tools don’t completely answer the application performance problems that we face today.
  • The ‘dmesg’ command is your friend. You might want to execute it when your application becomes unresponsive; it may point you to valuable information.
  • Memory problems don’t have to happen in the code that we write; they can happen even at the TCP/kernel level.

