I was recently playing with AWS ELB and AWS Auto Scaling, which I described here.
What I wanted to do was set up a stress test and have Auto Scaling
kick in by launching extra instances in response to the heavy load
from my test. This was just for fun, to see how it works and what it
takes to get it working. I started by downloading and installing
JMeter, as this seems the right tool to generate some load. I followed the steps from this tutorial and came up with the following test plan:
I defined some default HTTP Request properties, such as the URL of the load balancer to which the requests have to be sent. I defined two requests, one for the home page and one for a blog post. Finally, I added a Graph Results listener so I could see the results in a graph, although I wasn’t really interested in the results on the JMeter side but more in how the servers would respond.
Before executing the test I had already increased the number of instances so that I had
2 of them per Availability Zone, using the following command:
as-update-auto-scaling-group wp-as-group --min-size 4 --max-size 6 --show-xml --region eu-west-1
Note: to scale the group back down afterwards, use the following commands:
as-update-auto-scaling-group wp-as-group --min-size 2 --max-size 4 --show-xml --region eu-west-1
as-terminate-instance-in-auto-scaling-group i-358e9e7e --decrement-desired-capacity --region eu-west-1
Then I kicked off the test plan in JMeter. I first tried to see the results in Amazon CloudWatch,
but this didn’t give me the information I was looking for. So I decided
to open a terminal to each of the four instances and run ‘top’ on
each of them. This showed an interesting result: only two of the servers
were running at 100% load, while the other two were doing nothing at all:
That was not what I was expecting from the load balancer! It also explained why I hadn’t understood the CloudWatch graph:
I only saw results for one Availability Zone while I expected results for both of them. It turned out my expectations were wrong, not the tool.
So that raised the question: what happened? First I went through the tips and tricks section of the AWS support pages, in case I had missed something there. That wasn’t the case, but further investigation showed me that the DNS name of the load balancer is resolved to an IP address in one of the Availability Zones. This happens randomly per ‘client’, so if ten clients send requests, roughly half of them will go to the AZ ‘eu-west-1a’ and the rest will go to ‘eu-west-1b’.
What happened in my JMeter test is that all requests came from one client machine, and that machine resolved the load balancer name to the same IP address for the whole run of the test. So all my traffic ended up in a single Availability Zone, and only the two instances in that zone did any work. See also this thread in the AWS forum.
To avoid this behaviour the solution sounds easy: send requests from multiple machines. This can be done with JMeter itself, as described here. A much ‘cooler’ approach is described in this article about the ‘Bees with machine guns’. I haven’t tried this yet, but it is on my list to try soon!
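
To make the effect concrete, here is a small Python sketch that simulates the behaviour described above. It is only an illustration under a few assumptions of mine: the two IP addresses are made-up placeholders (one load balancer node per Availability Zone), each client picks one of them at random when it resolves the name, and it keeps using that address for its whole run. It does not talk to AWS at all.

```python
import random
from collections import Counter

# Hypothetical addresses standing in for the load balancer nodes,
# one per Availability Zone (placeholders, not real ELB IPs).
ELB_NODES = {
    "192.0.2.10": "eu-west-1a",
    "198.51.100.10": "eu-west-1b",
}


def simulate(num_clients, requests_per_client, rng):
    """Count requests per AZ when every client resolves the load
    balancer name once and reuses that IP for all of its requests."""
    hits = Counter()
    ips = sorted(ELB_NODES)
    for _ in range(num_clients):
        # One DNS lookup per client; the answer is cached afterwards.
        ip = rng.choice(ips)
        hits[ELB_NODES[ip]] += requests_per_client
    return hits


if __name__ == "__main__":
    rng = random.Random()
    # A single JMeter machine: every request lands in the same AZ.
    print("1 client   :", dict(simulate(1, 10000, rng)))
    # Many load generators: traffic spreads over both AZs.
    print("100 clients:", dict(simulate(100, 100, rng)))
```

With one client the counter only ever contains one zone, which matches what I saw in ‘top’ and in CloudWatch. With many clients the total load is the same, but it is split over both zones, which is the reason to generate the load from multiple machines.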