These days we face a challenging task: designing a very large system of scalable instances. Each instance may be in a different geographic location, and many of them are on-demand instances that are started and shut down at short notice.
Another requirement is that a given client must be directed to a specific instance, due to a system restriction (round robin is not an option in this case).
One Step Further
Since the number of IP addresses on the Internet is limited, we would like to use as few public IP addresses as possible. This can be done using a load balancer or a proxy.
At this stage we would like to avoid hardware load balancers in order to keep initial fixed costs minimal, but we may consider using them in the future.
Is the Amazon Cloud Load Balancer Service (AWS) an Option?
AWS EC2 is a feasible option for hosting on-demand instances. However, AWS charges $0.025 per load balancing rule per hour (plus traffic). It can therefore be used, but for a large number of rules (>7) or for high traffic, better solutions can be found on the market.
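To get a feel for the fixed cost, the pricing above can be turned into a quick estimate. This is only a sketch: the $0.025 hourly rate is the figure quoted in the text, while the 30-day month and the function name are assumptions, and traffic charges are left out.

```python
HOURLY_RATE_PER_RULE = 0.025  # USD per rule per hour, figure quoted in the text
HOURS_PER_MONTH = 24 * 30     # assumption: a 30-day month


def monthly_rule_cost(rules: int) -> float:
    """Fixed monthly cost in USD for `rules` load balancing rules, excluding traffic."""
    return rules * HOURLY_RATE_PER_RULE * HOURS_PER_MONTH


if __name__ == "__main__":
    for rules in (1, 7, 50):
        print(f"{rules:>3} rules: ${monthly_rule_cost(rules):,.2f} per month")
```

Under these assumptions, seven rules come to roughly $126 a month before traffic, which is the kind of number to compare against a self-managed software load balancer.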
So What Can Be Done?
We are left with software load balancers. The major ones are Apache mod_proxy and HAProxy. Supporting a large number of instances behind the load balancer can be done in one of the following two ways:
- Pre-register a large number of DNS names (subdomains) and associate them with the load balancer IP. The load balancer simply forwards each request to the defined instance, based on a simple rule in the load balancer. For example: http://instN.foofarm.com Pros: simple. Cons: not fully dynamic; additional DNS registrations are needed once in a while to keep up with the application's growth.
- Perform ProxyPass in the load balancer: every request includes an instance identifier in its path, for example: http://foofarm.com/instN/. This method does not require mass DNS declarations, but it requires specific definitions in the load balancer that may consume more CPU. In Apache the definition is fairly trivial; however, Apache is less scalable than HAProxy. In HAProxy the task can be done as well, in two phases: switching to the server and rewriting the URI:
In order to switch to the server, you have to use ACLs to match the path,
then a use_backend directive to select a server farm ("backend"). Your
farm may very well support only one server if you want.
Then in this "backend", you can use a rewrite rule ("reqrep") to replace
the request line.
This would basically look like this :
frontend fe_main                      # placeholder frontend name
    acl path_instN path_beg /instN/
    use_backend bk_66 if path_instN

backend bk_66
    # strip the /instN/ prefix from the request line
    reqrep ^([^\ :]*)\ /instN/(.*) \1\ /\2
    server srv66 220.127.116.11:80
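The effect of the rewrite rule on a request line can be tried out in Python. This is an illustrative sketch, not HAProxy's implementation: the function name and the sample request are made up, and the pattern is a Python translation of the rule above for an instance prefix such as /inst7/.

```python
import re


def strip_instance_prefix(request_line: str, instance: str) -> str:
    """Remove a leading /<instance>/ from the path of an HTTP request line.

    Request lines whose path does not start with that prefix are
    returned unchanged.
    """
    # Group 1 is the HTTP method, group 2 is everything after the prefix.
    pattern = rf"^([^ :]*) /{re.escape(instance)}/(.*)"
    return re.sub(pattern, r"\1 /\2", request_line)


if __name__ == "__main__":
    print(strip_instance_prefix("GET /inst7/index.html HTTP/1.1", "inst7"))
```

The backend server therefore sees a request for /index.html and has no idea that the client asked for /inst7/index.html.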
However, Willy Tarreau, the HAProxy author, who kindly provided this hint, recommends avoiding the second part (rewriting) because:
1) it requires good regex skills which sometimes makes the configs hard
to maintain for other people
2) rewriting URIs in applications is the worst ever thing to do, because
they never know where they are mapped, and regularly emit wrong links
and wrong Location headers during redirections.
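The second point is easy to reproduce. In the hypothetical sketch below, an application whose /instN/ prefix has been stripped by the proxy builds a redirect from the path it sees, so the Location header it emits no longer points at the instance. Passing the public prefix in explicitly (here as an assumed `public_prefix` setting, not something from the original hint) avoids that.

```python
def redirect_location(target: str, public_prefix: str = "") -> str:
    """Build a Location header value for a redirect to `target`.

    `public_prefix` is the path prefix under which clients reach the
    application (for example "/inst7"). Empty means the application
    assumes it lives at the root, which is what it sees behind a
    rewriting proxy.
    """
    return f"{public_prefix.rstrip('/')}/{target.lstrip('/')}"


if __name__ == "__main__":
    # Unaware of the mapping: the client is sent to a path without the
    # instance prefix, which the load balancer cannot match to an instance.
    print(redirect_location("/login"))
    # Configured with the real, original prefix: the link stays routable.
    print(redirect_location("/login", public_prefix="/inst7"))
```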
Willy Tarreau also advises that the best approach is clearly to configure your application correctly, so that it can respond with the real, original URI. Remapping can still be used as a transitional setup to ease a graceful switchover, though.
Bottom line: Pros: no DNS configuration and a fully scalable solution, with no dependence on DNS replication. Cons: CPU-consuming and error-prone declarations.
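Both options boil down to deriving an instance identifier from the request, either from the host name or from the first path segment. A rough sketch of that decision follows. The foofarm.com names come from the examples above, while the function names and the backend table are invented for illustration.

```python
from typing import Optional

# Hypothetical mapping from instance id to a private backend address.
BACKENDS = {"inst1": "10.0.0.1:80", "inst2": "10.0.0.2:80"}


def instance_from_host(host: str, domain: str = "foofarm.com") -> Optional[str]:
    """Option 1: instN.foofarm.com -> instN."""
    suffix = "." + domain
    host = host.split(":")[0].lower()  # drop an optional port
    if host.endswith(suffix):
        return host[: -len(suffix)] or None
    return None


def instance_from_path(path: str) -> Optional[str]:
    """Option 2: /instN/rest/of/path -> instN."""
    parts = path.split("/")
    # parts[0] is empty for an absolute path; parts[1] is the first segment.
    if len(parts) > 2 and parts[1]:
        return parts[1]
    return None


def pick_backend(host: str, path: str) -> Optional[str]:
    """Try the host name first, then the path; None means no match."""
    instance = instance_from_host(host) or instance_from_path(path)
    return BACKENDS.get(instance) if instance else None
```

With the host-based option the table grows together with the DNS records; with the path-based option only the table (or the equivalent load balancer rules) has to change when an instance is added.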
So, What to Choose?
The answer depends on your needs and on your confidence in your team's regex skills. We made our choice.
Moshe Kaplan. Performance Expert.