
Clustering Tomcat Servers with High Availability and Disaster Fallback

There has been a lot of buzz lately about high availability and clustering. Most developers don't care, and why should they? These features should be transparent to the application architecture and not something the developers of that application need to worry about. But knowledge never hurts, so I immersed myself in the world of load balancing, heartbeats and virtual IP addresses. And you know what? Next time we need an infrastructure like this, I can at least sit down with the guys from the infrastructure department and know what the hell they are talking about.

So what exactly is a high-availability clustered infrastructure (HACI, as I'll call it from now on)? In essence, it should be a zero-downtime infrastructure (or at least perceived as one by the end user, which means never ever returning a default browser 404 page), capable of horizontal scaling when the need arises and without a single point of failure. It's the SLA writer's dream. A basic HACI setup looks like this:

The user enters through a virtual IP address assigned to one of the two load balancers. Only one of the load balancers is active (the active master, LB1); the other one is there in the event LB1 fails (LB2, a passive slave). The two load balancers are redundant, i.e. they have the exact same configuration. The load balancers redirect all traffic to the real servers. This can be done through round-robin assignment or through other means like sticky sessions, where the same user is redirected to the same server each and every time within a session. Servers can be added at any moment and configured on the load balancers. Ideally, the load balancer configuration is aware of the hardware specification and balances the load accordingly, but that's beyond the scope of this article (it involves adding weights). If all servers balanced by the load balancer fail, a backup server should be used to handle all traffic coming from the load balancer. This can be a very lightweight server whose only purpose is to present a sensible error page to the user (something like 'Sorry, we are performing maintenance'). Again, perception and immediate feedback to the user is key. You don't want to show the user a plain 404 page. Of course, if the backup server goes down too, you're in trouble (although by that time, warning bells should have gone off on every level in the hierarchy).

So how to achieve this with as little effort as possible? If you want to try this out, I suggest you start by installing virtualization software like VirtualBox or VMware. This way you can try out the configuration yourself. In this example, I'll be load balancing 3 Tomcat servers with sticky sessions, using 2 load balancers in active-passive mode. I'm assuming all 3 Tomcat servers share the same hardware configuration, so they can each handle the same amount of traffic. I'm also throwing in a backup server in case all 3 Tomcat servers go down (serving a custom 503 page kindly informing the user of a catastrophic failure, instead of dropping the standard 404 bomb).

You want to start off by assigning IP addresses to the servers. This will make your life a bit easier. We'll need 7 addresses: 3 for the Tomcat servers, 1 for the backup server, 2 for the load balancers and 1 virtual IP address shared between the load balancers (which will be the entry point for your users). So our assignment will be:

Virtual IP  10.0.5.99   www.haci.local
LB1         10.0.5.100  lb1.haci.local     # MASTER
LB2         10.0.5.101  lb2.haci.local     # SLAVE
WEB1        10.0.5.102  web1.haci.local
WEB2        10.0.5.103  web2.haci.local
WEB3        10.0.5.104  web3.haci.local
BACKUP      10.0.5.105  backup.haci.local

Setting up the web servers is easy. You just install Tomcat on each server and create a simple JSP file to be served to users (make a small change on each server, like the background color, to distinguish the servers). I won't be covering session replication between the Tomcat servers, as it would take me too far. If you want, you can configure the appropriate session replication and storage (using multicast or JDBC, for example).
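A minimal sketch of such a test page, dropped into Tomcat's default ROOT webapp (CATALINA_HOME is an assumption here; point it at your actual Tomcat install, and vary the background color per server):

```shell
# Create a simple index.jsp in the ROOT webapp of each Tomcat server.
CATALINA_HOME=${CATALINA_HOME:-/usr/local/tomcat}
mkdir -p "$CATALINA_HOME/webapps/ROOT"
cat > "$CATALINA_HOME/webapps/ROOT/index.jsp" <<'EOF'
<html>
  <body style="background-color: #ccffcc"><%-- vary this color per server --%>
    <p>Served by: <%= java.net.InetAddress.getLocalHost().getHostName() %></p>
    <p>Session ID: <%= session.getId() %></p>
  </body>
</html>
EOF
```

Printing the hostname and session ID makes it easy to see which server answered and whether sticky sessions keep you pinned to it.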

The backup server I'm using is a basic LAMP server that returns a simple 503 page for every request it gets. The 503 status code is important, because it reflects the current state of the system: temporarily unavailable.

For the load balancers I'll be using 2 applications: HAProxy and keepalived. HAProxy handles the load balancing, while keepalived handles the failover between the two load balancers.

First, we're going to configure HAProxy on both LB1 and LB2. Installing HAProxy is quite easy on an Ubuntu system: just do a sudo apt-get install haproxy and you're off. After the install, back up the current HAProxy config and start editing.

 

cp /etc/haproxy.cfg /etc/haproxy.cfg_orig
cat /dev/null > /etc/haproxy.cfg
vi /etc/haproxy.cfg

 

To reflect our setup, the config should look something like this (same config on LB1 and LB2):

 

global
    log 127.0.0.1 local0
    log 127.0.0.1 local1 notice
    #log loghost local0 info
    maxconn 4096
    #debug
    #quiet
    user haproxy
    group haproxy

defaults
    log global
    mode http
    option httplog
    option dontlognull
    retries 3
    redispatch
    maxconn 2000
    contimeout 5000
    clitimeout 50000
    srvtimeout 50000

frontend http-in
    bind 10.0.5.99:80
    default_backend servers

backend servers
    mode http
    stats enable
    stats auth someuser:somepassword
    balance roundrobin
    cookie JSESSIONID prefix
    option httpclose
    option forwardfor
    option httpchk HEAD /check.txt HTTP/1.0
    server web1 10.0.5.102:80 cookie haci_web1 check
    server web2 10.0.5.103:80 cookie haci_web2 check
    server web3 10.0.5.104:80 cookie haci_web3 check
    server webbackup 10.0.5.105:80 backup
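Note the httpchk line: HAProxy issues a HEAD request for /check.txt against each backend, and a successful response marks the server as UP. So each Tomcat server needs that file. A minimal sketch, assuming the default ROOT webapp (CATALINA_HOME is your Tomcat install path):

```shell
# Create the health-check file HAProxy probes with "HEAD /check.txt".
CATALINA_HOME=${CATALINA_HOME:-/usr/local/tomcat}
mkdir -p "$CATALINA_HOME/webapps/ROOT"
touch "$CATALINA_HOME/webapps/ROOT/check.txt"
# To drain a server gracefully before maintenance, delete check.txt and
# wait for HAProxy to mark the server DOWN before stopping Tomcat:
# rm "$CATALINA_HOME/webapps/ROOT/check.txt"
```

This also gives you a cheap maintenance switch: removing the file takes a server out of rotation without killing Tomcat.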

 

After this, enable HAProxy on both LB1 and LB2 by editing /etc/default/haproxy:

 

# Set ENABLED to 1 if you want the init script to start haproxy.
ENABLED=1
# Add extra flags here.
#EXTRAOPTS="-de -m 16"

 

So much for the HAProxy configuration. You can already validate it with haproxy -c -f /etc/haproxy.cfg, but we can't start HAProxy yet, as LB1 and LB2 aren't listening on the virtual IP address yet.

Next we'll configure the failover of the load balancers using keepalived. Installing it on Ubuntu is as easy as it was for HAProxy: sudo apt-get install keepalived. But its configuration differs slightly between the two load balancers. First, we need to configure both servers to be able to bind to the shared IP address even when it isn't assigned to them. Add the following line to /etc/sysctl.conf:

 

net.ipv4.ip_nonlocal_bind=1

 

And run

 

sysctl -p

 

Now, we configure keepalived so that LB1 acts as the main load balancer and binds to the shared IP address, while LB2 is on standby, ready to take over whenever LB1 goes down. The trick is in the priorities: LB1 starts at 101 and LB2 at 100, and the check script below adds 2 points to any node with a running haproxy process. If haproxy dies on LB1, its effective priority drops below LB2's and the virtual IP moves over.

The configuration for LB1 looks like this (edit /etc/keepalived/keepalived.conf):

 

vrrp_script chk_haproxy {       # Requires keepalived-1.1.13
    script "killall -0 haproxy" # cheaper than pidof
    interval 2                  # check every 2 seconds
    weight 2                    # add 2 points of prio if OK
}

vrrp_instance VI_1 {
    interface eth0
    state MASTER
    virtual_router_id 51
    priority 101                # 101 on master, 100 on backup
    virtual_ipaddress {
        10.0.5.99
    }
    track_script {
        chk_haproxy
    }
}

 

Start up keepalived and check whether it is listening to the virtual IP address.

 

/etc/init.d/keepalived start
ip addr sh eth0

 

It should return something like this, indicating that it is listening on the virtual IP address:

 

2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:0c:29:a5:5b:93 brd ff:ff:ff:ff:ff:ff
    inet 10.0.5.100/24 brd 10.0.5.255 scope global eth0
    inet 10.0.5.99/32 scope global eth0
    inet6 fe80::20c:29ff:fea5:5b93/64 scope link
       valid_lft forever preferred_lft forever

 

Next, we configure LB2. The configuration is almost the same, except for the priority.

 

vrrp_script chk_haproxy {       # Requires keepalived-1.1.13
    script "killall -0 haproxy" # cheaper than pidof
    interval 2                  # check every 2 seconds
    weight 2                    # add 2 points of prio if OK
}

vrrp_instance VI_1 {
    interface eth0
    state MASTER
    virtual_router_id 51
    priority 100                # 101 on master, 100 on backup
    virtual_ipaddress {
        10.0.5.99
    }
    track_script {
        chk_haproxy
    }
}

 

Start up keepalived and check the network interface.

 

/etc/init.d/keepalived start
ip addr sh eth0

 

It should return something like this, indicating that it is not listening on the virtual IP address:

 

2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:0c:29:a5:5b:93 brd ff:ff:ff:ff:ff:ff
    inet 10.0.5.101/24 brd 10.0.5.255 scope global eth0
    inet6 fe80::20c:29ff:fea5:5b93/64 scope link
       valid_lft forever preferred_lft forever

 

Now, start up HAProxy on both LB1 and LB2.

 

/etc/init.d/haproxy start

 

Now you can issue requests to 10.0.5.99 (or www.haci.local), which will go to LB1, which in turn will load balance the request to WEB1, WEB2 or WEB3. You can test the load balancing by turning off WEB1 (or whichever server you're currently on). You can also test the backup server by turning off all main web servers (WEB1, WEB2 and WEB3). And you can test the load balancer failover by turning off LB1. At that point LB2 will kick in and act as the master, load balancing all requests. When you turn LB1 back on, it'll take over the master role once again. HAProxy also allows you to add extra servers very easily, reloading the configuration without breaking existing sessions. See the HAProxy documentation or this ServerFault question for more info (http://serverfault.com/questions/165883/is-there-a-way-to-add-more-backend-server-to-haproxy-without-restarting-haproxy).
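Adding that extra capacity is just one more line in the backend section (the server name and IP below are hypothetical, purely for illustration):

```
# In /etc/haproxy.cfg, under "backend servers":
server web4 10.0.5.106:80 cookie haci_web4 check
```

Then reload with /etc/init.d/haproxy reload; the init script reconfigures via HAProxy's -sf soft-stop option, so the old process keeps serving existing connections while the new one takes over the listening socket.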

Cheap and effective. Most enterprise shops have hardware load balancers, which offer these capabilities and more, but if you're on a tight budget or need to simulate a HACI environment for development purposes (a lesson here: always simulate your production environment when testing during development), this might be the sane option.

To finish, I'll quickly explain how to set up the backup server (a simple LAMP server).

Create a vhost configuration in Apache for www.haci.local (or any other domain pointing to the virtual IP address) and set up mod_rewrite for it:

 

RewriteEngine On
RewriteCond %{REQUEST_URI} !\.(css|gif|ico|jpg|js|png|swf|txt)$ [NC]
RewriteCond %{REQUEST_URI} !/503.php
RewriteRule .* /503.php [L]

 

Then create the 503.php file and add this to the top of it:

 

<?php
header('HTTP/1.1 503 Service Unavailable');
header('Retry-After: 600');
?>
<html>
   <head><title>Unavailable</title></head>
   <body><p>Sorry, our servers are currently undergoing maintenance. Please check 
back with us in a while. Thank you for your patience.</p></body>
</html>

 

You can decorate the 503.php file any way you like. You can even use CSS, JavaScript and image files in it; that's why the rewrite rules above exclude those file extensions.

Now, back to my IDE. I'm getting withdrawal symptoms.
