A proxy server should be backed up using the failover utilities, so that the second proxy takes over in case the main proxy fails. The user should hardly notice that she is now working with a different proxy; it should not be necessary to restart either the proxy server or the browser.
Note that DNS cannot provide a solution for this problem, since the browser will resolve the name of the proxy only once. However, DNS round robin together with the failover utilities can give a natural load balancing solution with high availability.
The proposed solution uses an additional IP address, different from the main addresses of both machines, which is active on a virtual interface of the active machine. Logically, this IP address is the service, which is active on either the main or the backup machine.
To fix ideas let us consider two machines:
name | ip address | interface | function |
---|---|---|---|
lukretia | 193.5.25.27 | hme0 | master |
macbeth | 193.5.25.20 | le0 | slave |
Both systems happen to be Sun Systems running supported versions of Solaris 2.x.
Failover should happen for the address 193.5.25.62. To use the failover utilities, we have to make sure that the faild and failsh processes are set up on both machines.
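The configuration consists of the following parts: a faild start script, a service start script, a failsh start script, and a failsh configuration file for each machine. Samples can be found in the sample subdirectory of the Solaris package distribution or in samples/ipfailover-solaris of the source distribution.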
    #!/bin/sh
    #
    # start / stop the faild daemon
    #
    servicenames="193.5.25.62"
    port=1291
    # the other machine of the failover pair
    case `hostname` in
    macbeth)    other=lukretia;;
    lukretia)   other=macbeth;;
    esac
    case .$1 in
    .start)
        # if an old state file is around, we use it; this makes
        # it easier to start/stop a faild without losing all
        # the state. But note that a slave will take over a service
        # if it can no longer talk to the faild. You have to stop
        # all services that are UP
        /opt/AFMfail/sbin/faild -c communityfile -U -p ${port} \
            -r /opt/AFMfail/var/faild.pid
        for n in ${servicenames}
        do
            echo adding service ${n}
            /opt/AFMfail/bin/failc -c public -T -p ${port} \
                -s FAIL -b ${other} ${n}
        done
        ;;
    .stop)
        kill `cat /opt/AFMfail/var/faild.pid`
        ;;
    esac
The failc call in the script makes sure all the parameters are set, including the purely informational name of the backup system for a service.
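The host mapping at the top of the script leaves the variable other empty when the script runs on a machine that is neither macbeth nor lukretia. A stricter variant could look like the following minimal sketch. The function name peer_of is made up for this illustration and is not part of the failover utilities.

```shell
#!/bin/sh
# peer_of HOST
# Print the name of the other machine of the failover pair.
# Fails for any host that is not part of the pair, so that a
# misplaced copy of the script does not register an empty backup name.
# (peer_of is a hypothetical helper, not part of the distribution.)
peer_of() {
    case $1 in
    macbeth)    echo lukretia ;;
    lukretia)   echo macbeth ;;
    *)          echo "unknown host: $1" >&2
                return 1 ;;
    esac
}

# Possible use in the faild start script:
#   host=`hostname`
#   other=`peer_of "${host}"` || exit 1
```

With this variant the script stops before faild is started if the host name does not match, instead of passing an empty argument to the -b option.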
Previous versions of this file started failsh in this script. This becomes inconvenient if there are several services, so failsh is now started from a separate script.
The service start script is usually named after the service, in this case just the IP address to be failed over. It must be installed as /etc/init.d/193.5.25.62 and linked to the appropriate directories in /etc/rc?.d. Make sure it is started after faild.
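One possible way to create the links is sketched below. The run level directory rc2.d, the sequence numbers 90 and 91, and the name faild for the faild start script are assumptions of this sketch, as is the helper name link_failover_scripts; adapt them to the local conventions. The first argument is a root prefix, so the function can be tried in a scratch directory first.

```shell
#!/bin/sh
# link_failover_scripts ROOT SERVICE
# Create the run level links for faild and for the service script
# below ROOT/etc/rc2.d. The rc scripts are run in lexical order,
# so S90faild is started before S91<service>.
# (Directory, sequence numbers and script names are assumptions.)
link_failover_scripts() {
    root=$1
    service=$2
    rcdir="${root}/etc/rc2.d"
    mkdir -p "${rcdir}" || return 1
    ln -s ../init.d/faild "${rcdir}/S90faild" &&
    ln -s "../init.d/${service}" "${rcdir}/S91${service}"
}

# Possible use on the real system (empty root prefix):
#   link_failover_scripts "" 193.5.25.62
```

The only property that matters here is the ordering: whatever numbers are chosen, the link for faild must sort before the link for the service.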
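In addition to the standard arguments start and stop, it supports several further arguments. The complete set is:
argument | effect |
---|---|
start | bring the service into a well defined, recoverable state: the virtual interface is configured but left down, and the service state in faild is set to RECOVER |
stop | kill the processes associated with the failover |
fail | bring the service down and mark it as failed, so that it will not be started again |
recover | set the service state in faild to RECOVER |
reclaim | reclaim the IP address by sending a gratuitous ARP only |
up | start the service without putting any information in faild |
down | take the service down without putting any information in faild |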
    #!/bin/sh
    #
    # IP address failover using faild/failsh
    #
    servicename=193.5.25.62
    netmask=255.255.255.0
    broadcast=193.5.25.255
    # hme0 is the interface of lukretia; use le0 on macbeth
    interface=hme0
    subif=:1
    port=1291
    monitorport=1848
    case .$1 in
    .start)
        # The start argument brings the service into a well defined
        # recoverable state. If this is a master, it will later be
        # started by the up method
        /sbin/ifconfig ${interface}${subif} plumb
        /sbin/ifconfig ${interface}${subif} inet ${servicename} \
            netmask ${netmask} broadcast ${broadcast} down
        # for the next step to work, faild must be fully up; it
        # may be necessary to include a small pause to give faild
        # another chance
        /opt/AFMfail/bin/failc -c public -T -p ${port} -sRECOVER \
            ${servicename}
        ;;
    .recover)
        /opt/AFMfail/bin/failc -c public -T -p ${port} -sRECOVER \
            ${servicename}
        ;;
    .reclaim)
        # reclaiming an IP address: send gratuitous ARP only
        /opt/AFMfail/sbin/grarp ${interface} ${servicename}
        ;;
    .fail)
        # The fail argument brings the service down, and marks it
        # as failed so that it will not be started again
        /sbin/ifconfig ${interface}${subif} down
        /opt/AFMfail/bin/failc -c public -T -p ${port} -sFAIL \
            ${servicename}
        ;;
    .stop)
        # The stop argument kills all processes associated with
        # the failover
        kill `cat /opt/AFMfail/var/failsh.pid`
        ;;
    .up)
        # starting the service without putting any information in
        # the faild
        /sbin/ifconfig ${interface}${subif} up
        # send gratuitous ARP
        /opt/AFMfail/sbin/grarp ${interface} ${servicename}
        ;;
    .down)
        # Take the service down without putting any information in
        # the faild
        /sbin/ifconfig ${interface}${subif} down
        ;;
    esac
    exit 0
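The comment in the start target mentions that a small pause may be needed until faild is fully up. Instead of a fixed pause, the failc call could be repeated a few times, as in the following minimal sketch. The helper name retry is made up for this illustration, and the sketch assumes that failc exits with a non-zero status when it cannot reach faild, which should be verified before relying on it.

```shell
#!/bin/sh
# retry ATTEMPTS DELAY COMMAND [ARGS...]
# Run COMMAND until it succeeds, at most ATTEMPTS times, sleeping
# DELAY seconds between attempts. Returns 0 on the first success
# and 1 if all attempts failed.
# (retry is a hypothetical helper, not part of the distribution.)
retry() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while :
    do
        "$@" && return 0
        [ "${n}" -ge "${attempts}" ] && return 1
        n=`expr ${n} + 1`
        sleep "${delay}"
    done
}

# Possible use in the start target, ASSUMING failc reports a failed
# connection through its exit status:
#   retry 5 2 /opt/AFMfail/bin/failc -c public -T -p ${port} \
#       -sRECOVER ${servicename}
```

A bounded number of attempts keeps the boot sequence from hanging if faild never comes up.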
More complex applications, e.g. firewalls with several interfaces, or application servers with a large number of server processes, will have more complex service start/stop scripts. However, they will want to use the same targets, so that the same relatively simple failsh configuration files can be used.
The failsh start script starts failsh with the appropriate arguments. In particular, it should tell the shell which Tcl script to use. It is also recommended that the monitor mode be activated (using the -m option), as in the example below.
    #! /bin/sh
    #
    # IP address failover using faild/failsh
    #
    # This script only starts the failsh, any initializations of services
    # are left to individual scripts for each service
    #
    # $Id: example.html,v 2.3 2006/03/20 22:44:05 afm Exp $
    #
    port=1291
    monitorport=1848
    debugport=1918
    case .$1 in
    .start)
        # start the failsh
        /opt/AFMfail/sbin/failsh \
            -f /opt/AFMfail/etc/`uname -n`.tcl \
            -i /opt/AFMfail/var/failsh.pid \
            -m ${monitorport} -D ${debugport}
        ;;
    .stop)
        # The stop argument kills all processes associated with
        # the failover
        kill `cat /opt/AFMfail/var/failsh.pid`
        ;;
    esac
    exit 0
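The stop target calls kill with whatever the pid file contains, which produces a confusing error if the file is missing or empty. A more defensive variant is sketched below. The function name stop_by_pidfile is made up for this illustration and is not part of the failover utilities.

```shell
#!/bin/sh
# stop_by_pidfile PIDFILE
# Send the default signal (TERM) to the process recorded in PIDFILE.
# A missing pid file is treated as "nothing to stop"; a pid file
# that does not contain a number is reported as an error.
# (stop_by_pidfile is a hypothetical helper.)
stop_by_pidfile() {
    pidfile=$1
    if [ ! -f "${pidfile}" ]
    then
        echo "${pidfile} not found, nothing to stop" >&2
        return 0
    fi
    pid=`cat "${pidfile}"`
    case "${pid}" in
    ''|*[!0-9]*)
        echo "${pidfile} does not contain a process id" >&2
        return 1
        ;;
    esac
    kill "${pid}"
}

# Possible use in the stop target:
#   stop_by_pidfile /opt/AFMfail/var/failsh.pid
```

The same function could be used in the faild start script with /opt/AFMfail/var/faild.pid.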
The standard installation uses the names master.tcl and slave.tcl. Using different names for your own configuration files is recommended, as package installation or deinstallation may remove or overwrite your carefully crafted configuration files.
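Since the failsh start script above reads the file named after the output of uname -n, one way to follow this recommendation is to copy a sample file to a host specific name once and never overwrite it afterwards. The following is a minimal sketch; the function name install_host_config is made up for this illustration.

```shell
#!/bin/sh
# install_host_config TEMPLATE ETCDIR
# Copy TEMPLATE to ETCDIR/<nodename>.tcl, where <nodename> is the
# output of uname -n. An existing host specific file is never
# overwritten, so local changes survive repeated runs.
# (install_host_config is a hypothetical helper.)
install_host_config() {
    template=$1
    etcdir=$2
    target="${etcdir}/`uname -n`.tcl"
    if [ -f "${target}" ]
    then
        echo "${target} already exists, leaving it alone" >&2
        return 0
    fi
    cp "${template}" "${target}"
}

# Possible use on the master:
#   install_host_config /opt/AFMfail/etc/master.tcl /opt/AFMfail/etc
```

On the slave the same call would be made with slave.tcl as the template.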
The configuration file is a Tcl script that must specify who is the master and who is the slave, what parameters should be used when talking to the remote faild processes, and how to start and stop the service. In this example, the script /etc/init.d/193.5.25.62 is called with the arguments up and down from the functions startservice and stopservice to bring the service up or down.
    #
    # IP address failover on host lukretia (the master)
    #
    #
    # the startservice procedure starts the IP address on the virtual
    # interface
    #
    proc startservice {} {
        exec /etc/init.d/193.5.25.62 up
        return "UP"
    }
    #
    # the stopservice procedure stops the IP service on the virtual interface
    #
    proc stopservice {} {
        exec /etc/init.d/193.5.25.62 down
        return "FAIL"
    }
    #
    # the reclaimservice procedure recovers from a spurious slave
    #
    proc reclaimservice {} {
        exec /etc/init.d/193.5.25.62 reclaim
        return "UP"
    }
    failsvc 193.5.25.62 create
    failsvc 193.5.25.62 interval 5
    failsvc 193.5.25.62 start startservice
    failsvc 193.5.25.62 stop stopservice
    failsvc 193.5.25.62 reclaim reclaimservice
    failsvc 193.5.25.62 slaves macbeth
    failhost macbeth port 1291
    failhost macbeth protocol tcp
    failhost macbeth community public
    failhost localhost port 1291
    failhost localhost protocol tcp
    failhost localhost community genesis
    #
    # IP address failover on host macbeth (the slave)
    #
    #
    # the startservice procedure starts the IP address on the virtual
    # interface
    #
    proc startservice {} {
        exec /etc/init.d/193.5.25.62 up
        return "UP"
    }
    #
    # the stopservice procedure stops the IP service on the virtual interface
    #
    proc stopservice {} {
        exec /etc/init.d/193.5.25.62 down
        return "RECOVER"
    }
    #
    # the reclaimservice procedure recovers from a spurious slave
    #
    proc reclaimservice {} {
        exec /etc/init.d/193.5.25.62 reclaim
        return "UP"
    }
    failsvc 193.5.25.62 create
    failsvc 193.5.25.62 interval 5
    failsvc 193.5.25.62 start startservice
    failsvc 193.5.25.62 stop stopservice
    failsvc 193.5.25.62 reclaim reclaimservice
    failsvc 193.5.25.62 masters lukretia
    failhost lukretia port 1291
    failhost lukretia protocol tcp
    failhost lukretia community public
    failhost localhost port 1291
    failhost localhost protocol tcp
    failhost localhost community genesis