Getting started with failover

This document describes how to get started with failover as easily as possible. Because Solaris and Linux use entirely different packaging tools, the instructions also differ considerably:
  Solaris: Instructions for installing and configuring failover on a Solaris system using pkgadd
  Linux: Instructions for installing and configuring failover on a Linux system using rpm
  FreeBSD: Instructions for FreeBSD
Before you start installation, you should prepare yourself by collecting the following information:

  1. What are the names of the two hosts, and which one is going to be the master and which one the slave?
  2. For which IP address will you do failover? The two hosts have to share a common IP network.
  3. Which secondary or logical interface will be configured with the failover IP address on both hosts? Make sure that this interface is no longer configured automatically during boot. You should also know the netmask and broadcast address for the logical interface.
After installation and configuration is complete, you will need to start the software and make sure it is also started on the next boot. Finally, you will want to monitor the failsh using the web interface.
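
If you know the failover IP address and the netmask but not the broadcast address asked for in item 3 above, it can be derived from the other two. The following is a minimal sketch, assuming a POSIX shell with arithmetic expansion (the original Solaris /bin/sh does not have it). The function name broadcast_of is hypothetical and not part of failover, and the netmask 255.255.255.0 is only an example value.

```shell
#!/bin/sh
# Sketch: derive the broadcast address from a dotted-quad IP address and netmask.
# "broadcast_of" is a hypothetical helper, not part of the failover package.
# The body runs in a subshell so that the changed IFS does not leak out.
broadcast_of() (
    IFS=.
    # Split both dotted quads into octets: $1..$4 = address, $5..$8 = netmask.
    set -- $1 $2
    # Per octet: broadcast = address OR (inverted netmask octet).
    echo "$(( $1 | (255 - $5) )).$(( $2 | (255 - $6) )).$(( $3 | (255 - $7) )).$(( $4 | (255 - $8) ))"
)

# Example address with an assumed netmask; prints 193.5.25.255
broadcast_of 193.5.25.62 255.255.255.0
```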

Installing on Solaris

This section describes the installation of failover on a Solaris 2.x system with x ≥ 6. As preparation, get the package tar file AFMfail-0.5.22.tar.gz onto the system and unpack it in some suitable directory; /var/spool/pkg would be a good place.

After changing your current working directory to the directory containing the extracted package, use the pkgadd command to add the software to the system:

# pkgadd -d . AFMfail
You will be asked many questions that you have already answered during the preparations above. After pkgadd completes, you will have the software installed in /opt/AFMfail, and start/stop scripts generated in /etc/init.d.

Activating the software and checking that it works is done the same way on Solaris and Linux, and is described in the section "Starting failover on Solaris or Linux".

Installing on Linux

The installation on Linux uses the Red Hat Package Manager rpm. To install and configure the software, proceed as follows:

  1. Transfer the package file to the target system, and install it using the command
    # rpm -i failover-0.5.22.rpm
    
  2. Run the script setup in the directory /usr/local/share/failover
    # cd /usr/local/share/failover
    # sh setup
    
    which will ask for the parameters you prepared above. When the script completes, start/stop scripts and configuration files will have been generated.

Starting failover on Solaris or Linux

If your system keeps the startup scripts in /sbin/init.d, as SuSE Linux does for instance, you should replace /etc/init.d with /sbin/init.d in the discussion below.
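
If you are unsure which of the two directories your system uses, a small test can tell you. This is only a sketch: pick_initdir is a hypothetical helper, not something the failover package installs, and it simply reports the first candidate directory that exists.

```shell
#!/bin/sh
# Sketch: choose the directory that holds the start/stop scripts.
# "pick_initdir" is a hypothetical helper; it prints the first of its
# arguments that is an existing directory, and fails if none is.
pick_initdir() {
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            echo "$dir"
            return 0
        fi
    done
    return 1
}

# Fall back to /etc/init.d if neither directory is found.
INITDIR=$(pick_initdir /etc/init.d /sbin/init.d) || INITDIR=/etc/init.d
echo "start/stop scripts expected in $INITDIR"
```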

Start/stop scripts have already been generated, so you only have to execute them. Starting the software has three phases.

  1. faild: The first phase makes the status daemon faild available:
    # /etc/init.d/faild start
    adding service ip-address
    
    Please verify that the service is running (using ps) and that it can return status information. The command /opt/AFMfail/bin/failstat -p 1291 should return a status table containing exactly one entry.
  2. IP Address: The second phase brings the service (the IP address) into a state from which it can be started. This means configuring the logical interface, but not bringing it up yet. The script named after the ip-address will do this:
    # /etc/init.d/ip-address start
    
    This should configure the logical interface in the down state, and record the state RECOVER in the status daemon.
  3. failsh: The third phase brings up the failsh, which will compare the local state to the state of the remote hosts, and bring up the interface if necessary.
    # /etc/init.d/failsh start
    
    Please verify using ps that failsh has been started (due to the way threads are implemented, you will see quite a few processes with this name on a Linux box). The failsh should bring up the interface if this is a master box or if no master is around. You can use the failmon command (/opt/AFMfail/bin/failmon or /usr/local/bin/failmon) to verify the state of the local host.
This verifies the installation. If anything goes wrong, run the failsh in debug mode (option -d all) and have a look at the syslog. failsh logs to the USER facility at level DEBUG.
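
The ps checks from phases 1 and 3 can be combined into one small script. This is a sketch under assumptions: has_process is a hypothetical helper, and the script assumes a POSIX shell and a ps that understands "-e -o comm=". It only looks at process names and does not query the status daemon.

```shell
#!/bin/sh
# Sketch: check that the failover daemons show up in the process list.
# "has_process" is a hypothetical helper. It reads command names, one per
# line, on standard input and succeeds if one of them matches $1.
has_process() {
    while read -r cmd _rest; do
        # Strip a leading directory part, so /opt/AFMfail/bin/failsh matches failsh.
        [ "${cmd##*/}" = "$1" ] && return 0
    done
    return 1
}

for daemon in faild failsh; do
    # -e: all processes; -o comm=: print only the command name, without a header.
    if ps -e -o comm= | has_process "$daemon"; then
        echo "$daemon: running"
    else
        echo "$daemon: NOT running"
    fi
done
```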

So far, none of the programs will be started automatically at the next boot. You should therefore add symbolic links to the rc2.d or rc3.d directories to start the processes automatically. Note that the names of these links must be chosen so that the programs are run in exactly the same sequence as described above. Furthermore, you should make sure that all logical interfaces are configured before any services that use them start up. If your apache process binds to a specific address, you should start it only after the interface has been configured. The interface does not need to be up for the apache server to start, but the bind will not succeed if the interface is not configured yet. On Solaris, I usually do the following to achieve this:

# cd /etc/rc3.d
# ln -s ../init.d/faild S95faild
# ln -s ../init.d/193.5.25.62 S96-193.5.25.62
# ln -s ../init.d/failsh S97failsh
My apache does not do any explicit binds, so starting the failsh this late is no problem.
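
To double-check the resulting sequence, you can list the links in the order in which they will be started. This is a sketch: start_order is a hypothetical helper, and it assumes that the rc scripts run the S* links in the sorted order produced by the shell's glob expansion, which is the usual arrangement but worth confirming on your system.

```shell
#!/bin/sh
# Sketch: print the S* links of an rc directory in the order they are run.
# "start_order" is a hypothetical helper. It relies on the shell expanding
# the S* pattern in sorted order.
start_order() {
    for link in "$1"/S*; do
        # Skip the unexpanded pattern if nothing matches; keep dangling links.
        [ -e "$link" ] || [ -h "$link" ] || continue
        basename "$link"
    done
}

# Show where the three failover links fall in the sequence (with line numbers).
RCDIR=${RCDIR:-/etc/rc3.d}
start_order "$RCDIR" | grep -n -e faild -e failsh -e 193.5.25.62 ||
    echo "no failover links found in $RCDIR"
```

The line numbers printed by grep should increase from faild through the IP address script to failsh.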

Failover on FreeBSD

Currently no binary releases are provided for Failover on FreeBSD, so you will have to unpack and build the source yourself. The location of Tcl on a FreeBSD system may be a little surprising, so you will have to find suitable options for configure. Have a look at the script configureit, which works for my installation. Don't forget to build using gmake, not make.

After installation (gmake install), you will find configuration file templates in <prefix>/share/failover, where the prefix is /usr/local if you didn't specify anything during the configuration phase. Run the script setup in that directory. It will ask you a few questions, and build suitable configuration and startup scripts from your answers. That's it: you should now have a correct failover installation.

The process above has installed a script /etc/rc.failover, which you can use to start failover on your machine. Note that the same script can also be used to start or stop the service by passing it an argument, as with the start/stop scripts under Linux and Solaris.

Web Interface

With the -h option of failmon, you can build a simple but effective web interface. You need the following for this:

  1. Link the HTML directory of the failover installation (/opt/AFMfail/html on Solaris, /usr/local/share/failover on Linux) into the document tree of the webserver.
  2. Create a script /usr/local/bin/failpage with the following contents:
    #! /bin/sh
    PATH=${PATH}:/opt/AFMfail/bin
    export PATH
    cd /opt/AFMfail/html
    failmon -h >status.new
    mv status.new status.html
    exit 0
    
    and run this script every minute using a crontab entry of the form
    * * * * * [ -x /usr/local/bin/failpage ] && /usr/local/bin/failpage
    
    This will update the page every minute. Now simply point your browser to the page status.html generated by the script to see what's going on in failsh.
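
The failpage script replaces status.html even when failmon could not produce any output. A slightly more defensive variant keeps the old page in that case. This is a sketch, not part of the failover distribution: update_page is a hypothetical helper, and it assumes that failmon exits with a non-zero status on failure, which you should verify for your installation.

```shell
#!/bin/sh
# Sketch: regenerate a status page, keeping the old page if the generator
# command fails. "update_page" is a hypothetical helper.
# Usage: update_page DIRECTORY COMMAND [ARGS...]
update_page() {
    dir=$1
    shift
    # Run the generator command and collect its output in a temporary file.
    if "$@" > "$dir/status.new"; then
        # A rename within one filesystem is atomic, so the web server never
        # serves a half-written page.
        mv "$dir/status.new" "$dir/status.html"
    else
        rm -f "$dir/status.new"
        return 1
    fi
}

# Same paths as in failpage (Solaris layout); skipped if failover is not installed.
PATH=${PATH}:/opt/AFMfail/bin
export PATH
if [ -d /opt/AFMfail/html ] && command -v failmon >/dev/null 2>&1; then
    update_page /opt/AFMfail/html failmon -h
fi
```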

© 1999 Andreas Müller