Apple OS X IP Failover Daemon

IP Failover is still not present in OS X 10.8 Server. Apple has also changed its strategy from Managed Preferences (MCX) to Mobile Device Management (MDM). This is a significant change for many enterprise organizations with large Mac deployments. With the move to MDM, using Active Directory or other third-party applications is no longer possible either. The bottom line is that there is no longer a need for a Mac server with a failover solution: you don't need high availability to create MDM profiles for iOS and OS X devices.

I'll write up my latest attempts at the OS X Pacemaker port in a new post. Maybe someone will need it to build an over-engineered failover solution for a render/video server.

http://www.cultofmac.com/182143/apples-profile-manager-and-the-future-of-mac-management-feature/
http://www.cultofmac.com/170752/apple-serves-up-mac-businessenterprise-resources-ahead-of-mountain-lion/

Apple has removed IP Failover from OS X Lion Server (10.7.2).

This doesn't work as expected. heartbeatd works fine, but failoverd crashes the kernel once a takeover is initiated. The system can't boot after that; only a reinstall fixes the problem. It's a weird thing; maybe someone knows how to fix this issue? If you need the Failover Daemon, e.g. for AFP, you can port it from an OS X 10.6.x Server installation (a short how-to follows). Copy the following files from an OS X Snow Leopard Server (10.6) to a Lion (10.7) installation. A Lion Server isn't necessary.

scp /usr/sbin/heartbeatd root@lion.server:/usr/sbin/.
scp /usr/sbin/failoverd root@lion.server:/usr/sbin/.
scp -r /System/Library/PrivateFrameworks/CoreServer.framework root@lion.server:/System/Library/PrivateFrameworks/.
scp /usr/libexec/NotifyFailover root@lion.server:/usr/libexec/.
scp /usr/libexec/ProcessFailover root@lion.server:/usr/libexec/.
scp -r /System/Library/StartupItems/IPFailover root@lion.server:/System/Library/StartupItems/.
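
After copying, it may be worth confirming that every file landed where the daemons are likely to look for it. The following is only a minimal sketch: the function name `missing_failover_files` and its optional root argument are hypothetical, and it assumes the framework belongs under /System/Library/PrivateFrameworks, as on the 10.6 source system.

```shell
#!/bin/bash
# Sketch: report which of the copied IP Failover files are missing.
# The optional first argument is a root prefix (empty = the live system);
# it only exists so the check can be tried against a test directory.
missing_failover_files() {
	local root="${1:-}"
	local path
	for path in \
		/usr/sbin/heartbeatd \
		/usr/sbin/failoverd \
		/System/Library/PrivateFrameworks/CoreServer.framework \
		/usr/libexec/NotifyFailover \
		/usr/libexec/ProcessFailover \
		/System/Library/StartupItems/IPFailover
	do
		# -e matches plain files and directories alike
		[ -e "${root}${path}" ] || echo "$path"
	done
}
```

Called without arguments on the Lion machine, it prints one line per missing path and nothing at all if the copy is complete. It does not check permissions or ownership.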

Hopefully Apple will come up with a new failover solution some day. OD already has one built in, and I haven't yet found out how the new AFP daemon works (the AppleFileServer manpage is the worst I've ever seen!). The SAN features integrated in Lion would be another way to build a highly available AFP setup.

This may work on OS X Mountain Lion (10.8).

libnet

	curl -L http://sourceforge.net/projects/libnet-dev/files/libnet-1.1.5.tar.gz/download -o libnet.tar.gz
	tar xzvf libnet.tar.gz
	cd libnet-1.1.5
	./configure --prefix=/opt/local
	make
	sudo make install

Create a local user and group. You can use the Directory Utility GUI or the command line:

	dseditgroup -o create haclient
	dscl . -create /Users/hacluster
	dscl . -create /Users/hacluster UserShell /usr/bin/false
	dscl . -create /Users/hacluster UniqueID 502
	dscl . -create /Users/hacluster PrimaryGroupID 502
	dscl . -create /Users/hacluster NFSHomeDirectory /var/lib/heartbeat/cores/hacluster
	dscl . -create /Users/hacluster RealName "Cluster User"
	export CLUSTER_USER=hacluster
	export CLUSTER_GROUP=haclient

	export CLUSTER_USER=admin
	export CLUSTER_GROUP=admin
	export PREFIX=/opt/local
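
The builds on this page rely on several command line tools, and a missing one usually only shows up halfway through a configure run. As a hedged convenience, the following sketch lists tools that are not on the PATH; the function name `missing_tools` is made up for illustration, and the tool list in the example call is simply read off the commands used on this page.

```shell
#!/bin/bash
# Sketch: list the given commands that cannot be found on PATH.
# Prints one missing command per line; prints nothing if all are found.
missing_tools() {
	local tool
	for tool in "$@"; do
		command -v "$tool" >/dev/null 2>&1 || echo "$tool"
	done
}

# Example call (tool names taken from the build steps on this page):
# missing_tools curl tar make hg git autoreconf
```

This only checks that the commands exist, not that their versions are recent enough for the autotools scripts.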

Cluster Glue

	hg clone http://hg.linux-ha.org/glue
	cd glue
	./autogen.sh
	./configure --prefix=$PREFIX --with-initdir=/private/etc/mach_init.d --with-daemon-user=${CLUSTER_USER} --with-daemon-group=${CLUSTER_GROUP}
	make
	sudo make install

Resource Agents

	git clone git://github.com/ClusterLabs/resource-agents.git
	cd resource-agents/
	./autogen.sh
	/opt/local/bin/autoreconf -i
	./configure --prefix=$PREFIX --with-initdir=/private/etc/mach_init.d
	make
	sudo make install

Heartbeat

	hg clone http://hg.linux-ha.org/dev heartbeat-dev
	cd heartbeat-dev/
	./bootstrap
	./configure --prefix=$PREFIX --with-initdir=/private/etc/mach_init.d
	make
	sudo make install

Pacemaker

	hg clone http://hg.clusterlabs.org/pacemaker/1.1/
	cd 1.1
	./autogen.sh
	/usr/local/bin/autoreconf -i
	./configure --prefix=$PREFIX --with-initdir=/private/etc/mach_init.d --with-heartbeat --disable-fatal-warnings
	make
	sudo make install

The IP Failover feature is a daemon process integrated into OS X Server. It's mostly used for high-availability server setups, e.g. if you use OS X Server as an Open Directory (OD) server with client home folders stored on the server, or for NetBoot clients.

OD itself doesn't need the IP Failover daemon. If you have at least two OD servers running and one goes offline, the clients automatically find the other OD servers; this is a built-in OD feature. AFP and other services are different, e.g. a type server or SWUPD (the software update daemon): there the clients won't find the new server if the IP or URL of the master is gone.

Over the years I have done a lot of IP Failover setups for large companies. The main setups and how they work are well documented by Apple and some other resources on the web.
Here you can find a nearly perfect “IP Failover Test Script” that checks a little more before the takeover starts. The main problems are restarting the master server (without shutting down the replica) and releasing it after an acquire without ending up with inconsistent home directory data. The so-called STOMITH (Shoot The Other Machine In The Head) procedure is no longer necessary (who invented this crude solution?).

If you would like to use these scripts commercially, please contact me.

Test

This IP Failover setup additionally uses an rsync script to update the home directory files. You can find this script in the Linux section.

#!/bin/bash
# (c) 2011 www.schellworth.de v.0.2.1
#
# This script checks if the FileSync is clean and additionally if the IP is really down after <WAIT> seconds.
# 
# WARNING: If the FileSync is not clean (e.g. after an earlier takeover, when the files haven't been synced back), the takeover won't proceed!
# 
 
 
STATE=$1					#acquire or release mode
IP=$2						#on hook IP
WAIT=240					#sleep X seconds (240 = 4 minutes), to check if the Server is really down (or just reboots)
HOST_IP=192.168.0.1 				#2nd interface to check system health
SYNC_STATE="/Library/Scripts/rsync/rsyncbackup.com"	#symlink to the sync script; NOTE: PreAcq uses /Library/Scripts/rsyncbackup/, adjust so both match
 
 
 
 
logger "IPFailover (Test): Testscript starts to $1 IP: $2"
 
check_link=$( ls -l "$SYNC_STATE" )             #Get the linked target
target=${check_link#* -> }
target_file=${target##*/}
 
if [ "$target_file" == "activesync.sh" ]     	#Check if the FileSync is active
then
	/sbin/ping -q -c1 "$HOST_IP" &> /dev/null       #Check the 2nd IP if the Master is really down
	if [ "$?" -gt 0 ] && [ "$STATE" == "acquire" ]
	then						#ACQUIRE
		logger "IPFailover (Test): $HOST_IP isn't reachable! Test again in $WAIT seconds. ..."
		sleep $WAIT				#wait a few seconds
		logger "IPFailover (Test): ... ping $HOST_IP"
		/sbin/ping -q -c1 $HOST_IP &> /dev/null	#Check the 2nd IP again
		if [ "$?" -gt 0 ]
		then
			logger "IPFailover (Test): Master is DOWN! Acquiring $IP will proceed. (0)"
			exit 0
		else
			logger "IPFailover (Test): $HOST_IP is UP. Canceling to acquire $IP (5)."
			exit 5
		fi
	else
		if [ "$STATE" == "release" ]	#RELEASE
		then
			logger "IPFailover (Test): Master is UP again. Releasing $IP will proceed"
			exit 0
		else
			logger "IPFailover (Test): $HOST_IP is UP! Canceling to acquire $IP (10)."
			echo "started from command line?"
			echo "usage: Test ['acquire' | 'release'] ['IP']"
			exit 10
		fi
	fi
else
	logger "IPFailover (Test): FileSync is disabled. Please reverse the FileSync process!"
	logger "IPFailover (Test): $STATE $IP canceled! (50)"
 
	SUBJECT="WARNING!!! The IP Failover $IP failed!" 
	TO="support@it.org" 
	BODY="Please check the server logs."
	echo "$BODY" | mail -s "$SUBJECT" "$TO"
 
	exit 50
fi
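
Parsing `ls -l` output to find the link target works, but it breaks if the path itself contains " -> ". A possible alternative, shown here only as a sketch, is to ask `readlink` for the target directly. The function name `sync_is_active` is hypothetical, and the sketch assumes, as the script above does, that the link points at a file named activesync.sh while the sync is enabled.

```shell
#!/bin/bash
# Sketch: decide whether the file sync is active by reading the
# symlink target directly instead of parsing `ls -l` output.
# Assumption: the link points at a file named activesync.sh while
# the sync is enabled.
sync_is_active() {
	local link="$1"
	local target
	target=$(readlink "$link") || return 1   # not a symlink, or missing
	[ "${target##*/}" = "activesync.sh" ]
}
```

In the Test script this could replace the three lines that compute `target_file`, e.g. `if sync_is_active "$SYNC_STATE"; then ... fi`.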

PreAcq

#! /bin/bash
# (c) www.schellworth.de
 
logger "IPFailover: Starting PreAcq script"
 
# Disable rsync Process
ln -sf /Library/Scripts/rsyncbackup/inactivsync.sh /Library/Scripts/rsyncbackup/rsyncbackup.com
logger "IPFailover: RSYNC disabled"
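
The PreAcq script only switches the sync off; after the master is back and the data has been synced in the reverse direction, the link has to be pointed back again. A minimal sketch of a helper for both directions follows. The function name `set_sync_state` is hypothetical, and it assumes that activesync.sh sits in the same directory as inactivsync.sh and the rsyncbackup.com link, which this page does not state explicitly.

```shell
#!/bin/bash
# Sketch: repoint the rsync state symlink.
# Assumptions: both sync scripts live in the same directory as the
# link and are named activesync.sh / inactivsync.sh.
set_sync_state() {
	local dir="$1" state="$2" target
	case "$state" in
		active)   target="activesync.sh" ;;
		inactive) target="inactivsync.sh" ;;
		*) echo "usage: set_sync_state DIR active|inactive" >&2; return 2 ;;
	esac
	# -n keeps ln from following an existing link that points at a directory
	ln -sfn "$dir/$target" "$dir/rsyncbackup.com"
}
```

With this, the PreAcq step would amount to `set_sync_state /Library/Scripts/rsyncbackup inactive`, and re-enabling the sync to the same call with `active`.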

PostAcq

#! /bin/bash
# (c) www.schellworth.de
 
logger "IPFailover: Starting PostAcq script"
 
# Starting AFP-Daemon
serveradmin start afp
logger "IPFailover: AFP started"

PreRel

#! /bin/bash
# (c) www.schellworth.de
 
logger "IPFailover: Starting PreRel script"
 
# Stop afp-Daemon 
serveradmin stop afp
logger "IPFailover: AFP stopped"

PostRel

#! /bin/bash
# (c) www.schellworth.de
 
logger "IPFailover: Starting PostRel script"

Resources

JT, 2011/12/23 08:34

I wish they would do something. Know of any third-party alternatives? Much like you, I have created my own scripts, but using pings instead of the heartbeatd protocol on both sides. I then wrote a custom failover script.

Philipp Schellworth, 2011/12/23 10:53

There is a Linux alternative, but it's very kernel-based and wouldn't work on Darwin: http://www.keepalived.org/. I'm thinking of coding an alternative in bash. One important feature is taking over a virtual IP on one interface (that won't be a big issue). The other issue is starting this script from launchd and restarting it when it crashes. The best way would be to code it in Objective-C or some other language, but that would be a bigger issue (for me).

Philipp Schellworth, 2011/12/23 11:17, 2012/01/19 10:33

Here are some more Linux alternatives: http://www.toniwestbrook.com/archives/184 http://www.linux-ha.org/wiki/Main_Page http://ostatic.com/mpathd/home/1# http://www.ultramonkey.org/

I'll have a deeper look into this over the next few days.

Philipp Schellworth, 2012/01/19 10:33

MPATHD is very outdated. Ultra Monkey is also outdated. I'm currently looking a little deeper into the linux-ha heartbeat solution.

jt, 2011/12/23 08:47

The conclusion that I have come to is that it has been completely taken out. The general consensus is that apple is moving away from the enterprise market (isn't it obvious?)

My best guess is that heartbeatd and failover didn't simply port over from their previous source and maybe had compatibility issues with the removal of Classic support in Lion, similar to the Final Cut fiasco. I also think that is why they re-engineered the presentation and GUI so radically, and also chose to leave other services out completely.

The only other reason would be that they know of another product that does it differently and better, similar to their stance on xserve raids and having them discontinued and replaced by Promise.

The problem is I do not know of any alternative third-party solution that does what it did. I ended up writing my own scripts that monitor my two servers and then fail over in case of failure.

If anyone with applescript and servers wants to collaborate on what I have I would be totally open. I have been running it for about 3 months in production without issues, and prior tested all functionality which worked great.

Philipp Schellworth, 2011/12/23 11:32

That is probably the sad truth. Still, I think there is room for a third-party alternative. Maybe I could port one of the existing Linux projects to Darwin; it would be worth a try.

jt, 2012/04/12 21:35

Hello, any luck with a third-party alternative? I ended up writing a new script in AppleScript for a client, and I am going to begin porting it to Xcode AppleScript Cocoa with a GUI and build it out a little bit. I would love to get your opinion on some of the procedures.

Philipp Schellworth, 2012/04/17 08:10

After some struggle I've successfully compiled heartbeat/pacemaker on OS X. That should be a good base to go ahead and configure it. I'll post my results soon.

simon phai, 2012/06/07 08:18

Hi guys, is there any solid evidence that IP failover works in Lion Server? I did some work on this: rsync is patched to 3.09, and in theory, with rsync patched, I believe the failover would work in Lion Server.

Philipp Schellworth, 2012/06/10 09:57

Sorry, I'm very busy at the moment and can't continue with the pacemaker port right now. If anyone is interested in helping me, PM me. rsync is used for Time Machine, so it's well patched. I don't think Apple will come up with a new IP failover solution.

jt, 2012/07/27 23:27

There is no solid evidence it works in lion, in fact there is overwhelming evidence it doesn't.

The best thing to do is create a couple of AppleScripts/bash scripts that ping your servers. You could set up a monitoring server to do this, or have each server (primary and backup) ping each other. If the ping fails, begin the scripted failover actions. Philipp's algorithm is solid and I came up with my own that is very similar.

You can:
- switch IPs with simple commands that change your Network location
- enable or disable protocols
- send out warning emails

I was originally going to release a GUI replacement for failover but lost my motivation after getting a new job.

Dolly, 2012/09/14 16:21

I'm quite pleased with the information in this one. TY!
