2008-06-27

snaps and rc.server

Snaps is a Korn Shell backup script for our servers. It is intended to take one rsync(1) snapshot per day of the entire boot drive onto a local FireWire drive. In addition, it does a periodic clone of the boot drive using ditto(1). There are a couple of unusual aspects to this script.

First, it parameterizes the archiving of snapshots in an unusual way. It keeps a week's worth of daily snapshots (all of the numbers here are parameters). Then it uses an exponential function to decide which older snapshots to delete. It always keeps one snapshot in each integral range: one for 2^0 days, one for 2^1 days, one for 2^2 days, one for 2^3 days, and so on up to one for 2^8 days. Finally, it always keeps backups that are older than 365 days. (Remember, all of these numbers can be tweaked.) This results in a kind of S-shaped function of the frequency of preserved snapshots per unit of time. Most systems like this do something similar but use standard calendar periods, such as so many daily, so many weekly, and so many monthly snapshots. I thought that the exponential function would be more general than this, so that's what "snaps" uses.
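To make the bucketing concrete, here is a minimal sketch of how a snapshot's age in days could map to an exponential slot. It assumes a daily window of 7 days and a base of 2, and the function name `slot_for_age` is hypothetical. It uses an integer loop rather than a floating-point logarithm so that it runs in any POSIX shell, not only ksh93.

```shell
# Hypothetical helper: map a snapshot age (in days) to an exponential slot.
# Same idea as 1 + floor(log(age - CURR) / log(BASE)), done with integer math.
CURR=7    # days of daily snapshots kept unconditionally (assumed value)
BASE=2    # base of the exponential ranges (assumed value)

slot_for_age() {
    age=$(( $1 - CURR ))          # days past the daily window
    if [ "$age" -le 0 ]; then
        echo daily                # still inside the daily window, not bucketed
        return
    fi
    slot=1
    limit=$BASE                   # exclusive upper bound of slot 1
    while [ "$age" -ge "$limit" ]; do
        limit=$(( limit * BASE ))
        slot=$(( slot + 1 ))
    done
    echo "$slot"
}

for days in 5 8 9 10 11 15 40; do
    echo "age $days days -> slot $(slot_for_age "$days")"
done
```

Snapshots that are 9 and 10 days old land in the same slot, so only one of them would survive a pruning pass, while the slots for older snapshots cover progressively wider ranges of days.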

Second, and this is what I'm wrestling with at this stage of development, I want to automate this script. However, there are several server databases that cannot be "live" when a snapshot is taken, and in general, a backup is much more valuable if the system is quiescent when it is done. My idea is to set up a periodic process that will reboot the system in the "wee hours" of the morning. The snapshot script will be run early in the boot process, before things really get started in the system.

It took me quite a while to figure out how to do this because of how launchd and launchctl work. There doesn't seem to be any way to get things to happen early enough. However, a perusal of the launchctl/launchd source revealed that there is a section at just the appropriate moment when a script called "/etc/rc.server" is executed if present. This is done right after single-user mode and has much the same context as single-user mode.

I just added some lines to the end of rc.server to see what the environment is. (Rc.server's standard output is placed into /var/log/system.log.) The commands I added were /sbin/mount and /bin/ps. Here is what was reported:


Note that almost nothing is running, just launchd, launchctl, and the shell, which is running /etc/rc.server. This is a quiescent system. Also note that the boot drive is still mounted read-only, which is ideal for the purposes of making a backup.
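The probe described above can be sketched roughly as follows. This is an assumption about what such lines might look like, not the exact text that was added to rc.server: the `ps` options and the `probe` helper are hypothetical, and the helper skips any command that is not present so the fragment is harmless on other systems.

```shell
# Hypothetical lines for the end of /etc/rc.server: report the boot environment.
# Per the post, rc.server's standard output ends up in /var/log/system.log.
probe() {
    # Run a command by full path only if it exists; PATH may be minimal this early.
    if [ -x "$1" ]; then
        echo "rc.server probe: $*"
        "$@" || echo "rc.server probe: $1 exited with status $?"
    else
        echo "rc.server probe: $1 not found"
    fi
}

probe /sbin/mount        # which filesystems are mounted, and whether read-only
probe /bin/ps -ax        # which processes exist at this point in the boot
```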

Here is the current version of snaps:


#!/bin/ksh
# ----------------------------------------------------------------------
# snaps -- maintain a set of filesystem snapshots
# the basic idea is to make rotating backup-snapshots of sourcedir
# onto a local volume whenever called. The philosophy is to put all of
# the configuration and logging information into the backup directory,
# so that snaps requires only that path to get going. The scurve filter
# causes an s-shaped frequency of preserved snapshots, with more recent
# and fewer old snapshots.
#
# Important note: HFS+ filesystems are apparently set to ignore ownership
# for all but the boot drive. This must be disabled using the Finder's
# Get Info panel. (Is there a way to check for this programmatically?)
#
# NOTE: rsync must be version 3 or better
# ----------------------------------------------------------------------
# Usage: snaps [-n] SNAPS_DIR [ROOT]

# -------shell function defs------------------------------------------

# compare the current time in secs to a list of dates
# if on return, ${snap[0]} = secs, then we need to do a backup, otherwise do nothing
# also, the old backups in rmrf need to be expunged. ante is the most recent previous
# backup, if any.
function scurve {
typeset secs age tmp x i

secs=$1 ; shift
tmp=$(perl -e "@x=sort { \$b <=> \$a } qw($*);print \"@x\",\"\\n\"")

if [[ "$tmp" == "" ]] ; then
unset snap
snap[0]=$secs
return
fi
for ante in $tmp ; do
break
done

((age=secs-ante)) # age in secs of most recent snap
if [[ age -le JOUR ]] ; then # too soon
return
fi

unset snap
unset arch
unset curr
unset rmrf
for x in $tmp ; do
((age=(secs-x)/JOUR)) # age in ticks
if [[ age -le 0 ]] ; then # too soon
print age $age secs $secs x $x
continue
fi
# take care of the current backups in "real time"
if [[ age -le CURR ]] ; then
curr="$curr${curr:+ }$x"
continue
fi
# also take care of the archival backups in "real time"
if [[ age -ge ARCH ]] ; then
arch="$arch${arch:+ }$x"
continue
fi
# now set the base of the exponential portion
((age-=CURR))
((i=1+floor(log(age)/log(BASE))))
if [[ "${snap[i]}" == "" ]] ; then # nothing in this slot yet
snap[i]=$x
elif [[ ${snap[i]} -gt $x ]] ; then # always keep the older one
rmrf="$rmrf${rmrf:+ }${snap[i]}"
snap[i]=$x
else # keep unless current
rmrf="$rmrf${rmrf:+ }$x"
fi
done
if [[ "${snap[0]}" == "" ]] ; then
snap[0]=$secs
fi
}

# errs and other log stuff all go to stderr
log(){
print -u2 -- "$where:$TO@$(date +%Y%m%d.%H%M%S) $(basename $ME .ksh): $*"
}
finish(){
if [[ -e snaps.log ]] ; then
mail -s"Snaps Status for $where:$TO" root < snaps.log
rm snaps.log
fi
exit $1
}
err(){
log "$*"
finish 1
}

nopt=0
rsyncopt(){
RSYNC_OPTS[nopt++]="$*" # one array element per option, so "${RSYNC_OPTS[@]}" expands to separate arguments
}

# ---------------------- basic parameters --------------

# NOTE: define RSYNC to a version that is 3.0.0 or newer
RSYNC=/opt/local/bin/rsync

# these are for error message purposes (see functions log & err)
ME=$0
where=$(hostname)

# limit path to /bin and /usr/bin (rsync is called by full path via $RSYNC)
PATH=/bin:/usr/bin

# ------------- args, file locations ----------------------------

case "$1" in
"-n" ) now=print ; dry="-n" ; shift ;;
* ) now= ; dry= ;;
esac

TO=$1

if [[ "$TO" == "" ]] ; then
err "Usage: snaps [-n] SNAPS_DIR"
fi

# make sure we're running as root so we can start logging
if [[ `id -u` != 0 ]] ; then err "Not root" ; fi

if [[ ! -d $TO ]] ; then
err "No such directory $TO"
fi
eval `stat -s $TO`
if [[ $st_uid -ne 0 || $(($st_mode&0777)) -ne $((0755)) ]] ; then
err "$TO not mode 755 directory owned by root $st_uid $st_mode $(($st_mode&0777)) 0755"
fi

cd $TO

# set up errors from this point to be redirected to the log except for dry runs
# we do one log per backup and we store it in the snapshot folder as a record
# of that snapshot

if [[ "$now" == "" ]] ; then
if ! exec 2> snaps.log ; then
err "failed to write in $TO -- read only volume?"
fi
fi

log "Begin $dry"

# -------------- rsync parameters -------------
rsyncopt -vq # verbose error messages
rsyncopt -a # archive mode: -rlptgoD
rsyncopt -x # do not cross filesystem boundaries
rsyncopt --protect-args
rsyncopt --delete-excluded # implies --delete
rsyncopt -A # --acls
rsyncopt -X # --xattrs

# the makers of carbon copy cloner also recommend these options which are
# not available in the macports version of the program:
# rsyncopt --fileflags
# rsyncopt --force-change

# ------------ do some more checking -----------------

# NOTE: this needs to check for "Capabilities" <<<<<<<<<<<<<<<<<<<<<<
# insist on v. 3.X for working link-dest and xattrs
# if and when v. 4.X comes out, fix the script
case "$($RSYNC --version)" in
*'version '[012]'.'* ) err "$RSYNC is older than version 3.X" ;;
*'version '[456789]'.'* ) err "$RSYNC is newer than version 3.X" ;;
esac

# --------- the snapshots subdirectory ---------------

DD=$TO/snapshots
if [[ ! -d $DD ]] ; then
err "No such directory: $DD"
fi
eval `stat -s $DD`
if [[ $st_uid -ne 0 || $(($st_mode&0777)) -ne $((0755)) ]] ; then
err "$DD must be an rwx directory owned by root"
fi

# --------- configuration files -----------------
# they can be empty, but they must be uid0 and mode 0644

for x in config filter ; do
if [[ ! -f $TO/snaps.$x ]] ; then
err "No such file: $TO/snaps.$x"
fi
eval `stat -s $TO/snaps.$x`
if [[ $st_uid -ne 0 || $(($st_mode&0777)) -ne $((0644)) ]] ; then
err "$TO/snaps.$x not mode 0644 and owned by root"
fi
done

# ---------- use filter file if there is one -------
if [[ ! -s $TO/snaps.filter ]] ; then
rsyncopt "--cvs-exclude"
else
rsyncopt "--filter=. $TO/snaps.filter"
fi

# -----------------everything looks ok, let's get started--------------

# set defaults
ROOT="/"
VERSION=1
CURR=7
ARCH=731
JOUR=86400
BASE=2

# get overrides and other config info
# the only thing legal in this file is variable definitions
# of a few numeric or filepath parameters. to do comments, simply start
# the line with "#" or the word "rem" or "comment".
exec < snaps.config
while read x path ; do
for y in $path ; do
break
done
case $x in
"" ) continue ;;
ROOT )
ROOT="$path"
continue
;;
VERSION|CURR|ARCH|JOUR|BASE )
if [[ "$y" == "" || "$y" == *[^0-9.]* || "$y" != "$path" ]] ; then # value must be a single numeric word
err "Bad assignment in snaps.config line: \"$x\" \"$path\""
fi
eval "$x=$y"
continue
;;
comment|COMMENT|rem|REM ) continue ;;
"#"* ) continue ;;
* ) err "Unknown parameter in snaps.config line: \"$x\" \"$path\""
esac
done

# what time is it?
secs=$(date +%s)

# see if there is any work to do
unset snap
unset curr
unset arch
unset rmrf
unset ante

scurve $secs `ls snapshots`
if [[ ${snap[0]:-NIL} -ne $secs ]] ; then
log "Too soon"
exit 0
fi

# for log
df $TO

# remove unwanted snapshots if any
for x in $rmrf ; do
log "Unlinking $x"
$now rm -rf snapshots/$x
done

# if we crashed before, get rid of the remains
for x in *.partial ; do
if [[ -d $x ]] ; then
print "Unlinking $x for $where:$TO on `date`" >> snaps.log
$now rm -rf $x
fi
done

# is there a previous version to use with link-dest?
if [[ "$ante" != "" ]] ; then
rsyncopt "--link-dest=$TO/snapshots/$ante${ROOT:+/}$ROOT"
fi

# rsync from the system into the new snapshot
log "$RSYNC $dry ${RSYNC_OPTS[*]} $ROOT/ $TO/$secs.partial"
$RSYNC $dry "${RSYNC_OPTS[@]}" "$ROOT/" "$TO/$secs.partial"

# move the snapshot into place
$now mv "$secs.partial" snapshots/$secs

# update the mtime of the snapshot to reflect the snapshot time
$now touch snapshots/$secs

# and that's it.

df $TO

log "Completed $dry"

$now ln snaps.log snapshots/$secs

finish 0
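
For reference, the checks in the script imply a particular layout for the snapshot directory. The following is a minimal sketch of preparing one, assuming the config loop is meant to read one `NAME value` pair per line. The function name `snaps_init` and the example path are hypothetical. It would need to be run as root so that everything is owned by uid 0, which the script insists on.

```shell
# Hypothetical helper: create the layout that snaps checks for.
# Run as root so the directory and files are owned by uid 0.
snaps_init() {
    dir=$1
    mkdir -p "$dir/snapshots" || return 1
    chmod 755 "$dir" "$dir/snapshots"     # the script requires mode 0755

    # Config: one "NAME value" pair per line; lines starting with "#" are comments.
    cat > "$dir/snaps.config" <<'EOF'
# two weeks of dailies; keep everything older than 731 days
CURR 14
ARCH 731
BASE 2
EOF

    : > "$dir/snaps.filter"               # empty filter: the script falls back to --cvs-exclude
    chmod 644 "$dir/snaps.config" "$dir/snaps.filter"   # the script requires mode 0644
}

# Example (hypothetical path), followed by a dry run:
#   snaps_init /Volumes/Backup/snaps && snaps -n /Volumes/Backup/snaps
```

Starting with a dry run (`-n`) is a cheap way to see the rsync command line and any pruning decisions before anything is written or deleted.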
