2009-07-07

/etc/profile

The system-wide sh-class shell initialization file can be very useful, but there are some potentially confusing aspects of how it is used in different shells. The goal is to have a reasonable version of /etc/profile that can be used for all users.

Classic sh shell.

The Bourne shell as described in the BSD 4.4 User's Reference Manual distinguishes between "interactive shells" (stdin is a terminal, or the -i flag was used), "login shells" (the 0th argument begins with '-', e.g., "-sh"), and other invocations. Login shells evaluate /etc/profile and .profile if they exist; non-login shells skip this step. Then, for every shell invocation, if the environment variable ENV is set, its contents are interpreted as a path that is then evaluated. Note that for non-login shells, ENV must already be in the environment; for login shells, it may be set in one of the profiles. Interactive shells can be identified by using case $- in *i* ) ... ;; ... esac.
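As a concrete illustration (my sketch, not something from the manual), a login shell's .profile might export ENV, and the file it points to might guard its interactive-only settings like this:

# in ~/.profile (read only by login shells)
ENV=$HOME/.shrc; export ENV

# in ~/.shrc (read by any sh invocation that has ENV set)
case $- in
*i*)    PS1='$ '        # interactive-only settings go here
        ;;
esac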

Bash shell.

This is the default OS/X shell and the most widely used descendant of the Bourne shell. It behaves differently depending on whether it is invoked as "sh" or "bash". In the former case, its startup is intended to emulate that of the classic sh (note that this mode is used in single-user mode and in many shell scripts intended to be widely compatible). For bash, an interactive shell is one whose stdin and stdout are connected to a terminal, or one where the -i flag was used. A login shell is one whose arg0 starts with '-' ("-bash", "-sh"), or where the --login (or -l) flag was used. When a login bash is invoked as "sh", it first evaluates /etc/profile and then ~/.profile, unless --noprofile is given. Note that --login can be used even with the "sh" invocation. At this point, "sh"-invoked bash enters "posix mode" (the --posix flag can also be used for this purpose). In posix mode, ENV is handled as with classic sh. When a login bash is invoked as "bash", it also evaluates /etc/profile, then the first existing file among ~/.bash_profile, ~/.bash_login, and ~/.profile (unless --noprofile was given). An interactive, non-login bash evaluates ~/.bashrc, unless --norc is given. A non-interactive bash evaluates $BASH_ENV if it is defined. Note that for interactive bash shells, $- will include i and PS1 will be set. In bash, the following variables will be set by the shell: BASH, BASH_VERSINFO (array), and BASH_VERSION.

Korn shell.

This is an excellent extended version of sh which differs from bash in various ways. Ksh defines "interactive" the same way bash does, but interactivity has no effect on which startup files are used. Login shells are defined as for sh: arg0 must begin with '-' (e.g., "-ksh"). A login shell evaluates /etc/profile if it exists, and then .profile or $HOME/.profile, if either exists. ENV is handled the same as in classic sh, except that if it is not set, $HOME/.kshrc will be evaluated if it exists. If the real and effective uid or gid do not match, /etc/suid_profile will be used instead of the ENV file or $HOME/.profile (interactive shells). Also, in an interactive ksh, $- contains i. In ksh, the variable KSH_VERSION will be set by the shell.
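If one start-up file has to serve several of these shells, one approach (a sketch of mine, relying on the version variables just mentioned) is simply to branch on them:

# sketch: branch on shell-specific variables in a shared start-up file
if [ -n "${BASH_VERSION:-}" ]; then
    :   # bash-specific settings
elif [ -n "${KSH_VERSION:-}" ]; then
    :   # ksh-specific settings
else
    :   # assume plain sh
fi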

Single user mode

On standard UNIX-style systems, either /bin/csh or /bin/sh is used in single-user mode. If it is /bin/csh, we are out of luck as far as /etc/profile is concerned, but if it is /bin/sh, there are consequences for /etc/profile, because it will generally be evaluated in single-user mode (under the current launchd in OS/X, it is invoked as /bin/bash, with arg0 set to "-sh"). Functionally similar invocations are probably the norm.

Some conclusions

Basically, /etc/profile will be evaluated for all logins. If ENV is set, then in some cases but not all it will be evaluated, and of course there are some other shell-specific files that also will be evaluated in some cases, which we aren't concerned with here. There is no simple test to detect the currently running shell. One can use $0, but that doesn't distinguish true sh from one of the others masquerading as sh. However, that may not matter in many cases. Therefore, a simple case statement on $0 will work in most cases in /etc/profile. The situation with ENV is more complicated, because an unknown amount of environment setting (e.g., of PS1) may have happened before the ENV file is run. In one case I know of, the login shell is ksh; it is detected correctly, and ksh-specific material is placed in PS1. If bash is then run interactively from the ksh login session, it *inherits* that PS1. The fix is to set up PS1 in ~/.bashrc, or to do other things there where bash and ksh differ.
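For concreteness, the sort of case statement I have in mind for /etc/profile looks like this (a sketch; the patterns assume login-shell arg0 values such as "-sh", "-bash", and "-ksh"):

# order matters: *sh would also match -bash and -ksh
case "$0" in
*bash)  :   # bash-specific login setup
        ;;
*ksh)   :   # ksh-specific login setup
        ;;
*sh)    :   # plain sh, or another shell masquerading as sh
        ;;
esac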

2009-06-22

Scripting single-user mode

As I have written earlier, it was possible to add commands to /etc/rc.server, and they would be executed in a context very similar to single-user mode. However, with the 10.5.7 upgrade, /etc/rc.server was moved to a later point in the boot sequence, to an environment more similar to ordinary multi-user mode. So not only is the context different, but this indicates how fragile the whole /etc/rc* vestige is in OS/X. A new method is required.

The best alternative I've come up with is to use actual single-user mode. It is possible to get into single-user mode from a script via this sequence (executed as root): « nvram boot-args=-s » ; reboot. At some point after single-user mode is entered, the command « nvram boot-args= » must be run in order to re-enable multi-user mode.
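In script form (run as root), the round trip looks like this:

# request single-user mode on the next boot, then reboot
nvram boot-args=-s
reboot

# later, from within single-user mode, before restarting:
nvram boot-args=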

There is one file evaluated by the shell that can be hooked for the purpose of scripting maintenance in single-user mode: /etc/profile, the shared, system-wide start-up file for all shells in the sh family. However, since this location can (and should) be used to customize the shell environment at the system level for all users, it should be changed as "invisibly" as possible.

I prefer to deal with these issues as follows: I'll put one line at the top of /etc/profile that contains some fast heuristics and slower deterministic tests for single-user mode which, if passed, result in a call to jidaemon (which is the script I want to run in single-user mode). The presence of this line at the top of /etc/profile is required; it can be checked by comparing [[ "$THELINE" == `head -1 < /etc/profile` ]]. The heuristics should all be based on the shell's internal environment, and should be as fast as possible, because /etc/profile is called every time the shell starts up. The heuristics are UID=0 and HOME=""; if those are true, the deterministic tests are `sysctl -n kern.singleuser`=1 and -x /var/root/jidaemon. If those are also true, run /var/root/jidaemon. Within jidaemon, all those tests are repeated, and some additional tests are run: nvram boot-args matches *-s*, read-only root, -f /tmp/just.imagine, and so on. Also, if -s is set in nvram boot-args, jidaemon must clear it while preserving any other flags. If any of these tests fail, then jidaemon returns to its caller and the only result (beyond clearing the boot-args -s flag) is a slight delay--the shell will continue and an interactive single-user mode session will begin. If jidaemon runs normally, it will restart the system when complete.
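Here is a sketch of what the hook line and the boot-args cleanup might look like; jidaemon is my script, and the tests are the ones described above, but the code itself is illustrative rather than final:

# candidate first line of /etc/profile: cheap heuristics first, slower tests only if they pass
[ "$UID" = 0 ] && [ -z "$HOME" ] && [ "`sysctl -n kern.singleuser 2>/dev/null`" = 1 ] && [ -x /var/root/jidaemon ] && /var/root/jidaemon

# inside jidaemon: clear -s from boot-args while preserving any other flags
args=`nvram boot-args 2>/dev/null | cut -f2-`
new=
for a in $args; do [ "$a" = "-s" ] || new="$new $a"; done
nvram boot-args="${new# }"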

2009-01-27

Importing SSL certificates on OS/X leopard server

I'm not going to go through the whole process, which is well-documented elsewhere. Basically, you buy & download the ssl.crt (certificate/public key), ssl.key (private key--I go passwordless, but YMMV), and ca.pem (certificate authority) files and then click on "Certificates" in Server Admin, browse to their locations, and install them. My problem was related to the fact that last year, when I first got certs from startcom, their master ca was not listed in the standard list of signing authorities on the server. I tried a lot of ways to get around that, and eventually got it working without really understanding why. My trick was to install the certs manually into /etc/certificates and use "custom configurations" in each ssl service. Recently when I had to renew the certificate, I had to revisit the whole mess. When I tried to import the renewed certificates, I put them into /etc/certificates as before, but after each reboot, the old ones would keep getting written on top of them. This undoubtedly was happening last year, but I didn't realize it because I only had one set of certs. I eventually decided that the only place the old ones could be coming from was the system keychain.

I looked in the system keychain and tried to install the new ones there, but kept getting an error saying that the identity already existed.

It turns out that in fact, the server copies certificates from the keychain into /etc/certificates at boot time. I hadn't known this. When I deleted the certificates from the keychain, everything "just worked" after I installed the new certs into /etc/certificates. The missing piece of the puzzle was the server scribbling in /etc/certificates.

Chapter three of this (still in progress) is that the startcom signing authority cert is now in the server's default list. I verified this on a new install of the server software--on that system, the standard Server Admin approach works flawlessly; no direct access to /etc/certificates is needed at all. So, the next step on the older system is to turn off all ssl services (at least iCal, iChat, Mail, OD, RADIUS, VPN, and Web), clean out /etc/certificates, and install the up-to-date certs into Server Admin. Then, go through each service and ditch the custom configurations, replacing them with the standard wildcard cert installed normally.

OK, I think I've done this successfully: it seems to be working. So, the comment about the server scribbling in /etc/certificates no longer is relevant to my particular configuration, but it is very relevant to someone who has a custom configuration. My advice: go ahead and put the certs in /etc/certificates, but (1) don't name them either Default or the address certified (e.g., *.domain.net), and (2) make sure they are NOT entered in the keychain as well. One or the other, please.

2008-11-15

Note on using webdav idisk for experiment data

One of the problems I had before with setting up a script to manage the experimental scenario and data on a webdav idisk was that I hadn't stumbled across how to automatically mount & unmount the filesystem. It turns out that this is very easy.

First, create a directory to use as the mountpoint. For example, assuming you are in a writable directory, use something like this:

mkdir mnt

Next, use a command of this form:

/sbin/mount_webdav -s http://idisk.mac.com/groups.labname mnt

("groups.labname" should be replaced with the actual name of your idisk)

If the login info is not in your keychain, it will ask you for it. You might consider putting it in the keychain for convenience. Note that any user (i.e., any RA who will access the database) must have access to the idisk.

Note that it is possible for a subdirectory to be mounted directly with a command like this:

/sbin/mount_webdav -s http://idisk.mac.com/groups.labname/Databases/Thisdatabase mnt

However, in this case, a separate keychain entry will be needed to log in. It would normally be simpler to mount the root of the idisk and navigate its hierarchy programmatically.

When you are finished, a simple "umount mnt" will unmount it.

Note that this method will work even if the idisk had already been mounted in the standard location or somewhere else.
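Putting the pieces together, a scripted session might look something like this (my sketch; the idisk name, the paths, and the work being done are placeholders):

mkdir -p mnt
if /sbin/mount_webdav -s http://idisk.mac.com/groups.labname mnt; then
    cp mnt/Databases/Thisdatabase/trials.txt .    # whatever work the script needs to do
    umount mnt
else
    echo "could not mount the idisk" >&2
    exit 1
fi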

2008-11-13

Combining image and rsync backups

This is from a note on the Apple server list.

The procedure is to start out with an asr image of the root volume using the -erase flag. This is now a bootable volume.

Then, each night, perform an rsync onto this volume, using --delete and other flags so that all changes are written to the volume. This is now an updated, but still fully bootable, volume.
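A nightly update along those lines might look something like this (my sketch, not the list poster's command; "BootClone" is a hypothetical volume name, and -E is Apple's rsync extension for copying extended attributes and resource forks):

# a real run would exclude more (/tmp, /private/var/vm, and so on)
rsync -aHE --delete \
    --exclude=/Volumes/ --exclude=/dev/ --exclude=/Network/ \
    / /Volumes/BootClone/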

This is almost equivalent to doing a nightly clone. It is food for thought. Here are some related issues:

  • Unless you want to bring the system down to single user mode every night, you are still going to have to deal with open database files during the rsync.
  • Assuming that those files have been dealt with, would it be faster just to do the image dump every night?
  • Is the target system more likely to be bootable in the event of a catastrophe mid-copy if rsync is used?
  • One of the biggest advantages of using rsync is the ability to do snapshots. Clearly, the original image could be used as the starting point for nightly snapshots, but then the image itself would never be updated. Is there some way for the root hierarchy to be the most recent snapshot, with previous versions stored elsewhere? (See below)
It seems to me that what might be needed here is for there to be two rsyncs each night. The first one is to update the bootable image from the working image, as in the original hint. The second one, to be done when the first one is finished, is to make a snapshot of the bootable image.

In other words, the backup would go like this:
  1. Before any backup, go through the process of dumping all system databases.
  2. For the first (labor-intensive) backup, use asr to make a complete copy of the base system. The system should be as quiescent as possible for this, and the system database dumps should contain the current database contents. The backup volume should be considerably larger than the working volume.
  3. Dump the databases and use rsync to update the backup volume. Be careful to exclude the /snapshots directory on the backup volume; this should be untouched by this run of rsync.
  4. Then use rsync again to create a new snapshot of the non-snapshot regions of the backup volume (there is no need for an additional database dump). This will be stored in /snapshots on that volume (and so /snapshots will again be excluded).
Both runs of rsync could be run at fairly low priorities.
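One way to implement the snapshot run in step 4 is the familiar rsync hard-link trick: copy against the previous snapshot with --link-dest so that unchanged files are hard-linked rather than duplicated. The original note doesn't specify this; here is a sketch with a hypothetical volume name:

BK=/Volumes/BootClone
TODAY=`date +%Y-%m-%d`
mkdir -p "$BK/snapshots"
PREV=`ls -1d "$BK"/snapshots/*/ 2>/dev/null | tail -1`    # most recent snapshot, if any
LINKDEST=
[ -n "$PREV" ] && LINKDEST="--link-dest=$PREV"
rsync -aH --delete --exclude=/snapshots/ $LINKDEST "$BK"/ "$BK/snapshots/$TODAY/"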

Note that if this were to be done over the network via an rsync server, the second (snapshot) rsync run could be done locally on the server. Not sure if that would be worth it, though. For now, I'm assuming an attached drive.

Also, it would be possible for /snapshots to be a different drive. For example, with two internal drives, the second drive could be the mirror of the first, with a larger external firewire drive for the snapshots. In a fairly non-intensive application like in our lab, this use of the second internal drive would probably be better than mirroring RAID, which is how it is being used now.

In any case, we would want to avoid spotlight on the backup drives, and also we would want them to be mounted read-only except when actually being written during backups.

A variation of the above would be to split drive 1 between the part that gets backed up and a large part that does not, so that less space would be needed on drive 2.

2008-09-12

OS/X single-user-mode backups

I've been trying to take advantage of the script /etc/rc.server to do full backups of the boot drive. This file is present in OS/X server, and it can be added to non-server systems. Basically, the script is run via /bin/sh early in the boot process, at a time similar to single-user mode. Only kernel drivers are present, which means that the internal hard drive and firewire drives are available, but not (at least not yet) USB drives. The boot drive is semi-mounted, in read-only mode. Semi-mounted means that its device is listed as "root_device", not as an actual drive.

The advantage of bringing a server down on a regular schedule for backups is that there are no open files, and the entire system drive is unwritable. This maximizes the thoroughness of the backup. Furthermore, there are some cases where backup programs such as asr(1) can use more efficient techniques for read-only drives than for read-write drives.

There are huge disadvantages, though. First and foremost is that while the server is down for backups, it can't provide whatever services it is responsible for. In the case of our servers, that's at least DHCP, DNS, OD, and file and web services. However, in a situation such as in our lab, where the amount of data is relatively small (probably less than 20-25 GB), the down-time will not be excessive, probably an hour or less.

The second class of disadvantages is almost a deal-breaker. Because OS/X has chosen to implement so much of its device and file-system interface code in terms of user-mode "frameworks" rather than kernel-mode drivers, hardly any of the command-line utilities are available for use in single-user mode! For example, diskutil, the main filesystem tool, is unavailable. Disktool doesn't work any better. Hdiutil will do some things, but cannot attach images or use the -srcfolder mode. It turns out that the best tool of the ones that will actually work in single-user mode is asr.

Asr works far faster when it is in "device" mode. In order to enter device mode, the target drive must be greater than or equal to the size of the source drive, the "-erase" flag must be used, and the source drive must be mounted in read-only mode. Asr works only on hfs-format drives. There is also "copy" mode, which is also fairly fast; it is used when device mode's criteria are not met.
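For reference, the clone command itself is short. A sketch (the target device name is hypothetical, the erase flag will destroy whatever is on it, and the exact flag spellings should be checked against the asr(8) man page of your release):

asr restore --source / --target /dev/disk1s3 --erase --noprompt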

The biggest problem with asr is that it copies everything, including the volume label, and there is no way to avoid this. Therefore, the copy ends up as a filesystem named the same as the boot drive. This can cause confusion. One would think that the solution would be to simply change the volume name after the clone operation. However, due to the impoverished runtime environment of single-user mode, there is no tool available to do it. The pdisk(1) program has a partition-labeling option that does work in single-user mode, but it turns out that this is not the same thing as the hfs volume label. When the system comes up, the clone will be mounted under the same name as the main drive with a " 1" suffix, like "Macintosh HD 1". If there are several backup partitions and nothing is done, they will all end up with the same name, like "... 1", "... 2", etc. (See update below.)

This can cause considerable confusion. The only "solution" I've been able to come up with is to write some information into the /tmp directory regarding the backup itself, and then once the system comes back up, use diskutil to rename the volume accordingly. A good way to do this is in crontab, with the "@reboot" time indicator.
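For concreteness, the reboot-time rename might be wired up like this (a sketch of mine; the file in /tmp, the script name, and the clone's mount point are all hypothetical):

# root crontab entry
@reboot /var/root/rename_clone.sh

# /var/root/rename_clone.sh
#!/bin/sh
# the backup run leaves the desired volume name in /tmp/clone_name
if [ -f /tmp/clone_name ]; then
    /usr/sbin/diskutil renameVolume "/Volumes/Macintosh HD 1" "`cat /tmp/clone_name`"
fi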

As for a general strategy, it is important to have at least 2 backup partitions (2 drives would be better), so in case something bad happens, the previous backup would be available. Also, the backup partitions should be larger than the system disk.

At present, my servers both have 250GB RAID mirrors, and I have a 300GB firewire drive for each of them. As an initial test, I will simply do a single asr backup of the main drive onto the firewire drive--this is nearly as safe as having two partitions on the firewire drive. Later, I'll get another firewire drive for each one, and swap the drives.

UPDATE

Here's a kind of strange way to get around the problem of the volume name. Instead of trying to change, ex post facto, the name of the clone, why not change the volume name of the system disk? This can be run very early in the launchd process, and all it takes is "[sudo] diskutil rename / newname". Obviously, the name will have the boot time in it, or, more difficult, a sequence number related to the backup system. I think a timestamp of the form 200901171402 would be good. As for the rest of the string, why not simply look at the current name (you can get this from the "list -plist" diskutil command)? If the last 12 characters of the current name (which will be identical to those of the most recent backup) are digits, they will be replaced; otherwise nothing will happen. If there is a reasonably short system name, like "Lab 13", then why not name the root drive something like "Lab 13 System Disk " ? Each time the system is rebooted, the timestamp will be updated. The volume names of the clones will contain the timestamp of the previous boot. Since the root volume, unlike other mounted volumes, doesn't get mounted as /Volumes/VolName, the change of name will have no effect on paths, etc.
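A sketch of the plan-1 rename, run early in the boot process as root (parsing diskutil's output with sed is my own shortcut, not a documented interface):

cur=`diskutil info / | sed -n 's/^ *Volume Name: *//p'`
stamp=`date +%Y%m%d%H%M`                              # e.g. 200901171402
case "$cur" in
*[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9])
    base=${cur%????????????} ;;                       # strip the old 12-digit stamp
*)
    base="$cur " ;;                                   # first time: keep the name, add a space
esac
diskutil renameVolume / "${base}${stamp}"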

Plan 2: A slightly more satisfactory but riskier way to do a similar thing would be to change the name of the root volume (e.g., to "Lab 13 Clone 200901171402") before restarting the system for an automated backup. The name would always be changed back to a constant (e.g., "Lab 13", or to the part before " Clone...") early in the boot process.

In either case, there would need to be three scripts:
  1. The setup-and-reboot script, which would search for the target volumes, decide which one to write to next, and if there is one, to store that information in /tmp/rc.autoclone or whatever. This script will change the name of the root volume just before rebooting under plan 2.
  2. The clone script, which would expect information in /tmp/rc.autoclone that, if it checks out, will control the clone operation.
  3. The cleanup script, to be run very early in the boot process, to change the name of the root volume back to its normal name. In plan 1, the name will contain a time stamp; in plan 2, it will be an ordinary name like "Lab 13".
The first script can be run either from the command line or from a launchd/crontab entry. The second script must be run at the end of /etc/rc.server, and it must contain safety checks to prevent writing into the wrong medium. The third script would ordinarily be run very early in the boot sequence from launchd/crontab.

As always, the most dangerous possibility is that the clone will be written into the wrong place. Since diskutil doesn't work right in single-user mode, probably the safest way to handle this is to write a check string into the destination volume, for example, a file in its root directory called rc.autoclone identical to the one in /tmp. Also, there must be a directory created in /tmp called "mnt" as a place to mount the destination volume, so that /tmp/mnt/rc.autoclone can be compared to /tmp/rc.autoclone.
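That check might look like this inside rc.server (a sketch; the device name is hypothetical, /tmp/mnt is assumed to exist already as described above, and plain mount(8) has to stand in for diskutil in this environment):

if mount -t hfs -r /dev/disk1s3 /tmp/mnt 2>/dev/null &&
   cmp -s /tmp/rc.autoclone /tmp/mnt/rc.autoclone
then
    umount /tmp/mnt
    :   # safe to clone onto /dev/disk1s3
else
    umount /tmp/mnt 2>/dev/null
    echo "autoclone: target check failed; skipping" >&2
fi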

So in rc.server,
  1. Check for /tmp/rc.autoclone
  2. Check that the timeout interval has not passed (e.g., 5 minutes)
  3. Check for /tmp/mnt
  4. Get info on all current drives
  5. Search for the drive indicated in /tmp/rc.autoclone
  6. Mount the drive on /tmp/mnt and compare its version of rc.autoclone
  7. Unmount the target drive
  8. Perform the clone operation
  9. Done
In the after-boot crontab (plan 2),
  1. Get the current / volume name
  2. If it needs to be changed, change it
The setup script does:
  1. Do general checking to prevent being called at the wrong time (e.g., too soon)
  2. Make sure that at least one target volume is available
  3. Look through all possible target volumes to find the one with the oldest timestamp.
  4. Compute its ID string
  5. Create an rc.autoclone file in /tmp containing the ID string
  6. Write a copy of the rc.autoclone into the root directory of the target volume
  7. Reboot

2008-08-08

An example experimental project

This project, which we call "picolf6", consists of six experiments, four of which use Sensonics odor labels as stimuli. In addition, the counterbalancing and randomization are done within each of the two sets of three experiments. Therefore, this was done by scripts rather than within Superlab.

One discovery we made about Superlab in the course of setting up picolf6 was that any time that a file system path is stored within Superlab, all symbolic links within the path are expanded and the path converted to an absolute path. This complicated several aspects of the process.

There are three types of stimuli used: pictures (stored as .jpg files), words (stored as .png files) and ticket numbers (stored as .png files). If you will refer to my recent blog entry on the sl4am hierarchy, you will see the layout we used here. Under the project directory, there is a Shared directory containing all of the .sl4 Superlab scenarios, and folders for all stimuli. The general idea is that the folder for each experiment (with subjects and groups) will have a link back to the scenario for that experiment in Shared, plus a "stim" folder containing links back to specific stimulus files within Shared.

The odor labels we used were mounted on standard 1"x2" event tickets and torn off one at a time to be "scratched and sniffed" and given to the subject. So, one of the things required is for there to be an unobtrusive ticket number displayed on the screen at the beginning of each trial. In Superlab, the only way to do that is to make a graphic with the numbers in the corners and then specify that Superlab scale it to fill the screen. We used the least significant digits of the tickets for this, so experiment 1 used tickets 1-30, exp 4 31-42, exp 5 43-54, and exp 6 55-66. Since this was always the same for all subjects, we set up folders in Shared named ticket[1456] and stored the .png files with the images there, named, e.g., 23.png.

One thing we learned the hard way was that it is best to set Superlab up so that the scenario is in a file hierarchy identical to where the experiment will be run. So for example, we simply specified /Users/.../Shared/ticket1 and Superlab could use the same relative path at runtime and find the stimulus folders.

The other stimuli were more complicated, because they had to be different for each subject. Superlab always accesses stimulus list folder contents in alphabetical order, so while setting up the experiment, we set up dummy folders containing files with names like w/img34.png (for word image event #34). Later, when setting up the hierarchy, we put links with those names to image files in Shared. So if, for example, on trial #34 a certain subject needs to see the word "alligator", stim/w/img34.png would be a link to ../../../../../Shared/words/alligator.png.
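So for that trial the setup script just creates a link of this form (per subject, of course):

ln -s ../../../../../Shared/words/alligator.png stim/w/img34.png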

Now, Superlab doesn't print the name of the file in a stimulus list folder, only its sequence number. So in order to figure out which stimuli were presented to a given subject without going back to the setup data, we inserted a dummy text event into each trial. These events were simply numbered, like "@=1-34=@" for experiment 1, trial 34. As part of the setup sequence, we created a sed script that we placed into each experiment folder that translates each dummy code into a condition code for that trial, for example "alligator:old". This allows the actual stimuli that were used to be determined.
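The generated sed script is simply a list of substitutions of this form, one per trial, applied to the Superlab output after the session (the particular codes and file names below are made-up examples):

# apply with: sed -f exp1.sed rawoutput.txt > decoded.txt
s/@=1-34=@/alligator:old/
s/@=1-35=@/tiger:new/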

We also placed a link to a Superlab scenario file in Shared into each experiment subfolder. However, we were not able to use symbolic links for this, since Superlab then assumes that the scenario file is actually in the Shared folder, and all of the subject-specific links fail to work. As a work-around, we just used hard links, since the scenario is a file rather than a directory (file system rules disallow hard-linking to directories).
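The work-around amounts to the same ln command without -s (the scenario file name here is hypothetical):

# from the project directory
ln Shared/picolf1.sl4 exp1/picolf1.sl4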

Here is a tar archive (tgz) of the Korn shell scripts I used to set this experiment up. Note that you also will need to install some packages to generate the graphics versions of the textual stimuli. I also didn't include the picture stimuli we used.
