2008-07-21

Converting text to graphic files for Superlab

Since graphics files usually exist outside of the immediate Superlab scenario folder, and are not loaded until the experiment runs (and if the appropriate option is selected, not until just before a trial runs), it is possible to use an external script to re-randomize graphics files before running a particular subject.

The way it works is, you set up Superlab to use generic names for files, for example, "trial001.png", "trial002.png", and so on. You have your real stimuli in a different folder, with names like "elephant.png" and "COW.png". Then, when you are setting up to run a particular subject, you get rid of the dummy files and put in links to the real stimuli, in the appropriate order, for example, trial001.png --> elephant.png ; trial002.png --> COW.png. Obviously, you have to keep the mapping around so that the logfile can be patched up after the run, by replacing instances of "trial001.png" with "elephant.png" and so on.
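As a sketch (the file and folder names here are hypothetical, and a POSIX shell is assumed), the per-subject setup and the post-run patch-up might look like this:

```shell
# Hypothetical sketch: dummy links before the run, logfile patch-up after.
demo=$(mktemp -d) && cd "$demo"
mkdir Stimuli
: > Stimuli/elephant.png            # stand-ins for the real stimuli
: > Stimuli/COW.png

# The randomized order for this subject:
set -- elephant.png COW.png
i=0
: > mapping.txt
for stim in "$@"; do
    i=$((i + 1))
    dummy=$(printf 'trial%03d.png' "$i")
    ln -sf "Stimuli/$stim" "$dummy"          # scenario sees only trialNNN.png
    printf '%s %s\n' "$dummy" "$stim" >> mapping.txt
done

# After the run, patch the logfile back to the real names.
# (A real script should escape regex metacharacters in the names.)
printf 'resp to trial001.png\nresp to trial002.png\n' > logfile.txt
while read -r dummy real; do
    sed "s/$dummy/$real/g" logfile.txt > logfile.tmp && mv logfile.tmp logfile.txt
done < mapping.txt
```

The mapping file doubles as the record you need to keep around for patching the logfile.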

You can't do this with text stimuli, of course, because text stimuli are internal to Superlab. I have written a utility to patch an existing scenario file to contain a different order of stimuli, but that method is very risky and isn't flexible enough. So the correct solution is to convert the text stimuli you want to use into graphics files.

There are many ways to do this, but one of the easiest to use from a scripting standpoint is called "a2png". This is a utility that can be downloaded from SourceForge. It needs either the cairo or the gdlib graphics library, one of which must be installed before a2png will build. Once it has been installed, you can use .ttf font files to create .png files from text strings. By default, the image is cropped to the font's cell size and has a black background. There are a number of options to change the background, foreground, size, font, spacing, and so on. The .png files can be used directly by Superlab on both Windows and Mac systems, or they can be converted to jpeg or some other supported graphics format.

By default, if you give a2png the name of a .txt file containing a stimulus, it will create a .png file in the same folder with the same basename. So for example, "a2png ... elephant.txt" should result in an output file "elephant.png". If you don't want all those *.txt files, a2png will also accept standard input if the file is "-", and will write to X in "--output=X". So, an alternative way to create elephant.png is "print elephant | a2png ... --output=elephant.png -".

How to specify the right font

Now, at least on my system, a2png doesn't want to find the ttf fonts. The built-in font folder list is a poor match for the fonts I have installed on my system. So, what I do is to give the whole path to the .ttf file I want to use. You can find all the appropriate fonts on your system with "locate .ttf | grep ttf$". Also, there are thousands of TrueType fonts (ttf) out there on the internet. It's probably better to give the whole path anyway. I like a sans-serif font for displaying stimuli, such as FreeSans, which can be readily downloaded.

You can also mix text and pictures, and randomly switch, for example, which kind is on the left or the right of the display, just by setting the link appropriately before running.

UPDATE

There appears to be some kind of glitch in a2png such that the cropping that it does removes the bottom of each character on the last (i.e., only) line. There is a workaround of suffixing a '\n', but this adds too much space.

There is a completely different approach available with the classic netpbm package. This command line:

print JjKg_Xq \
| pbmtext -font ~/Downloads/bdffont/100dpi/helvR24.bdf \
| pnmcrop | ppmchange white black black white \
| pnmtopng > foo.png

isn't too bad. An alternative is:

pbmtextps -font=Helvetica JJKG_XQ \
| pnmcrop | ppmchange white black black white \
| pnmtopng > foo.png

So, one way or another, there will be a way to do this. Frankly, a2png produces prettier output. The pbmtextps output is quite fuzzy, while the pbmtext output depends on having the bdf fonts available, and in turn, they have limitations on size. Since a2png uses the Cairo graphics library, it can use ttf fonts and scale them, etc., very prettily. Hopefully I will find a fix for a2png.

UPDATE2

It is true that a2png produces more attractive lettering, but as it turns out, there is a very real application for the netpbm package here: setting up the experiment template. The scheme I am trying to use involves setting up a single experiment with all of the trials indexing external event files, usually images or images of text. By changing the names of these external files, you can change the stimuli presented to subjects with none of the limitations imposed by superlab. So, what I've been doing is to generate dummy stimuli to be used while testing. These are graphics containing text strings that make it easy to identify the order and type of the stimuli for debugging purposes.

In one of the experiments I'm setting up now, there are 30 640x480 pictures. Here is the shell function I'm using to create jpeg dummy files for them:

jpg640x480(){
    ppmmake lightblue 640 480 \
    | ppmlabel -x $((320-(5*${#1}))) -y 240 -size 10 -background lightblue -color black -text "$1" \
    | pnmtojpeg > "$2" 2>/dev/null
}

I chose black over lightblue so they would be very contrastive with the white over black text stimuli.

UPDATE3

Well, there is a fairly easy way to get images cropped correctly with a2png that will work until the program is fixed somehow: use the --no-crop option in a2png, and crop the result using netpbm. For example:

print Somejunque \
| a2png --no-crop -s --overwrite --font-size=0.1 --output=uncropped.png -
pngtopnm < uncropped.png \
| pnmcrop \
| pnmtopng > cropped.png

This yields the best of both worlds: flexible, high-quality text rendering plus correct cropping of the result.

2008-07-18

How to set up an sl4am project

Once the basic SuperLab experiments are running, the next step is to set up the hierarchy of files and folders. This can be done with a fairly simple script, given the names of the population and condition groups and the initial number of subjects in each cell. Options include setting up an empty Shared folder and links to it in each experiment folder. Empty rc files and flag.free files are created everywhere. Another option is to clone a new population group from an existing one; yet another is to look for empty rc files (which would be the sign that an external setup script didn't do its job completely).
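A minimal sketch of such a setup script (the group names and counts below are illustrative examples, not a spec):

```shell
# Create population.group/condition.group/subject cells, each with empty
# rc files and a flag.free file (all names here are made up).
proj=$(mktemp -d)/DemoProject
mkdir -p "$proj"
: > "$proj/rc.proj"

pgroups="Try.Only"                              # population.group folders
cgroups="Set1.VisualFirst Set1.AuditoryFirst"   # condition.group folders
nsubj=2                                         # initial subjects per cell

for pg in $pgroups; do
    for cg in $cgroups; do
        s=1
        while [ "$s" -le "$nsubj" ]; do
            slot="$proj/$pg/$cg/$s"
            mkdir -p "$slot"
            : > "$slot/rc.subj"                 # empty rc file
            : > "$slot/flag.free"               # slot is available
            s=$((s + 1))
        done
    done
done
```

Cloning a population group then reduces to copying an existing group folder and renaming flag files back to flag.free.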

However, once the hierarchy is all set up -- and it probably is a good idea to set up only the Try population group first, and clone the other groups from it -- the next step is to populate all of the experiment folders and to put actual code into the rc files. The best way to do this is to write a custom script. This script could use find(1) and be driven by the existing factors as an organizational approach.

Update:

Just stumbled across the automator(1) command. This should be very useful for running experiments, since it is a way to invoke Automator workflows from the command line. There are options to set variables and to pass input, including standard input, to the workflow. It is less clear how to take output from the workflow; probably temporary files will be needed.

A Useful Experimental Data Hierarchy


This is a scheme for storing experimental data, primarily with SuperLab, that should be easy to set up, easy to use, and easy to automate.

At several specific points throughout the hierarchy there are rc files: rc.sl4am, rc.proj, rc.subj, and rc.exp. These can contain arbitrary ksh code and can change the operation of sl4am, but are intended to customize the operation of sl4am by changing specific parameters or by altering the setup, runtime, and/or cleanup phases of a SuperLab scenario.

The SL4AM subfolder within the SuperLab Experiments folder is the default sl4am home folder. If an rc.sl4am file is present, it can redefine the variable SL4AMHOME in order to use a different home folder, for example, in a shared location on the local machine, like "/Users/Shared/Documents/SuperLab Experiments/SL4AM", or in a remote location, such as "/Volumes/groups.ourlab/Documents/SuperLab Experiments/SL4AM". If there is a separate root for the status flags and datafiles, that can be defined in rc.sl4am as "DATAROOT" (see below). If DATAROOT is defined, it should give the path to a folder that will be organized in parallel to the SL4AM folder.
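For example, an rc.sl4am that points sl4am at a shared home folder and a separate data root could contain nothing more than this (the SL4AM-DATA path is a hypothetical example, not a default):

```shell
# rc.sl4am -- sourced by sl4am at startup (sketch, not a shipped default)
SL4AMHOME="/Volumes/groups.ourlab/Documents/SuperLab Experiments/SL4AM"
DATAROOT="/Volumes/groups.ourlab/Documents/SuperLab Experiments/SL4AM-DATA"
```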

Note that spaces in folder and file names should be avoided if possible below SL4AM. I'm trying to make the script immune to problems resulting from spaces in filenames, but it's safer (and much wiser) to avoid them.

Project roots are top-level subfolders in SL4AM. There can also be a top-level subfolder in SL4AM called Archival that is intended to store completed or inactive projects.

Except for some code at the beginning to select a project, sl4am is concerned only with the world within a single project root, and in fact, it uses cd to go there as soon as possible so that most paths within a project are relative to the project's root. While running SuperLab, sl4am changes temporarily into the folder of the specified scenario (see below). At the top level of a project root, there must exist a file called rc.proj that is sourced when sl4am starts a session in that project. A subfolder X of the SL4AM home folder is a project root iff X/rc.proj exists.

Each project contains a subject hierarchy. The top level is a dot-separated list of population.group folders, for example, Healthy.Elderly, Healthy.Young, Schiz.Elderly, Schiz.Young. Below each population.group folder is one or more condition.group folders, such as Set1.VisualFirst, Set1.AuditoryFirst, Set2.VisualFirst, Set2.AuditoryFirst. Below each condition-group folder is one or more numeric subject folders, for example 1, 2, 3. There is one subject folder for each subject to be tested in the project. Each subject folder must contain a flag file. It may be desirable for sl4am to create new subject folders automatically as needed; in any case, the names of these folders are simply numbers starting at 1. There also needs to be a way to ensure that sl4am will test the first subject in each condition before starting the numerically next subjects, and so on.
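One way to guarantee that subject 1 in each condition runs before subject 2 is simply to always pick the lowest-numbered free slot. A sketch (assuming the flag-file convention described below):

```shell
# Return the lowest-numbered subject folder that still has a flag.free file.
next_free_slot() {      # $1 = path to a condition.group folder
    for slot in $(ls "$1" | sort -n); do
        if [ -f "$1/$slot/flag.free" ]; then
            printf '%s\n' "$slot"
            return 0
        fi
    done
    return 1            # no free slots left
}

# Tiny demo: slot 1 is already completed, so slot 2 should come up next.
cond=$(mktemp -d)
mkdir -p "$cond/1" "$cond/2" "$cond/3"
: > "$cond/1/flag.completed"
: > "$cond/2/flag.free"
: > "$cond/3/flag.free"
```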

Note that even though some labs never test any population group other than college students, both group levels are required. The best way to handle this situation is to use two population group identifiers, one named "Try" and the other something like "YN" (young normal). The Try group is for testing the experimental setup, while the YN group is a reasonable label for college students. If at some point you need to add another population group, it will be very easy to do it. The presence of the "YN" level won't interfere with anything, and the "Try" pseudogroup is very useful for development and for training RAs.

The flag file is a simple advisory access-control mechanism. When an experiment is first set up, each slot's flag file is named "flag.free", indicating that any computer can run that subject slot. When a slot is checked out, the file is renamed to "flag.user@en0", where "user" is the current user and en0 identifies the current host. (The figure says "fern", which is the name of one of our computers; this method is too variable.) This "checks out" the subject slot to the named individual. The parameter "en0" is set to the ethernet address of en0 (ifconfig en0 | grep ether | read junque ether junque) in order to identify the machine in a somewhat unambiguous way. An individual may relinquish a subject slot by renaming the flag file back to "flag.free", but the presence of datafiles will cause a multi-experiment sequence to pick up where it left off before. When a subject slot is complete and all data has been saved, the flag file must be renamed to "flag.completed".

Note that if DATAROOT is defined, then sl4am will search there for flag.free slots, change to flag.user@en0 there, and will upload data there. The structure under DATAROOT is identical to the SL4AMHOME hierarchy, but it need not have any Shared folder, rc files, or scenarios or stimuli. In the event that subjects must be tested offline, one or more subject slots should be checked out in advance. When sl4am starts up, it will sweep all experiments looking for the flag "SYNC" in a subject-number folder. If it finds any, and if DATAROOT is accessible, it will synchronize the parallel structure there by uploading any data not already present, and renaming the flag file if necessary. Any time sl4am tries to upload data or change the flag file but DATAROOT is inaccessible, SYNC is created. After a successful upload, it is removed.
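The sweep might be sketched like this (cp stands in for whatever transfer method sl4am ends up using, and the folder layout and names are assumptions):

```shell
# Upload any datafiles not already under DATAROOT, then clear the SYNC flag.
sync_slot() {           # $1 = local subject-number folder, $2 = its DATAROOT twin
    [ -f "$1/SYNC" ] || return 0              # nothing pending for this slot
    [ -d "$2" ] || return 1                   # DATAROOT inaccessible: keep SYNC
    for f in "$1"/*/*.txt; do                 # experiment datafiles
        [ -f "$f" ] || continue
        dest="$2/${f#"$1"/}"
        if [ ! -f "$dest" ]; then             # upload only what isn't there yet
            mkdir -p "${dest%/*}" && cp "$f" "$dest"
        fi
    done
    rm "$1/SYNC"                              # sweep succeeded
}

# Demo slot with one datafile awaiting upload:
loc=$(mktemp -d); rem=$(mktemp -d)
mkdir -p "$loc/exp1"
printf 'data\n' > "$loc/exp1/data.txt"
: > "$loc/SYNC"
sync_slot "$loc" "$rem"
```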

There also must be an rc.subj file in a subject-number folder, primarily designed to control running multi-experiment projects (for example, the order in which to run the experiments). This can be a zero-length file.

For each experiment, there is an experiment root under the subject-number folder. This folder contains all that is needed to run one SuperLab experiment: the scenario (.sl4) file; all stimuli needed (these will generally be links elsewhere, or in the case of all-text experiments, missing altogether); and the datafile (.txt) once the experiment has been run. There is also a mandatory rc.exp file, primarily intended to help customize the SuperLab run or the datafile processing afterwards. For example, it could give the subject some feedback about his performance. Sl4am cd's into the experiment root before running rc.exp and SuperLab.

Under each project folder, there can be an optional Shared subfolder. This contains all shared stimuli and/or sl4 files. For convenience, the setup script will install a Shared symbolic link in each experiment root that points back to the project-level Shared folder (if it exists). To link to it from a stimulus folder in an experiment root under a subject folder, just use

    ln -s ../Shared/SomeFolder/somefile.xxx stimset/somename.xxx

To link a scenario file in the experiment root to one in the Shared folder, use

    ln -s ./Shared/some-exp.sl4 some-exp.sl4

It is possible to fully populate this tree before beginning to run, but it is equally possible to use the rc scripts to fill things out at run time.

UPDATE

It really isn't hard to link relatively back to the Shared folder without the klutzy per-experiment Shared link. Here's how to do it. Create a dummy tree all the way down to the stim/stimset folder inside an experiment. Also, create a Shared folder with a stimset/xxx.png in it. Then cd down there and do "ls ..", "ls ../..", and so on until you get to (e.g.) "ls ../../../../../../Shared/stimset". Then test it with "ln -s ../../../../../../Shared/stimset/xxx.png yyy.png" or whatever. Once you've done this and gotten it to work, just make sure that your script does two things: use the right number of dotdots, and actually cd into the stimset folder to do the linkage:

(cd downpath; ln -s uppath/Shared/stimset/file.suff link.suff)

Note that assuming the script is running in the project root, downpath will be like "pgrp/cgrp/subj/exp/stim/stimset", and uppath will be like "../../../../../..". As for why this is worth doing instead of just using absolute links, it's to allow experiments to be installed using simple methods like tar. With relative links, no adjustment is required, with absolute links, basically there would need to be a script run to adjust links after installation in new locations.
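The dotdot counting can be automated with a little string manipulation (a sketch; it is pure string work, so no directories need to exist):

```shell
# Map a project-relative downpath onto the matching run of dotdots.
uppath_for() {
    p=$1 up=""
    while [ -n "$p" ]; do
        up="${up:+$up/}.."          # one ".." per path component
        case $p in
            */*) p=${p#*/} ;;       # strip the leading component
            *)   p= ;;              # last component consumed
        esac
    done
    printf '%s\n' "$up"
}
```

With that, the link step becomes (cd "$down"; ln -s "$(uppath_for "$down")/Shared/stimset/file.suff" link.suff), and the script can't get the number of dotdots wrong.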

2008-07-15

Subject checkout on shared volume

In our lab, we have five macbook pros that could theoretically be used all at once to test subjects in a single experiment. In the past, we have gotten into trouble when, due to experimenter error, a certain subject slot has been run on more than one computer. To get past that problem, we want to use a shared volume to contain the experiment setup hierarchy, and come up with some way for all of the computers to share that hierarchy. Obviously, there must be some method to prevent two computers from trying to use the same resources. The simplest way is to set a lock at the filesystem level, marking the subject as "taken", and to release the lock when done. However, the most straightforward way to do the sharing, using one of Apple's group iDisks, has no locking mechanism. You can't even make files read-only. The lockfile(1) program, when asked to create a lockfile on an iDisk, gives up and suggests praying instead.

I did come up with a locking mechanism. What you do is to use a reserved folder. A computer that wants to lock the resource waits until the folder is empty, then writes its ID into the folder (the ID could be, for example, the ethernet address of en0). After a short delay, the computer then checks to see if there is exactly one file in the folder, namely its own ID. If so, then it has the lock. If there is more than one, then it removes its ID, waits a short but random period, and tries again. The only problem with this mechanism is that it is very slow, on an already slow filesystem like the iDisk.
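A sketch of that reserved-folder protocol (run here by a single process, so it acquires the lock on the first try; the ID is a made-up ethernet address):

```shell
# Reserved-folder lock: write our ID, wait, and check we are alone.
lockdir=$(mktemp -d)/lock
mkdir -p "$lockdir"
ID="aa.bb.cc.dd"

acquire() {
    while :; do
        if [ -n "$(ls -A "$lockdir")" ]; then
            sleep 1; continue                 # someone holds (or wants) it
        fi
        : > "$lockdir/$ID"
        sleep 1                               # let any race play out
        if [ "$(ls "$lockdir" | wc -l)" -eq 1 ] && [ -f "$lockdir/$ID" ]; then
            return 0                          # we hold the lock
        fi
        rm -f "$lockdir/$ID"                  # collided: back off and retry
        sleep $(( $$ % 3 + 1 ))               # short, pseudo-random backoff
    done
}
acquire
```

Releasing the lock is just rm "$lockdir/$ID". The per-attempt sleeps are exactly why this is painful on a slow filesystem.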

After pondering this for a while, I thought of another approach. Instead of setting a lock before accessing the subject slot, you randomly choose the "next" subject to test, and then rename it to a name with your ID. For example, if the subject is called "12", and if your ID is aa.bb.cc.dd, then you would simply "mv 12 12-incomplete-aa.bb.cc.dd". Then wait a short time and see if "12-incomplete-aa.bb.cc.dd" exists. If it does, you now own subject 12; if not, try again. (If the locked name doesn't exist, it means that a race occurred and another computer locked it between the time you found it and the time you did the mv command.)
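The rename-based checkout is only a few lines (a sketch; the pool folder and the ID are made up):

```shell
# Claim a subject slot by renaming it to carry our ID, then verify.
pool=$(mktemp -d)
mkdir "$pool/12"
ID="aa.bb.cc.dd"

claim() {               # $1 = subject number; returns 0 iff we own it
    mv "$pool/$1" "$pool/$1-incomplete-$ID" 2>/dev/null || return 1
    sleep 1             # give a potential race time to surface
    [ -d "$pool/$1-incomplete-$ID" ]
}
claim 12
```

If claim fails, another machine won the race, and the caller simply picks a different subject and tries again.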

The random selection is somewhat important, but not critical. If you just go in a fixed order, all it means is that there is slightly greater probability that a given computer will have to try more than once to get a subject.

Once the subject is locked, testing proceeds. When it is complete, the name is changed again to, e.g., "12-complete-aa.bb.cc.dd". Note that it is still locked, in a sense, since it will not appear in the list for testing.

One other brief note: it might make sense for each subject on the remote volume to be an archive, for example in tar.gz format. This would facilitate copying it onto the macbook pro. A question to be resolved is whether data is placed into the archive or somewhere else on the remote volume.

2008-07-09

SuperLabAutoMator: superlab + automator

We use Superlab 4 for some experiments we do in the lab, but it almost always seems that we need fancier randomization/counterbalancing than the program provides out of the box. Also, the dialogs that the RAs must go through, to deal with subject and group IDs, different scenario files for different conditions, and the right name to use for the logfile, have resulted in errors and lost data in our lab. The traditional solution for this is scripting, and in the Macintosh world, many user-oriented scripts make use of the Automator utility. I'm currently setting up an experiment that requires a specific randomization and counterbalancing across three different procedures for 24 subjects. What I intend to do is to make a shell script embedded in an Automator script called "SuperLabAutoMator" (or "slam" for short) that will do this in a generalized way. SuperLabAutoMator pops up a window asking the RA to select an experiment (each experiment must be a subfolder of a standard folder, or failing that, of the same folder as the script). It then follows the instructions in the experiment subfolder by running "prescript", "midscript", and "postscript", which are functions defined in the script.

In the experiment subfolder, there is optionally a file called "rc.slam" that can define the following objects:
  • name=xxx -- the subject ID to use (default = null)
  • group=xxx -- the subject's group (default = null)
  • scenario=ppp -- default is "scenario.sl4" in the experiment subfolder
  • logfile=ppp -- default is "logfile.txt" in the experiment subfolder
  • fifofile=ppp -- use as Superlab's logfile; filtered data should be written to $logfile by midscript
  • prescript() -- set up to run
  • midscript() -- interact with Superlab while running
  • postscript() -- run after midscript and Superlab have finished
All of these have default values. The rc.slam file will be sourced early, before asking for the subject ID, for example. This list could expand or shrink.

Midscript is to be run while Superlab is running, and it can either fiddle with the logfile, or prescript could create a named pipe to be used by Superlab as the logfile and open it for reading by midscript. The purpose of midscript is to handle cases where the actual stimuli to be presented must be changed as a function of performance. In most cases, it will not be needed. Note that if the midscript is reading the logfile from a named pipe, it should also save the raw contents. One way to do this might be to use the tee(1) command; alternatively, midscript can simply save each line it reads. One unresolved problem is that Superlab brings up a user confirmation window if the logfile already exists, as it must for a named pipe. It would be nice to override that somehow.
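The named-pipe variant, with tee preserving the raw stream, might be sketched like this (a printf stands in for Superlab writing its logfile; the filenames are invented):

```shell
# prescript: create the pipe that will be handed to Superlab as its logfile
work=$(mktemp -d) && cd "$work"
mkfifo logpipe

# stand-in for Superlab writing trial lines to its "logfile":
{ printf 'trial 1 ok\ntrial 2 ok\n' > logpipe; } &

# midscript: tee saves the raw contents while the filter writes $logfile
tee rawlog.txt < logpipe | grep 'ok' > filtered.txt
wait
```

Both sides block until the other end of the pipe opens, so the ordering takes care of itself.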

The core of SuperLabAutoMator will run Superlab from the executable file rather than from the GUI, so that command-line options can be specified. After changing into the experiment subfolder, it will source "rc.slam". Next, it will call prescript, and wait for it to finish. It will then set Superlab to run in the background, bring Superlab into the GUI foreground, and call midscript. After midscript completes and Superlab exits, SuperLabAutoMator will call postscript before exiting.

In general, it is a good idea for Superlab at a minimum to wait either for several seconds before the first trial, or more typically, to display an instructions screen and wait for a response.

From the RA's point of view, all that will be required is to start SuperLabAutoMator; choose the correct experiment (only active ones should be available, and if only one is available, only a confirmation window comes up); choose the subject group from a short list (if more than one); and choose the subject (only untested subject numbers will be available). The script will take care of setting things up for Superlab, running it, and dealing with the data, including filtering it, giving some feedback to the subject, and possibly storing it away in a centralized database.

When I get this running with the first experiment (no runtime interaction will be used, btw), I will post the SuperLabAutoMator app and the setup of the first experiment.

Note: while slam is a great name, it is already in use with a couple of different programs/utilities, so we will go with the GUI name SuperLabAutoMator and use slam as an internal shortcut (as in the rc.slam filename), and we can also pronounce the name optionally as slam.

2008-07-01

Sending email to root

It is pretty important that root get asynchronous notification of problems detected by the maintenance system. However, it is obviously impossible to send email in single-user mode (sendmail not running, boot drive write-protected). Therefore, there are two methods that could be used to notify root of errors or just of system status. The basic idea is that the message be written someplace other than the boot drive and then mailed, once the system is up multi-user, via a launch daemon.

First, when a backup is done, it will be done to a writable medium, namely a firewire drive. This drive will be mounted when the system is running, so a message posted in /Volumes/Snapshots can easily be sent on to root. This will be the main method of notifying root about snapshots and clones.

Second, there are times when Snapshots isn't available. In fact, one of the critical messages might be that it wasn't available, so no snapshot was made. In this case, the best method is to use /var/log/system.log. The system startup saves the standard output and standard error from rc.server (and other boot programs) in system.log. This is in fact a more reliable way to send notifications.

The approach will be simply to write out banner lines like "org.bogs.rootmail begin" and "org.bogs.rootmail end" so that the daemon can simply extract the intermediate lines (if any) and mail them to root. In general, this should be limited to the bare minimum of lines. By default, system.log is cycled at midnight, and eight gzipped copies are kept around, which should be more than ample.
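A sketch of both halves, writing and extracting (a temporary file stands in for /var/log/system.log, and the message text is invented):

```shell
# Write bannered messages, then extract whatever lies between the banners.
fakelog=$(mktemp)
{
    echo 'other daemon noise'
    echo 'org.bogs.rootmail begin'
    echo 'snapshot failed: /Volumes/Snapshots not mounted'
    echo 'org.bogs.rootmail end'
    echo 'more noise'
} > "$fakelog"

# Pull the lines between the banners (exclusive); this is what gets mailed.
msg=$(sed -n '/org\.bogs\.rootmail begin/,/org\.bogs\.rootmail end/p' "$fakelog" \
      | sed '1d;$d')
```

The mailing daemon would pipe $msg to mail(1) addressed to root.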

One thing this does is to complicate the daemon. Not only must it write out org.bogs.maintenance-mode, but it now has to send mail to root. This will require something to be run immediately after entering multi-user mode as well as something when it is time for more maintenance. Also, some kind of flag must be used to prevent multiple mailings, but I don't know what it should be. Maybe just a file ~root/.org.bogs.rootmail containing the timestamp of the last "org.bogs.rootmail end" line that was mailed out would be sufficient. That is, when the mailing script is run, it will send only more recent segments of system.log, and it will update the flag file.

Where to put maintenance scripts

This is slightly complicated, because system areas are of course sometimes overwritten by software updates. Here are some ideas.

I want to keep the changes to /etc/rc.server as minimal as possible. So I will add two lines at the very end of the file that will do two things in a conditional:

# org.bogs.maintenance
if [ -e /private/tmp/org.bogs.maintenance-mode ] ; then source /var/root/Scripts/maintenance.sh ; fi

The file /tmp/org.bogs.maintenance-mode is used to pass parameters to ~root/Scripts/maintenance.sh. If it is not present, then this is not a maintenance boot. If it is present, but if the maintenance.sh script is missing, an error message "no such file or directory" will be written to the log. Note that maintenance.sh runs in the same bash environment as rc.server. Also note that ~root is already a protected place, and Scripts should also be protected (mode 700).

Also note that /tmp will be erased at some point after the maintenance has completed and the system comes up multiuser.

The file maintenance.sh should not do very much; its role is to call other scripts or programs located in the same directory, based on the contents of org.bogs.maintenance-mode.
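So maintenance.sh might be little more than a dispatcher. In this sketch the mode names and paths are invented, and echo stands in for sourcing the real scripts:

```shell
# Dispatch on the contents of the maintenance-mode flag file.
root=$(mktemp -d)
mode_file="$root/org.bogs.maintenance-mode"
printf 'snapshot\n' > "$mode_file"

maintenance() {
    read -r mode < "$mode_file"
    case $mode in
        snapshot) echo "would run $root/Scripts/snapshot.sh" ;;
        clone)    echo "would run $root/Scripts/clone.sh" ;;
        *)        echo "unknown maintenance mode: $mode" >&2; return 1 ;;
    esac
}
out=$(maintenance)
```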

All of the maintenance code will be maintained elsewhere, on another system, and be copied into a directory on the server and installed from there into /var/root and /etc/rc.server. In addition, the installation script will add or replace the last two lines of rc.server, and will also add the appropriate material to /Library/LaunchDaemons (which is where "system-wide daemons provided by the administrator" are supposed to go).
