2008-07-18

A Useful Experimental Data Hierarchy


This is a scheme for storing experimental data, primarily with SuperLab, that should be easy to set up, easy to use, and easy to automate.

At several specific points throughout the hierarchy there are rc files: rc.sl4am, rc.proj, rc.subj, and rc.exp. These can contain arbitrary ksh code and can change the operation of sl4am, but are intended to customize the operation of sl4am by changing specific parameters or by altering the setup, runtime, and/or cleanup phases of a SuperLab scenario.

The SL4AM subfolder within the SuperLab Experiments folder is the default sl4am home folder. If an rc.sl4am file is present, it can redefine the variable SL4AMHOME in order to use a different home folder, for example, in a shared location on the local machine, like "/Users/Shared/Documents/SuperLab Experiments/SL4AM", or on a remote location, such as "/Volumes/groups.ourlab/Documents/SuperLab Experiments/SL4AM". If there is a separate root for the status flags and datafiles, that can be defined in rc.sl4am as "DATAROOT" (see below). If DATAROOT is defined, it should give the path to a folder that will be organized in parallel to the SL4AM folder.
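As an illustration, an rc.sl4am that moves the home folder to the shared local location and defines a separate data root might look something like the sketch below. The SL4AMHOME path is the example given above; the DATAROOT path is purely hypothetical, and whether sl4am needs these variables exported is not specified here.

```shell
# rc.sl4am: a sketch only; sl4am is assumed to source this file at startup.

# Use a shared home folder on the local machine instead of the default.
SL4AMHOME="/Users/Shared/Documents/SuperLab Experiments/SL4AM"

# Optional separate root for status flags and datafiles, organized in
# parallel to the SL4AM folder. This particular path is hypothetical.
DATAROOT="/Volumes/groups.ourlab/Data/SL4AM"
```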

Note that spaces in folder and file names should be avoided if possible below SL4AM. I'm trying to make the script immune to problems resulting from spaces in filenames, but it's safer (and much wiser) to avoid them.

Project roots are top-level subfolders in SL4AM. There can also be a top-level subfolder in SL4AM called Archival that is intended to store completed or inactive projects.

Except for some code at the beginning to select a project, sl4am is concerned only with the world within a single project root. In fact, it uses cd to go there as soon as possible, so that most paths within a project are relative to the project's root. While running SuperLab, sl4am changes temporarily into the folder of the specified scenario (see below). At the top level of a project root, there must exist a file called rc.proj that is sourced when sl4am starts a session in that project. A subfolder X of the sl4am home folder (SL4AMHOME) is a project root iff X/rc.proj exists.
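The "project root iff rc.proj exists" rule is easy to express in shell. The function below is a minimal sketch, not the code sl4am actually uses; the name list_projects is made up for illustration. It looks only at top-level subfolders, so projects parked inside Archival are not listed.

```shell
# list_projects HOME_DIR
# Print the name of every top-level subfolder of HOME_DIR that contains
# an rc.proj file, one name per line. (Hypothetical helper.)
list_projects() {
    home=$1
    for rc in "$home"/*/rc.proj; do
        # If nothing matches, the pattern is left unexpanded; skip it.
        [ -f "$rc" ] || continue
        dir=${rc%/rc.proj}            # strip the trailing /rc.proj
        printf '%s\n' "${dir##*/}"    # keep only the folder name
    done
}
```

A call such as list_projects "$SL4AMHOME" would then give the candidates for a project-selection menu.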

Each project contains a subject hierarchy. The top level consists of population.group folders with dot-separated names, for example, Healthy.Elderly, Healthy.Young, Schiz.Elderly, Schiz.Young. Below each population.group folder are one or more condition.group folders, such as Set1.VisualFirst, Set1.AuditoryFirst, Set2.VisualFirst, Set2.AuditoryFirst. Below each condition.group folder are one or more numeric subject folders, for example 1, 2, 3. There is one subject folder for each subject to be tested in the project. Each subject folder must contain a flag file. It may be desirable for sl4am to create new subject folders automatically as needed; in any case, the names of these folders are simply numbers starting at 1. There also needs to be a way to ensure that sl4am will test the first subject in each condition before starting the numerically next subjects, and so on.
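One possible way to get that ordering, sketched here under the assumption that untested slots are marked by a file named flag.free (described below): collect the free slots under one population.group folder, sort them by subject number first and condition name second, and take the first. The function name next_free_slot is hypothetical.

```shell
# next_free_slot POPGROUP_DIR
# Print "condition.group/N" for the free slot with the lowest subject
# number N, so that subject 1 of every condition is used before any
# subject 2. Prints nothing if no slot is free. (Hypothetical helper.)
next_free_slot() {
    (
        cd "$1" || exit 1
        for flag in */*/flag.free; do
            [ -f "$flag" ] || continue
            printf '%s\n' "${flag%/flag.free}"
        done |
            # Field 2 is the subject number (numeric sort); ties are
            # broken by the condition name in field 1.
            LC_ALL=C sort -t / -k 2,2n -k 1,1 |
            head -n 1
    )
}
```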

Note that even though some labs never test any population group other than college students, both group levels are required. The best way to handle this situation is to use two population group identifiers, one named "Try" and the other something like "YN" (young normal). The Try group is for testing the experimental setup, while the YN group is a reasonable label for college students. If at some point you need to add another population group, it will be very easy to do it. The presence of the "YN" level won't interfere with anything, and the "Try" pseudogroup is very useful for development and for training RAs.

The flag file is a simple advisory access-control mechanism. When an experiment is first set up, each slot's flag file is named "flag.free", indicating that any computer can run that subject slot. When a computer takes a slot, the file is renamed to "flag.user@en0", where "user" is the current user and en0 identifies the current host. (The figure says "fern", which is the name of one of our computers; host names are too variable for this purpose.) This "checks out" the subject slot to the named individual. The parameter "en0" is set to the ethernet address of en0 (ifconfig en0 | grep ether | read junque ether junque) in order to identify the machine in a somewhat unambiguous way. An individual may relinquish a subject slot by renaming the flag file back to "flag.free", but the presence of datafiles will cause a multi-experiment sequence to pick up where it left off before. When a subject slot is complete and all data has been saved, the flag file must be renamed to "flag.completed".
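The check-out and completion steps amount to two renames. The sketch below assumes the user name and host identifier are passed in as arguments; the function names are made up, and this is only an advisory mechanism, as noted above.

```shell
# checkout_slot SLOT_DIR USER HOSTID
# Rename flag.free to flag.USER@HOSTID. Fails (non-zero status) if the
# slot is not free, because there is then no flag.free to rename.
checkout_slot() {
    mv "$1/flag.free" "$1/flag.${2}@${3}" 2>/dev/null
}

# complete_slot SLOT_DIR USER HOSTID
# Rename a checked-out flag to flag.completed once all data is saved.
complete_slot() {
    mv "$1/flag.${2}@${3}" "$1/flag.completed" 2>/dev/null
}
```

In ksh the ethernet address can be read with the pipeline quoted above. In shells where the last stage of a pipeline runs in a subshell, something like hostid=$(ifconfig en0 | awk '/ether/ {print $2; exit}') should give the same value, after which checkout_slot "$slot" "$USER" "$hostid" checks the slot out.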

Note that if DATAROOT is defined, then sl4am will search there for flag.free slots, rename the flag to flag.user@en0 there, and upload data there. The structure under DATAROOT is identical to the SL4AMHOME hierarchy, but it need not have any Shared folder, rc files, scenarios, or stimuli. In the event that subjects must be tested offline, one or more subject slots should be checked out in advance. When sl4am starts up, it will sweep all experiments looking for the flag "SYNC" in a subject-number folder. If it finds any, and if DATAROOT is accessible, it will synchronize the parallel structure there by uploading any data not already present, and renaming the flag file if necessary. Any time sl4am tries to upload data or change the flag file but DATAROOT is inaccessible, SYNC is created. After a successful upload, it is removed.
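A rough sketch of the SYNC sweep follows. It assumes that SYNC is an ordinary file in the subject-number folder and that datafiles are the .txt files inside each experiment root; both function names are hypothetical, and the flag-file renaming is left out to keep the example short.

```shell
# pending_syncs PROJECT_ROOT
# Print the relative path (pop.group/cond.group/N) of every subject
# folder that contains a SYNC flag.
pending_syncs() {
    (
        cd "$1" || exit 1
        for f in */*/*/SYNC; do
            [ -e "$f" ] || continue
            printf '%s\n' "${f%/SYNC}"
        done
    )
}

# sync_slot SRC_SUBJ_DIR DST_SUBJ_DIR
# Copy datafiles missing from the parallel folder under DATAROOT, then
# remove SYNC. If the destination is unreachable, SYNC stays in place.
sync_slot() {
    src=$1 dst=$2
    [ -d "$dst" ] || return 1
    for data in "$src"/*/*.txt; do
        [ -f "$data" ] || continue
        rel=${data#"$src"/}              # e.g. exp1/subject.txt
        [ -e "$dst/$rel" ] && continue   # already uploaded
        mkdir -p "$dst/${rel%/*}" && cp -p "$data" "$dst/$rel" || return 1
    done
    rm -f "$src/SYNC"
}
```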

There also must be an rc.subj file in a subject-number folder, primarily designed to control running multi-experiment projects (for example, the order in which to run the experiments). This can be a zero-length file.

For each experiment, there is an experiment root under the subject-number folder. This folder contains all that is needed to run one SuperLab experiment: the scenario (.sl4) file; all stimuli needed (these will generally be links to files elsewhere, or in the case of all-text experiments, missing altogether); and the datafile (.txt) once the experiment has been run. There is also a mandatory rc.exp file, primarily intended to help customize the SuperLab run or the datafile processing afterwards. For example, this could give the subject some feedback about his performance. Sl4am cd's into the experiment root before running rc.exp and SuperLab.

Under each project folder, there can be an optional Shared subfolder. This contains all shared stimuli and/or sl4 files. For convenience, the setup script will install a Shared symbolic link in each experiment root that points back to the project-level Shared folder (if it exists). To link to it from a stimulus folder in an experiment root under a subject folder, just use

    ln -s ../Shared/SomeFolder/somefile.xxx stimset/somename.xxx

To link a scenario file in the experiment root to one in the Shared folder, use

    ln -s ./Shared/some-exp.sl4 some-exp.sl4

It is possible to fully populate this tree before beginning to run, but it is equally possible to use the rc scripts to fill things out at run time.

UPDATE

It really isn't hard to link relatively back to the Shared folder without the klutzy local Shared link. Here's how to do it. Create a dummy tree all the way down to the stim/stimset folder inside an experiment. Also, create a Shared folder with a stimset/xxx.png in it. Then cd down there and do "ls ..", "ls ../.." and so on until you get to (e.g.) "ls ../../../../../../Shared/stimset". Then test it with "ln -s ../../../../../../Shared/stimset/xxx.png yyy.png" or whatever. Once you've done this and gotten it to work, just make sure that your script does two things: use the right number of dotdots, and actually cd into the stimset folder to do the linkage:

    (cd downpath; ln -s uppath/Shared/stimset/file.suff link.suff)

Note that assuming the script is running in the project root, downpath will be like "pgrp/cgrp/subj/exp/stim/stimset", and uppath will be like "../../../../../..". As for why this is worth doing instead of just using absolute links, it's to allow experiments to be installed using simple methods like tar. With relative links, no adjustment is required; with absolute links, a script would have to be run to adjust the links after installation in a new location.
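Getting "the right number of dotdots" can be automated by counting the components of downpath. The following is a sketch with made-up function names; it assumes downpath is relative to the project root and has no leading, trailing, or doubled slashes.

```shell
# uppath_for DOWNPATH
# Print one ".." per component of DOWNPATH, joined by slashes.
uppath_for() {
    up=
    rest=$1
    while [ -n "$rest" ]; do
        up=${up:+$up/}..
        case $rest in
            */*) rest=${rest#*/} ;;   # drop the leading component
            *)   rest= ;;             # that was the last one
        esac
    done
    printf '%s\n' "$up"
}

# link_shared DOWNPATH FILE LINK
# From the project root, create DOWNPATH/LINK as a relative symbolic
# link to Shared/FILE, doing the linkage from inside DOWNPATH.
link_shared() {
    up=$(uppath_for "$1")
    ( cd "$1" && ln -s "$up/Shared/$2" "$3" )
}
```

For example, run from the project root, link_shared pgrp/cgrp/subj/exp/stim/stimset stimset/xxx.png yyy.png creates the same link as the manual test above.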
