University of Florida College of Liberal Arts and Sciences
Quantum Theory Project (QTP)

Disk Space User Guide

Keep directories.

This category is for storing the results and data files needed for a project, and as a holding place for moving data to and from other long-term storage media. Keep directories are only allowed on some scratch disks. Every user can create a keep directory in the following directories:
/scr/crunch_2/tmp/... (380 GB)
/scr/crunch_3/tmp/... (380 GB)
/scr/scxratch_4/tmp/... (1.5 TB)
/scr/scxratch_5/tmp/... (1.5 TB)
/scr/scxratch_6/tmp/... (1.5 TB)
However, to keep data from stagnating on these disks, an automated system process runs every morning to clean these disks. All files and directories in this category will be deleted, except directories with a name of the form:
keep.*.<date>
where <date> is a date in the format dd-mm-yy (for example, 12-02-03 for February 12, 2003) that is not in the past and not more than 14 days (14 * 24 hours) in the future. These disks are also scratch disks, so scratch directories, described in the next section, are allowed to persist on them as well.
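
As a rough illustration, the naming rule above could be scripted as follows. This is only a sketch: it assumes GNU date (the -d option), the helper names keep_dir_name and keep_date_ok and the label argument are hypothetical, and the exact cutoff the cleanup process applies on the boundary days is an assumption.

```shell
#!/bin/sh
# Sketch: build and check keep-directory names of the form keep.<label>.<dd-mm-yy>.
# Assumes GNU date (-d); helper names and the label are hypothetical.

keep_dir_name() {
    # $1 = free-form label, $2 = expiry date written as YYYY-MM-DD
    printf 'keep.%s.%s\n' "$1" "$(date -d "$2" +%d-%m-%y)"
}

keep_date_ok() {
    # $1 = expiry date (YYYY-MM-DD), $2 = reference "today" (YYYY-MM-DD)
    # Succeeds when the expiry date is not in the past and at most 14 days ahead.
    expiry=$(date -u -d "$1" +%s) || return 1
    today=$(date -u -d "$2" +%s) || return 1
    days=$(( (expiry - today) / 86400 ))
    [ "$days" -ge 0 ] && [ "$days" -le 14 ]
}

# Typical use, run inside one of the directories listed above (not executed here):
#   mkdir "$(keep_dir_name myproject 2003-02-12)"
```

Because the date is part of the directory name, a keep directory that is still needed has to be renamed with a later date before the current one expires.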

Scratch directories.

The scratch disks are usually the largest partitions on each machine and are used for immediate scratch space by running programs. Before the programs exit, they should delete all files from these disks. The automated system process will delete them if they are left behind. Every user can create any file or directory in the directories:
/scr_1/tmp/... (local name on each machine)
/scr/machine_disk/tmp/... (global name from any machine)
All files and directories in this category will be deleted, except directories with a name of the form:
<hostname>.*.<PID>
where <PID> is the Process ID number of a process running on the local host or on the QTP host with name <hostname>.
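
A job script could create and clean up such a directory roughly as follows. This is a sketch, not the QTP procedure: the helper name make_scratch and the label argument are hypothetical, and it assumes that the short host name reported by uname -n is the name the cleanup process expects.

```shell
#!/bin/sh
# Sketch: create a scratch directory named <hostname>.<label>.<PID>.
# The helper name make_scratch and the label argument are hypothetical.

make_scratch() {
    # $1 = scratch root, e.g. /scr_1/tmp; $2 = label such as the user name
    host=$(uname -n)
    host=${host%%.*}              # drop any domain part of the host name
    dir="$1/$host.$2.$$"          # $$ is the PID of the running job script
    mkdir -p "$dir" && printf '%s\n' "$dir"
}

# Typical use inside a job script (not executed here):
#   SCR=$(make_scratch /scr_1/tmp "$USER") || exit 1
#   trap 'rm -rf "$SCR"' EXIT     # delete the scratch files before exiting
```

The directory is protected only while the process with that PID is running, so the job script itself should stay alive until the program has finished.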

There is another set of scratch disks that are never cleaned up by any automated process. They are managed by the users and groups that are responsible for them. These are:
/scr/arwen_2/... (1 TB)
/scr/arwen_3/... (1 TB)
/scr/haku_2/... (1 TB)
/scr/wukong_2/... (10 TB)
These directories are all RAID 5 so that a single disk failure will not ruin the data and there is time to replace the failing hardware. These disks are backed up only once every few months for disaster recovery purposes.
Each of the above directories contains a tmp directory for general short-term use, as well as user work areas named after each user.

Guidelines

The main tasks to be considered in this man page are:

1. Running a job which requires a lot of CPU, RAM and scratch disk. You must select the correct server and queue combination, and follow the rules explained in the man page on qtpqueues(l).

2. A project consisting of running many jobs. This type of project requires everything in item 1 for each job, plus some data management. If the important data extracted from each job is small, it can be kept on your home disk, together with the text of the thesis or paper. If the data file from each job that must be processed further is large, put it on short-term scratch if all the work can be done within 14 days; otherwise move it to long-term scratch. Careful planning in the beginning saves enormous amounts of time later.

3. Software building project. This usually takes a short time. Unpack a tar file, run make, debug and install. If debugging takes a long time, you really are in the next case.

4. Software engineering project. Most people at QTP do some software development. The model for such development, which has been used successfully at QTP for years by some of the more advanced subgroups, is based on techniques developed by computer scientists, practiced in industry, and perfected and adapted for Computational Chemistry and Material Science by people at QTP and other places.

The model is that there is a well thought-out structure defined for each subgroup in QTP. All software lives in an account, like ~sbtprogs, for example, that is not associated with any individual graduate student, post-doc or faculty member, since these people move and change roles, but the group's software needs to be functional at all times. One or more individuals have primary responsibility for this directory tree.

The sharing of files by a group, with certain subgroups having write access and others having read-only access, can be arranged satisfactorily by creating special UNIX groups, in addition to the one containing all members of the research subgroup associated with each QTP faculty.

The group's software account contains the master source code, under CVS control. CVS stands for Concurrent Versions System. It is software that keeps track of changes made to software. Especially for large pieces of software like ZINDO, ACES II and ENDyne, such automated source code control is essential. The CVS software consists of one command, cvs(1), with numerous options. It is very simple to use. It automatically keeps track of who makes what changes to what piece of software for which reason.

The group directory also has a checked out version of the source of all its software under development. That version is the one from which the active production executables are built.

There is also a lib and bin subdirectory with symbolic links pointing to camp directories containing the SPARC and POWER architecture specific libraries and executables. These executables and libraries are large files and should not be stored on a /home disk, but on a /camp disk. The camp-directories are owned by the group account. Symbolic links can be used to make the whole tree look simple (as if it were all on a single disk).
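
One possible way to set up such links is sketched below. The layout is an assumption made for illustration (one sparc and one power entry under each of lib and bin), the helper name link_arch_dirs is hypothetical, and the camp directory should be given as an absolute path so that the links resolve from anywhere.

```shell
#!/bin/sh
# Sketch: give the group tree lib/ and bin/ subdirectories whose entries are
# symbolic links into a camp directory. The sparc/power layout is an assumption.

link_arch_dirs() {
    # $1 = group directory, $2 = camp directory owned by the group account
    for kind in lib bin; do
        mkdir -p "$1/$kind" || return 1
        for arch in sparc power; do
            mkdir -p "$2/$arch/$kind" || return 1
            # Create the link only once; an existing link is left untouched.
            [ -L "$1/$kind/$arch" ] || ln -s "$2/$arch/$kind" "$1/$kind/$arch" || return 1
        done
    done
}
```

After this, the large libraries and executables live on the camp disk, while the group tree on the home disk holds only the small links.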

Each subgroup member who must do some development should check out, with cvs(1), the piece of interest into a personal home-directory, or into a personal camp-directory if the piece is too large. The member then works on the piece and uses a makefile to compile the source code of the piece and link it to the group library to load subroutines that are not modified. The test executable should be stored in a personal camp-directory, not in the group directory, so as not to jeopardize production by other group members while the new piece is being debugged.

Only after careful testing against an established set of test cases stored in the group directory, and only when all test cases give the correct results, can the member update the group source code repository with CVS. The member's changes should be checked by the person responsible for the group code before this check-in operation, which is very simple with cvs(1).
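
The check-out, test and check-in cycle described above might be scripted along these lines. This is a sketch only: the function names, the repository path, the module name and the test command are all hypothetical, and the review by the person responsible for the group code still has to happen before the check-in. Only the standard cvs subcommands checkout, update, diff and commit are used.

```shell
#!/bin/sh
# Sketch of the check-out / test / check-in cycle. All names are hypothetical.
CVS=${CVS:-cvs}

checkout_piece() {
    # $1 = repository root, $2 = module name, $3 = personal work directory
    mkdir -p "$3" && (cd "$3" && $CVS -d "$1" checkout "$2")
}

review_changes() {
    # $1 = checked-out module directory
    # Merge what others have committed, then show the local modifications.
    # Note: cvs diff returns a non-zero status when differences exist.
    (cd "$1" && $CVS update -d && $CVS diff -u)
}

checkin_piece() {
    # $1 = module directory, $2 = test command, $3 = log message
    # Commits only when the test command succeeds.
    (cd "$1" && sh -c "$2" && $CVS commit -m "$3")
}

# Typical use (not executed here; paths and names are made up):
#   checkout_piece /path/to/group/repository integrals "$HOME/work"
#   review_changes "$HOME/work/integrals"
#   checkin_piece  "$HOME/work/integrals" ./run_tests.sh "Fix index bug"
```

Tying the commit to the exit status of the test command keeps untested changes out of the group repository by construction rather than by convention.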

Below we specify in detail how each kind of disk should be used. If these practices and procedures are followed, with variations, of course, the whole QTP system and all its 150 GByte of disks will be used smoothly, effectively and productively. This cooperation will benefit the entire QTP community and everyone in it.

Think about what you want to do and plan your way of doing it before you start.
One minute of planning saves many hours during execution: hours of your time, not just computer time.

Performance considerations

It is important to think about what your program will do with data files when deciding where to put the files.

Files that are read only once at startup can be placed anywhere convenient. Large files that must be read should be placed on /home/crunch_1 or on dedicated directories assigned by the computer director on /scr/crunch_2 or 3 or 4.

Files that are being written intensively, such as integral files in electronic structure calculations, should be written to local scratch disks /scr_1.

Molecular dynamics and Monte Carlo codes do write large files, but not with the intensity of integral files or files with Coupled Cluster intermediates. Such files can be written to remote disks such as /scr/crunch_2 or /scr/crunch_3.

Never write to local workstation disks such as /scr/sred18_1 directly, but write to local scratch first and copy the result at the end of your job with

cp /scr_1/tmp/xena23.mike.123345/results /scr/sred18_1/tmp/mike/results3
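
The copy step can also be tied to the end of the job script so that it is not forgotten. The following is a minimal sketch; the helper name finish_job is hypothetical, and removing the scratch directory only after a successful copy is a design choice made here, not a QTP rule.

```shell
#!/bin/sh
# Sketch: copy one result file off local scratch, then remove the scratch directory.
# The helper name finish_job is hypothetical.

finish_job() {
    # $1 = local scratch directory, $2 = result file name, $3 = destination file
    [ -n "$1" ] && [ -d "$1" ] || return 1
    mkdir -p "$(dirname "$3")" &&
    cp "$1/$2" "$3" &&
    rm -rf "$1"       # reached only when the copy succeeded
}

# Typical use near the top of a job script (not executed here):
#   trap 'finish_job /scr_1/tmp/xena23.mike.123345 results /scr/sred18_1/tmp/mike/results3' EXIT
```

If the copy fails, the scratch directory is left in place, which gives a chance to retrieve the result by hand before the automated cleanup removes it.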

For jobs that do intensive IO to local scratch disk, you should add the line
# @ Requirements = (Arch == "power3") && (OpSys == "AIX52") && (Feature == "localscr")

For jobs that do little or no IO to disk, you should add the lines
# @ Requirements = (Arch == "power2") && (OpSys == "AIX43")
# @ Preferences = (Feature == "remotescr")
Consult the LoadLeveler Definition and Use documentation for more details on these lines.
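
For orientation, such lines sit in the comment header of a LoadLeveler job command file, above the closing queue statement. The fragment below is a sketch: the job name and the command at the end are hypothetical, and the output and error file names are just one common choice.

```shell
#!/bin/sh
# Sketch of a LoadLeveler job command file; job name and command are hypothetical.
# @ job_name     = myjob
# @ output       = $(job_name).$(jobid).out
# @ error        = $(job_name).$(jobid).err
# @ Requirements = (Arch == "power2") && (OpSys == "AIX43")
# @ Preferences  = (Feature == "remotescr")
# @ queue

# The "# @" lines are comments to the shell and are read only by LoadLeveler.
# The commands below run on the machine that LoadLeveler selects.
echo "running on $(uname -n)"
```

Requirements restrict the job to machines that match the expression, while Preferences only rank the machines that already satisfy the requirements.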

Last Updated 11/27/09
 