
PBSPro 7.0 Batch System Guide

User introduction

The commands for managing jobs only work on the node arwen.
The command qstat shows the queued and running jobs.
qstat -q shows the available queues with their time, memory, and CPU limits.
qsub my.job is used to submit jobs. You can either supply flags on the command line or include options in your script, which is the recommended way.
pbsnodes -a is used to check the status and properties of the nodes in the clusters.
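For example, a quick way to inspect the batch system from arwen looks like this (the script name my.job is just the placeholder used above):

qstat -q       # queues with their time, memory, and CPU limits
pbsnodes -a    # state and properties of every node
qsub my.job    # submit a job script; PBS prints the assigned job id
qstat          # list your queued and running jobs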

NOTE:
If you create the job script myscript.job on a Windows machine, you MUST run
dos2unix myscript.job
to remove the Carriage Returns at the end of each line; if you fail to do this, the script will terminate with strange errors such as "unexpected end-of-file on standard input".
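A quick way to check whether a script still has DOS line endings is the following sketch; cat -v shows a carriage return as ^M at the end of each line:

cat -v myscript.job | head    # DOS line endings appear as ^M
dos2unix myscript.job         # strip the carriage returns in place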

Queue definitions

The queue definitions can be checked with qstat -q and qstat -Q -f.
  1. quick Time limit 2 hours. Used for debugging, testing, and short production jobs.
  2. clever Time limit between 2 and 96 hours and up to 12 nodes in parallel (24 CPUs). This is how clever people use the system.
  3. brute Time limit longer than 96 hours and up to 6 nodes in parallel (12 CPUs). This is for brute-force and not-so-clever use of the system.

Property definitions

OpenPBS allows each node to be assigned properties. These can be specified in the job scripts so that you can direct PBS to allocate the correct node(s) to your job. You can check the properties with pbsnodes -a. An example request using these properties follows the list below.
  1. arwen, haku, ra denotes the group of nodes belonging to the cluster with that name.
  2. i686, x8664, amd denotes the CPU architecture. The nodes in Arwen have IA32 architecture and the arch command returns i686. The nodes in Haku have EM64T architecture and the arch command returns x86_64; since OpenPBS does not allow underscores in properties, we use x8664. The nodes in Ra are AMD Opteron 64-bit processors and the arch command also returns x86_64; we defined the additional property amd for the Ra nodes to distinguish them from Haku nodes.
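For example, to ask for two nodes from the Ra cluster you could combine these properties in the resource request of your script; this is only a sketch, the node count and ppn value are illustrative:

#PBS -l nodes=2:ra:amd:ppn=2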

Use of switches for parallel jobs

PBS will schedule parallel jobs so that all nodes allocated will come from a group of nodes on the same switch, to reduce communication delays. This is automatic. The groups are internally called bc1, bc2, hs0, hs1, rs0, rs1, rs2, rs3. They denote the groups of nodes that share the same switch: 14 nodes per group in Arwen, 16 in Haku, and 19 in Ra.
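Since these group names are ordinary node properties, you can also request one explicitly; a sketch (the group name and counts are only illustrative):

#PBS -l nodes=4:hs0:ppn=2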

Also note that by default PBS scripts start in your $HOME directory, not in the directory from which you submit the job. The error and output files, however, are placed in the submission directory by default.
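If your job should run in the submission directory, a common sketch is to change to PBS_O_WORKDIR, which PBS sets to the directory where qsub was run, at the top of the script:

# change to the directory from which the job was submitted
cd $PBS_O_WORKDIR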

A simple serial job (download the job script file)

#!/bin/sh
#PBS -N serial
#PBS -o serial.out
#PBS -e serial.err
#PBS -m abe
#PBS -M deumens@qtp.ufl.edu
#PBS -q quick
#PBS -l nodes=1
echo Testing...
hostname
echo run a serial program...
echo done
This job declares the name "-N serial", which will show up in the qstat output; if no name is specified, the script file name is used as the job name.
The standard output and standard error files are specified with "-o" and "-e".
The flag "-m abe" specifies that a mail message should be sent to the address specified with "-M" when the job "b"egins, "e"nds, and "a"borts.
The queue in which to run the job is specified with "-q quick".
The "-l nodes=1" specifies that this job asks for 1 CPU resource, which unfortunately is called "nodes". Since the configuration specifies that each node has 2 CPUs, two jobs can be run on each physical node if they each ask for one CPU resource. You can use qstat -f jobname to see what node/CPU combination has been allocated to your job(s).
NOTE:
If you do not specify the resource requirement, PBS will pile your jobs all in the same node/CPU slot.
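Putting the serial example together, a submission and a check of the allocated slot might look like the following sketch; the file name serial.job and the job id 1234.arwen are only placeholders:

qsub serial.job        # PBS replies with the job id
qstat -f 1234.arwen    # the exec_host field shows which node/CPU slot was allocated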

The following is a parallel job to run a LAM MPI program (download the job script file)

#!/bin/sh
#PBS -N para
#PBS -o para.out
#PBS -e para.err
#PBS -m abe
#PBS -M deumens@qtp.ufl.edu
#PBS -q quick
#PBS -l nodes=4:bc2:ppn=2
cleanup() {
  lamhalt -v
  echo job killed
  exit
}
echo Testing parallel...
hostname
echo PBS_NODEFILE
cat $PBS_NODEFILE
N=`wc -l $PBS_NODEFILE | awk '{print $1}'`
echo Nr nodes $N
lamboot -v $PBS_NODEFILE
# Catch the TERM signal to shut down MPI cleanly (KILL cannot be trapped)
trap cleanup TERM
echo starting hello...
mpirun -np $N hello > para.log 2>&1
lamhalt -v 
echo done.
All flags are the same as for the serial job, except the "-l". It specifies that you request a list of hosts of certain types. The format is n1:type1+n2:type2+... to ask for n1 nodes of type1 and n2 nodes of type2, etc. Currently two types are defined on arwen: bc1 and bc2 for BladeCenter 1 and 2. These are nodes arwen0* and arwen1* respectively. It is advantageous for MPI jobs to run inside the same BladeCenter to reduce communication delays. The example job requests 4 nodes of type bc2, i.e. in BladeCenter 2, and specifies a further type of ppn=2, i.e. processors per node equal to 2; as a result the PBS_NODEFILE will contain 8 entries (each name appearing twice) for the job to use.
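As an illustration of the combined format, the following sketch asks for two nodes in each BladeCenter; the counts are only an example:

#PBS -l nodes=2:bc1:ppn=2+2:bc2:ppn=2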
The environment variable PBS_NODEFILE is set by PBS and is the name of a file with the hosts that are assigned to you by PBS.
You can download the small program hello.cpp to try running this job.
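Before submitting, the program can be compiled with the LAM compiler wrapper; a sketch, assuming LAM's wrappers are on your PATH and using the executable name hello expected by the mpirun line above:

mpiCC -o hello hello.cpp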
The LAM MPI commands create a temporary directory of the form lam-user@host-pbs-jobnr.arwen in the directory TMPDIR=/scr_1/tmp. This directory will be deleted by the scratch cleaning daemon. To avoid this, make a directory with the correct name and then add to your script
LAM_MPI_SESSION_PREFIX=/scr_1/tmp/hostname.myjob.PID
export LAM_MPI_SESSION_PREFIX
Then lamboot will create the directory in a place that is safe as long as your job runs.
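A sketch of what this could look like in the job script, placed before the lamboot line; building the name from the hostname and PBS_JOBID is just one way to fill in the hostname.myjob.PID pattern above, adjust it to your own convention:

LAM_MPI_SESSION_PREFIX=/scr_1/tmp/`hostname`.$PBS_JOBID
export LAM_MPI_SESSION_PREFIX
mkdir -p $LAM_MPI_SESSION_PREFIX
lamboot -v $PBS_NODEFILE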

Administrator introduction

  1. server daemon The server daemon runs on arwen; arwengw is the interface that points to the nodes of the cluster.
    All nodes are listed in a nodes file with properties giving the switch group (e.g. bc1 or bc2 for the two BladeCenters), the cluster name, and the architecture, plus np=2 for the two CPUs, as follows:
    arwen00 bc1 arwen i686 np=2
    arwen01 bc1 arwen i686 np=2
    ...
    arwen10 bc2 arwen i686 np=2
    ...
    haku00 hs0 haku x8664 np=2
    ...
    haku10 hs1 haku x8664 np=2
    ...
    ra00 rs0 ra x8664 np=2
    ...
    ra10 rs1 ra x8664 np=2
    ...
    ra20 rs2 ra x8664 np=2
    ...
    ra30 rs3 ra x8664 np=2
    ...
    
    Jobs will be allocated to node/CPU slots. Using the nodes as timesharing nodes does not work very well for parallel jobs that need to run across different nodes; but this may be my misunderstanding.

    The properties defined for the nodes allow you to target specific groups of nodes easily.

  2. execution daemon Each node runs a pbs_mom daemon with the following configuration file:
    $logevent       511
    $clienthost     arwengw
    $restricted     arwengw
    $ideal_load     2.0
    $max_load       2.1
    $usecp  *:/     /
    
    There is no execution daemon on the management node arwen or on the file server arwensrv.
  3. scheduler The scheduler daemon also runs on arwen. The scheduler used is the simple FIFO with load balancing turned on:
    load_balancing: true ALL
