Moab Batch System Guide
User introduction
The commands for managing jobs only work on the nodes linx64, wukong, and ock.
Queue definitions

The queue definitions can be checked with qstat -q and qstat -Q -f.
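For example (the quick queue is taken from the example jobs further below):

qstat -q            # one-line summary of each queue and its limits
qstat -Q -f         # full definition of every queue
qstat -Q -f quick   # full definition of a single queue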
On arwen nodes, request less memory, for example:

#PBS -l pmem=600mb

Because arwen nodes have only 1,500 MB per node, the default of 900 MB for two CPUs does not fit, and such jobs will linger in the queue forever. See further below for example jobs.

Property definitions

Torque allows each node to be assigned properties. These can be specified in the job scripts so that you can direct Moab to allocate the correct node(s) to your job. You can check the properties with pbsnodes -a.
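For instance, a minimal job header that combines a node property with the reduced memory request for arwen might look like the following sketch (the job name is illustrative):

#!/bin/sh
#PBS -N arwen-test              # illustrative job name
#PBS -l nodes=1:ppn=1:arwen     # one CPU on a node with the "arwen" property
#PBS -l pmem=600mb              # fits within the 1,500 MB available per arwen node
hostname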
Use of switches for parallel jobs

Moab will try to schedule parallel jobs so that all nodes allocated to a job come from a group of nodes on the same switch, to reduce communication delays. This is automatic. The groups are internally called bc1, bc2, hs0, hs1, rs0, rs1, rs2, rs3, and ws0. They denote the groups of 14 nodes in Arwen, of 16 nodes in Haku, of 19 nodes in RA, and of 16 nodes in wukong, that share the same switch.

Also note that by default Torque scripts start in your $HOME directory, not the directory where you are when you submit the job (a sketch follows at the end of this section). The error and output files are created by default in the directory where you submit the job.

Moab scheduling principles

The Moab scheduler is programmed using the following principles. You can plan and organize your work and submit the jobs to get the best turnaround time for your jobs.
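Returning to the working-directory note above, here is a minimal sketch (the job name and output files are illustrative) that uses the standard Torque variable PBS_O_WORKDIR to change back to the submission directory:

#!/bin/sh
#PBS -N workdir
#PBS -o workdir.out
#PBS -e workdir.err
#PBS -l nodes=1:ppn=1
#PBS -l walltime=0:10:00
# Torque starts this script in $HOME; change to the directory
# the job was submitted from before reading or writing files there.
cd $PBS_O_WORKDIR
pwd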
Serial job example

A simple serial job (download the job script file):

#!/bin/sh
#PBS -N serial
#PBS -o serial.out
#PBS -e serial.err
#PBS -l nodes=1:ppn=1:wukong
#PBS -l walltime=12:00:00
#PBS -l pmem=7500mb
echo Testing...
hostname
echo run a serial program with 7.5 GB of RAM...
echo done

This job declares the name "-N serial", which will show up in qstat; if no name is specified, the script filename is used as the job name. The standard output and standard error files are specified with "-o" and "-e". The flag "-m abe" specifies that a mail message should be sent to the address specified with "-M" when the job "b"egins, "e"nds, or "a"borts. The queue in which to run the job must be specified, and it must be consistent with the requested walltime "-l walltime=12:00:00". Note that Moab schedules jobs that request shorter walltimes sooner and delays jobs that ask for a very long walltime.

The "-l nodes=1:ppn=1:wukong" specifies that this job asks for 1 CPU resource, in the form of 1 node and 1 processor per node. It also specifies the property "wukong", so that the job will run only on wukong nodes. You can also specify "-l ncpus=1"; to specify the type of node, you then also need "-l nodes=wukong". Since the configuration specifies that each node has 2, 4, or 8 CPUs, two jobs can be run on each physical node if they each ask for one CPU resource. You can use qstat -f jobid to see what node/CPU combination has been allocated to your job(s). This is also shown graphically on the cluster status pages on this web site. It is also a good idea to specify the memory required by your program with "-l pmem=900mb". The default is 900 MB for QTP and 600 MB for HPC. For parallel jobs, this is the memory needed per processor.

Parallel jobs examples

The following is a parallel job to run an OpenMP or POSIX Threads shared memory program, or an OpenMPI distributed memory parallel program, on a single wukong node. QTP clusters have 2, 4, or 8 cores per node, so 8 is the maximum you can ask for with this kind of parallel job. (download the job script file)

#!/bin/sh
#PBS -N para
#PBS -o para.out
#PBS -e para.err
#PBS -q quick
#PBS -l nodes=1:ppn=8:wukong
echo Testing single node parallel...
hostname
# some preparations here
# run your shared memory parallel program here
# for example g03 with 8 processors
g03 benzene.inp > benzene.log
# or run 8-way parallel MPI job with mpirun
mpirun vasp chickenwire.inp > chickenwire.log
# some cleanup here
echo done.

Shared memory parallel programs have multiple cores working on the same data, whereas distributed memory parallel programs have a section of RAM dedicated to each core, and the cores send messages to each other to communicate. If all parts of a distributed memory parallel program run on a single multi-core node, the communication is just a memory-to-memory copy, which is usually faster than communication with cores in other nodes. This matters especially on the QTP clusters, where nodes can only communicate via Gigabit Ethernet. The HPC Center cluster has nodes that can communicate over InfiniBand, which is much faster than Gigabit Ethernet. See Programming Intro for more details.

The following is a parallel job to run an OpenMPI program on multiple nodes, with 2 CPUs (or cores) per node. (download the job script file)

#!/bin/sh
#PBS -N para
#PBS -o para.out
#PBS -e para.err
#PBS -q quick
#PBS -l nodes=2:ppn=2:arwen
echo Testing parallel...
hostname
# simple way....
mpirun hello > para.log 2>&1
# hard way...
echo PBS_NODEFILE
cat $PBS_NODEFILE
N=`wc -l $PBS_NODEFILE | awk '{print $1}'`
echo Nr nodes $N
echo starting hello...
mpirun -np $N hello > para.log 2>&1
echo done.

All flags are the same as for the serial job, except "-l", which specifies that you request a list of hosts of certain types. The format is nodes=N:ppn=M:type to ask for N nodes with M processors per node of the given type. Currently two types are defined on arwen: bc1 and bc2, for BladeCenter 1 and 2 in the cluster arwen; these are nodes arwen0* and arwen1*, respectively. It is advantageous for MPI jobs to run inside the same BladeCenter to reduce communication delays. The example job requests 2 nodes of type arwen and specifies ppn=2, i.e. both processors in each node. The scheduler will try to find two nodes in one of the two blocks of 14 nodes in arwen (bc1 or bc2) that share a switch, so that communication between the nodes is as fast as possible. As a result the PBS_NODEFILE will contain 4 entries (each node name appearing twice) for the job to use. The environment variable PBS_NODEFILE is set by PBS and names a file listing the hosts that are assigned to your job. You can download the small program hello.cpp to try running this job.
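To try the example, a typical submit-and-watch sequence looks like the following sketch; the script file name para.job and the job id 12345 are only illustrative:

qsub para.job      # submit the script; prints a job id such as 12345.server
qstat -a           # list all jobs and their current states
qstat -n 12345     # show the node(s) and CPUs allocated to the job
checkjob 12345     # Moab's detailed view of the job's state and priority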
For LAM/MPI jobs, set the session prefix to a job-specific scratch directory, substituting your host name, job name, and process id for hostname, myjob, and PID:

LAM_MPI_SESSION_PREFIX=/scr_1/tmp/hostname.myjob.PID
export LAM_MPI_SESSION_PREFIX

Then lamboot will create the directory in a place that is safe as long as your job runs.

Administrator introduction