
Slater Lab clusters User Guide

More details on each system: Linux clusters

Recent changes in QTP clusters

In the summer of 2008 we upgraded the software, operating system, and compilers on the QTP clusters arwen, haku, ra, surg, ock, and wukong. The new software and its organization are the same as used in the UF HPC Center, to make it easier to mix the use of both systems in the same project.

The QTP Linux clusters now use Torque as the resource manager and Moab as the scheduler. This makes their behavior consistent with the UF HPC Center and allows us to use the HPC Center license, which includes the capability to do UF-wide grid computing and grid scheduling.

To manage jobs on the Linux clusters arwen, haku, ra, surg, ock, and wukong, use ssh to connect to linx64 and use the commands qstat, qsub, qdel, etc. The Moab scheduler and Torque resource manager run on wukong, so all jobs have IDs of the form ######.wukong.
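
A minimal Torque job script might look like the sketch below; the job name, resource requests, and program name are placeholders, and the actual queue names and limits are on the Moab/Torque guide page.

    #!/bin/bash
    #PBS -N example_job              # job name (placeholder)
    #PBS -l nodes=1:ppn=1            # one core on one node
    #PBS -l walltime=01:00:00        # requested walltime (placeholder)
    #PBS -l pmem=1gb                 # RAM per process (placeholder)
    #PBS -j oe                       # merge stdout and stderr into one file

    cd $PBS_O_WORKDIR                # start in the directory qsub was run from
    ./my_program > my_program.log    # hypothetical executable

Save this as, for example, example_job.pbs and submit it from linx64 with qsub example_job.pbs; the command returns a job ID of the form ######.wukong.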

As of May 6, 2009, the interactive node linx32 has not yet been upgraded. This may or may not happen, since 32-bit machines are old and too slow for most users. At this time linx32 cannot be used to submit jobs or check their status.

The details of the queues, such as names and default and maximum limits for walltime and RAM per CPU can be found on the Moab/Torque guide page.

The old QTP clusters simu and atanasoff were dismantled during the summer. Some of the simu nodes are still running, but they are no longer supported. You may still use them where they fit your projects.


Closer connection to HPC

The new software on the nodes and on linx64 is the same as the software on the HPC Center cluster. Thus the latest Intel compilers are available on the new nodes.

In addition, the Lustre parallel file systems /scratch/ufhpc (30 TB) and /scratch/crn (80 TB) are mounted on linx64 and on the QTP cluster nodes, except for the arwen nodes. This enables easier access to the same files when working on a large project that uses both QTP and HPC Center resources. However, this connection is over Gigabit Ethernet rather than InfiniBand, so the performance is good for general file manipulation. For some applications the performance is even good enough to store high-intensity scratch files, such as integral files for Gaussian.
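
For example, to let Gaussian keep its scratch files on one of these file systems, something like the sketch below could be used; the per-user directory layout under /scratch/crn and the g03 command name are assumptions, not site policy.

    mkdir -p /scratch/crn/$USER/gauss_scratch      # assumed per-user scratch directory
    export GAUSS_SCRDIR=/scratch/crn/$USER/gauss_scratch
    g03 < input.com > output.log                   # assumed Gaussian 03 command name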

Details about the HPC center cluster can be found at the HPC Center web site.


The clusters amun, arwen, haku, ra, surg, ock, wukong

Arwen was installed in Spring 2004; amun, haku, and ra in Summer 2005; surg in Winter 2006; ock in Winter 2007; and wukong in Summer 2008. amun was dismantled to make room for wukong.

Characteristics
The clusters have a total of 436 cores: 2.8 GHz IA32 Xeon, 3.2 GHz EM64T Xeon, 2.2 GHz AMD64 Opteron. surg has two dual-core AMD Opteron CPUs. wukong has two quad-core Intel E5420 2.5 GHz CPUs.
Several TB-sized file systems are mounted on /scr_2 inside the clusters and on /scr/arwen_2, /scr/arwen_3, /scr/haku_2, and /scr/wukong_2 for other QTP hosts. The ock 3 TB file system is mounted on both /scr_1 and /scr_2 inside ock and on /scr/ock_2 for other QTP hosts.

The Linux clusters are suited for calculations that require:

  • Fast CPUs
  • Large RAM for each CPU
  • Fine-grained parallelism with up to 2 CPUs (use OpenMP)
  • I/O to local disk (use /scr_1/tmp); see the example job script after this list
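
A sketch of a job script along these lines, assuming a two-CPU node and that per-user subdirectories under /scr_1/tmp are acceptable; all names and limits are placeholders:

    #!/bin/bash
    #PBS -N omp_example
    #PBS -l nodes=1:ppn=2                           # both CPUs of one node for OpenMP
    #PBS -l walltime=04:00:00                       # placeholder walltime

    export OMP_NUM_THREADS=2                        # match the number of requested CPUs
    export MY_SCRATCH=/scr_1/tmp/$USER/$PBS_JOBID   # hypothetical per-job local scratch
    mkdir -p $MY_SCRATCH

    cd $PBS_O_WORKDIR
    ./my_openmp_program                             # hypothetical OpenMP executable
    rm -rf $MY_SCRATCH                              # clean up local scratch when done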

The QTP Linux clusters do not have a fast interconnect. For parallel jobs that require many processors to communicate with each other the HPC Center clusters are a better choice.

Commands
Jobs are managed with Torque and Moab. Use the commands qstat, qsub, qdel, etc. from the interactive node linx64 (and from linx32 once it has been upgraded; see the note above). Jobs can also be managed from ock and from wukong.
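
For example (the job ID shown is hypothetical):

    qstat                    # list all jobs
    qstat -u $USER           # list only your own jobs
    qstat -f 12345.wukong    # full details of one job
    qdel 12345.wukong        # delete a job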

Logging in
Use ssh to log in to linx32 or linx64. You can log in to any node, but avoid this as much as possible. Sometimes the ssh connection takes a long time to establish. This may be caused by the protocol that sets up a tunnel for the X11 display. If you do not need X11 capability on the node you log into, use ssh -x to connect; this will be much faster.
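
For example, assuming X11 forwarding is turned on by default in the site ssh configuration:

    ssh linx64       # may be slow while the X11 tunnel is set up
    ssh -x linx64    # no X11 forwarding; connects quickly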

Scheduling principles
It may be helpful to consider the scheduling principles and priorities used by the Moab scheduler when planning your jobs and workflow. These, along with the queue definitions and limits, are described on the Moab/Torque guide page.


Cluster ownership and usage

The primary use of each QTP cluster is as follows:

  • arwen is for use by the Roitberg group
  • haku is for use by the Hirata group
  • ra and wukong are for use by the Merz group
  • surg is for use by all members of QTP
  • ock is for use by the Bartlett group

A member of QTP can make individual arrangements with the principal investigator of a research group to use that group's cluster for special projects.

The HPC Center cluster can be used by any researcher on campus. Prof. Cheng (Phase II and Phase III) and Prof. Merz (Phase III) have invested in the HPC Center at the faculty level, which gives jobs submitted by members of their research groups higher priority; QTP has invested at the department/institute level in Phase III, which gives jobs from all members of QTP higher priority. The College of Liberal Arts and Sciences has invested at the college level, which gives an advantage to all researchers in CLAS.


Have a Question? Contact us.
Last Updated 5/6/09