
Linux Cluster Project

Kongming

In November 2009, Prof. Kennie Merz bought a 16 twin-node cluster from Silicon Mechanics. Each node has two hexcore 2.6 GHz AMD 2382 CPUs, 32 GB of RAM, and a 1 TB local disk. The twin nodes sit side by side in a 1U slot, so Kongming has 32 nodes, each with two CPUs and 6 cores per CPU, for a total of 384 cores. The nodes are connected by Gigabit Ethernet.
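
As a sanity check on those numbers, a small Python sketch (the variable names are just for illustration):

# 16 twin chassis, 2 nodes per chassis, 2 CPUs per node, 6 cores per CPU
chassis, nodes_per_chassis, cpus_per_node, cores_per_cpu = 16, 2, 2, 6
nodes = chassis * nodes_per_chassis              # 32 nodes
cores = nodes * cpus_per_node * cores_per_cpu    # 384 cores
print(nodes, cores)                              # prints: 32 384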

The nodes are identified by their rack and slot number, such as r14a-s20: the left node in slot 20 of rack 14.

The cluster is called kongming. It is physically located in Larsen 121 and is managed by the UF HPC Center. Therefore jobs must be submitted through the HPC submit nodes, such as submit1.hpc.ufl.edu, not from linx64. The storage node wukong provides access to the file system /scratch/wukong.
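
As a rough sketch of the submission workflow, the following Python script (run on an HPC submit node) builds a PBS-style job script and pipes it to qsub. This assumes a PBS/Torque-style qsub is on the PATH; the job name, resource request, and program name are placeholders, not site policy, so check the HPC Center documentation for the actual queue settings.

#!/usr/bin/env python
# Sketch only: submit a job to Kongming from an HPC submit node such as
# submit1.hpc.ufl.edu. The directives below are placeholders.
import subprocess
import textwrap

job_script = textwrap.dedent("""\
    #!/bin/bash
    #PBS -N example_job
    #PBS -l nodes=1:ppn=12
    #PBS -l walltime=01:00:00
    cd /scratch/wukong/$USER        # scratch file system served by wukong
    ./my_program > my_program.out
""")

# qsub reads the job script from standard input and prints the job id.
subprocess.run(["qsub"], input=job_script.encode(), check=True)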

Linx64

To provide Linux development nodes for everyone in QTP, two servers were acquired in the winter of 2005. The first is linx64, an IBM xSeries 336 with two 3.2 GHz EM64T Xeon CPUs, 8 GB of RAM, and two 140 GB disks. The second is linx32, an IBM xSeries 335 with two 3.2 GHz IA32 Xeon CPUs, 4 GB of RAM, and two 130 GB disks. It is no longer available for general use, since 32-bit hardware is not used much anymore; it now serves as the license server and performs backups of the scratch file systems.

Wukong

In April 2008, Prof. Kennie Merz bought a 16 node cluster from Penguin Computing to replace the aging amun cluster. Each node has two quadcore 2.5 GHz Intel E5420 CPUs, 16 GB of RAM, and an 80 GB local disk. Also included was a new storage server with 8 TB of RAID storage and a flash disk for the operating system, replacing the aging ra storage server with its 3 TB of storage. The cluster became operational on June 29, 2008.

The nodes are called wukong0 through wukongf, and the cluster uses the same storage node as Ra.

The nodes in Wukong are managed by Moab running on the storage node wukong, and jobs must be submitted from linx64.

The cluster is named after the mythical Chinese monkey king: Sun Wukong possesses incredible strength, being able to lift his 13,500 jīn (8,100 kg) Ruyi Jingu Bang with ease. He also has superb speed, traveling 108,000 li (54,000 kilometers) in one somersault. Sun knows 72 transformations, which allows him to transform into various animals and objects (however, he is shown with slight problems transforming into other people, since he is unable to complete the transformation of his tail). He is a skilled fighter, capable of holding his own against the best generals in heaven. His hairs also contain magical properties, each capable of transforming into a clone of the Monkey King himself, as well as various weapons, animals, and other objects. He also knows various spells in order to command wind, part water, conjure protective circles against demons, freeze humans, demons, and gods alike, to name a few. Unlike most gods, he earned his immortality through battling heaven and earth.

Ock

In January 2007, Prof. Rod Bartlett purchased an 8 CPU SGI Altix 450 with 256 GB of RAM and 3 TB of Fibre Channel RAID storage. The system consists of two CPU bricks with four Itanium 2 processors each, one I/O brick, and nine memory bricks, all connected by the fast NUMAlink network. This system is designed to run very large coupled-cluster calculations with the serial version of ACES, using one CPU and all of the RAM and disk space.

At this time the system is not managed by PBSPro or any other batch system.

Surg

In January 2006, Prof. Adrian Roitberg and Prof. Jose Fortes of the ACIS Lab received an IBM SUR grant. This added seven nodes, each with two dual-core AMD Opteron 270 CPUs and 8 GB of RAM, to the QTP clusters. All communication goes over Gigabit Ethernet switches. The nodes use the scratch disk on buddy as the large scratch disk /scr_2.

The cluster is managed by PBSPro running on the management node arwen, and jobs must be submitted from linx32.

Haku

In February 2005, Prof. So Hirata purchased a 32 node IBM xSeries cluster. Each node has two 3.2 GHz EM64T Xeon CPUs, 4 GB of RAM, and an 80 GB disk. The cluster also has a storage node with two 3.6 GHz EM64T Xeon CPUs and 1 TB of Fibre Channel disk. The nodes communicate over Gigabit Ethernet.

The nodes are called haku00 through haku1f, numbered in hexadecimal: haku00 through haku0f for the first group of 16 nodes sharing one 24-port GigE switch, and haku10 through haku1f for the second group.
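
The numbering is plain two-digit hexadecimal; a short Python sketch that reproduces the names and their switch groups:

# Haku node names: haku00-haku0f on the first 24-port GigE switch,
# haku10-haku1f on the second (two-digit hexadecimal numbering).
first_switch = ["haku%02x" % n for n in range(0x00, 0x10)]
second_switch = ["haku%02x" % n for n in range(0x10, 0x20)]
print(first_switch[0], first_switch[-1])    # haku00 haku0f
print(second_switch[0], second_switch[-1])  # haku10 haku1f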

The storage node is called hakusrv.

The nodes in Haku are managed by the same management node as Arwen, namely arwen.

The system was delivered on June 2, 2005, already assembled. On June 7 a new raised floor was installed in NPB 1114, and the new system was moved in on June 8. The operating system, CentOS 3.5 (equivalent to Red Hat Enterprise Linux 3 update 5), was installed on July 6; the QTP customization was performed and the OpenPBS queue manager was started on July 9.

The cluster is managed by PBSPro running on the management node arwen, and jobs must be submitted from linx32.

Ra

In June 2005, Prof. Kennie Merz purchased a 76 node Sun z20 cluster. Each node has two 2.5 GHz AMD Opteron CPUs, 4 GB of RAM, and a 70 GB disk. All communication is over Gigabit Ethernet switches.

The nodes are called ra00 through ra3i, in four groups of 19, with the second character running 0 through 9 and then a through i: ra00 through ra0i for the first group of 19 nodes sharing one 24-port GigE switch, up to ra30 through ra3i for the fourth group.
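
The second character is not hexadecimal but runs through 19 symbols (0-9, then a-i), one per node on a switch; a short Python sketch of the scheme:

# Ra node names: four groups of 19, with the second character running
# 0-9 and then a-i (19 symbols), giving ra00-ra0i up to ra30-ra3i.
symbols = "0123456789abcdefghi"               # 19 symbols per switch group
nodes = ["ra%d%s" % (group, s) for group in range(4) for s in symbols]
print(len(nodes))                             # 76
print(nodes[0], nodes[18], nodes[-1])         # ra00 ra0i ra3i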

The storage node is called ra.

The nodes in Ra are managed by the same management node as Arwen and Haku, namely arwen.

The system was delivered on June 13, 2005 in the form of two empty racks and 76 separately packed nodes, which were assembled into the racks by Ken Ayers and Brian Op't Holt. The operating system, CentOS 3.5 (equivalent to Red Hat Enterprise Linux 3 update 5), was installed on July 6; the QTP customization was performed and the OpenPBS queue manager was started on July 9.

The cluster is managed by PBSPro running on the management node arwen, and jobs must be submitted from linx32.

Arwen

In December of 2003, Prof. Adrian Roitberg received funds from the University to build a Linux cluster. The John C. Slater Lab contributed a share to build a system with the extra hardware needed to manage and maintain it as it grows to more nodes.

The system chosen was based on the IBM BladeCenter technology, which houses 14 dual-Xeon 2.8 GHz blades in a 7U rack enclosure. The initial cluster has 28 compute nodes, a management node, and a storage node with 1 TB of disk space configured as RAID-5. All nodes communicate via dual Gigabit Ethernet adapters channeled together.

The system was ordered on December 20, 2003. It arrived in February and was assembled in March, after the Sanibel conference. During the first week of April the management node was installed with the Red Hat 9 version of Linux and the "Cluster Management System" software, which allows the system administrator to create a configuration file for all nodes and then install all 28 of them in about 5 minutes. This fast installation was made possible by the Gigabit Ethernet switch connecting all nodes.

The Intel C, C++, and Fortran compilers were installed in addition to the gcc and g++ compilers that come with Linux. LAM MPI was installed for building parallel programs. Many molecular dynamics applications, such as AMBER, Siesta, and Gromacs, were installed to support Roitberg's research.

The Arwen cluster is an IBM eCluster 1350 consisting of two 7U BladeCenter units, each with 14 nodes.

  • Each node has
    1. two 2.8 GHz Xeon processors
    2. 1.5 GB RAM
    3. 40 GB local IDE disk
    4. GigE network adapter
  • Each BladeCenter provides power control, diagnostics, and access to mouse, keyboard, monitor, and CD-ROM for all nodes. It also provides a GigE switch that connects all nodes with each other and to the outside world over 4 channeled GigE ports. The nodes are called arwen00 through arwen1d, numbered in hexadecimal: arwen00 through arwen0d for the 14 blades in BladeCenter 1, and arwen10 through arwen1d for the second set. The nodes inside one BladeCenter share a switch and therefore communicate a bit faster with each other.
  • An eServer xSeries 345 arwensrv with
    1. two 2.8 GHz Xeon processors
    2. 2.5 GB of RAM
    3. two 36.4 GB UltraSCSI disks
    4. a 1 TB RAID-5 disk array with fourteen 73.4 GB 10k rpm UltraSCSI disks
    5. two channeled GigE ports to the cluster switch.
    provides storage to all nodes over NFS.
  • The cluster switch is a CISCO 3750 GigE switch with 24 ports.
  • A second eServer xSeries 345 arwen with
    1. two 2.8 GHz Xeon processors
    2. two 36.4 GB UltraSCSI disks
    3. 2.5 GB of RAM
    is the management server for the cluster. It has a service controller that provides access via Ethernet to the service ports of the BladeCenters and the power and diagnostics ports of the storage node. It also provides access to the KVM switch so that all nodes can be completely managed from the management node. It also serves as the network gateway between the QTP internal backbone network and the internal cluster network.

The cluster is managed by PBSPro running on the management node arwen, and jobs must be submitted from linx32.

Amun

In August 2005, Prof. Kennie Merz brought a 16 node AMD cluster from Penn State. Each node has two 2.0 GHz AMD Opteron CPUs, 2 GB of RAM, and a 40 GB disk. All communication is over Gigabit Ethernet switches.

The nodes are called amun0 through amunf, and the cluster uses the same storage node as Ra.

The cluster was turned off and disassembled in April 2008.
