
Programming Introduction

The systems at the John C. Slater Lab support several styles of programming:

  • serial programs: on SUN desktops and IBM cluster nodes 
  • shared memory programs: on SUN E5000 server and POWER3 IBM SP nodes 
  • message passing programs: on a group of SUN desktops, IBM cluster nodes or IBM SP nodes. 
This brief tutorial introduces the commands and flags to get started. The on-line documentation should be consulted for more details. 

Examples are given for Fortran 95 programs, but all options are the same for C and C++ programs; the corresponding compiler names are found by replacing f90 with c or C. 
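For example, the same build looks like this in each language (a sketch; the hello source files are hypothetical): 
xlf90 -o hello hello.f
xlc -o hello hello.c
xlC -o hello hello.C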

All aspects of parallel programming are covered in detail in the class offered by Erik Deumens. See http://www.qtp.ufl.edu/~deumens/adv-prog.html for details. 

Serial programs

The IBM Fortran compiler is called xlf. Aliases are xlf77 and xlf90. 
xlf90 -o serial serial.f
will compile the program serial.f and produce an executable called serial. 

The IBM C compiler is called xlc and the C++ compiler is called xlC. All options discussed below hold equally for all languages. The SUN SPARC compilers are called f77, f90, f95, cc, and CC. For Solaris on Intel the Portland Group compilers pgf77 and pgf90 are available. 

You can link with the BLAS, LAPACK, and FFTW libraries (SUN and IBM) and the ESSL libraries (IBM only) by specifying the link flags -lblas, -llapack, -lfftw, and -lessl. 
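For example, a program calling LAPACK and BLAS routines could be linked like this (a sketch; solve.f is a hypothetical source file): 
xlf90 -o solve solve.f -llapack -lblas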

You can gain maximal optimization for each architecture with the following flags; an example command is shown after the list. 

  • SPARC: -O4 -fast
  • POWER: -O5 -qarch=pwr -qtune=pwr -qhot
  • POWER2: -O5 -qarch=pwr2 -qtune=pwr2 -qhot
  • POWER2SC: -O5 -qarch=p2sc -qtune=p2sc -qhot
  • POWER3: -O5 -qarch=pwr3 -qtune=pwr3 -qhot
  • POWER4: -O5 -qarch=pwr4 -qtune=pwr4 -qhot
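For example, to build the serial example fully optimized for the POWER3 SP nodes: 
xlf90 -O5 -qarch=pwr3 -qtune=pwr3 -qhot -o serial serial.f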

NOTE It is possible within the IBM POWER architecture to do cross compilations. The -qarch flag specifies the instruction set the compiler is allowed to use. Some sub-architectures have extra instructions; as a result, binaries using such an instruction will crash with illegal instruction errors when run on sub-architectures that do not have it. The hierarchy is as follows:

POWER pwr ----> POWER2 pwr2 ----> POWER2SC p2sc
  |
  +-----------> POWERPC ppc
  |
  +-----------> POWER3 pwr3 ----> POWER4 pwr4
The -qtune flag specifies which timings and cache sizes to use during optimization. The value of that flag can make the binary run faster on some sub-architectures, but it will never cause a crash. As a result it is OK to compile with
-O5 -qarch=pwr -qtune=pwr4
This binary will always run, and run fastest on POWER4 machines like the p690. But it will not run as fast as when you use
-O5 -qarch=pwr4 -qtune=pwr4
because this last set is allowed to use an extra instruction that is very fast for floating point operations (FMA, fused multiply-add). NOTE If you compile on a machine with AIX 5, the binary will not run on a machine with AIX 4. However, if you compile on AIX 4, the binary will run on AIX 5.

The libraries are optimized for different architectures and should be used with these optimizations by specifying the names as follows 

  • POWER: -lessl
  • POWER2: -lesslp2
  • POWER3: -lesslsmp
Any optimization level higher than -O3 is very aggressive and may alter the numerical results; you should test these levels for accuracy. 
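A minimal way to test this is to compile the same small program at -O3 and again at -O5 and compare the printed results. The sketch below (acctest.f is a hypothetical file) sums a series whose rounding is sensitive to evaluation order: 

program acctest
  integer, parameter :: n = 1000000
  real*8 :: s
  integer :: i
  s = 0.d0
  ! the summation order, and hence the rounding, may change
  ! when the optimizer rearranges this loop
  do i=1,n
     s = s + 1.d0/dfloat(i)
  end do
  print *,' sum =',s
end program acctest

If the two binaries print sums that differ beyond the last few digits, the optimizer has changed the arithmetic. 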

OpenMP programs

Consider a simple program using OpenMP directives for achieving parallel execution on shared memory machines. (The source is reproduced at the end of this section). 

It can be compiled for the IBM with 
xlf90_r -qsmp=omp -o openmp openmp.f
Then setting the environment variable with the appropriate shell command (example in Korn shell) 
export OMP_NUM_THREADS=4
or, for XLF prior to XLF 7.1, 
export XLSMPOPTS="parthds=4"
will run the application on 4 CPUs if the system has that many CPUs. 

To compile it for the SUN SPARC the command is 
f90 -openmp -o openmp openmp.f
and you set 
export PARALLEL=4
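
On either platform you can verify that the setting took effect with a small test program that asks the OpenMP runtime how many threads it is using (a sketch; it assumes the omp_lib module shipped with OpenMP-aware compilers): 

program threads
  use omp_lib
  integer :: nt
  nt = 0
  !$OMP parallel
  !$OMP master
  nt = omp_get_num_threads()   ! number of threads in this parallel region
  !$OMP end master
  !$OMP end parallel
  print *,' running with ',nt,' threads'
end program threads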

Optimization and linking in libraries is the same as for serial programs. 

To link to the POSIX threads library from C or C++ use the link flag -lpthread and specify the "reentrant" versions of the compilers, xlc_r and xlC_r, on the IBM. 
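For example (a sketch; pthread.c is a hypothetical source file): 
xlc_r -o pthread pthread.c -lpthread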

The example also shows how to time parallel programs: use real time (wall clock time) instead of CPU time. 

program openmp
  integer, parameter :: n = 100, q = 100
  real*8, dimension(n,n) :: a,b,c
  integer :: i,j,k,p
  integer :: ic,ir
  real*8 :: time0 = 0.d0, time1 = 0.d0
  real*8 :: rtc, time10 = 0.d0, time11 = 0.d0
  do j=1,n
     do i=1,n
        a(i,j) = float(i)*float(j)
        b(i,j) = sqrt(float(i+j))
        c(i,j) = 0.d0
     end do
  end do
  call system_clock(count=ic,count_rate=ir)
  if (ir /= 0) then
     time0 = dfloat(ic)/dfloat(ir)
  else
     print *,'No time available.'
  end if
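  ! rtc() is an IBM XLF service routine returning wall-clock time in seconds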
  time10 = rtc()
  !$OMP parallel do shared (a,b,c) private (i,j,k,p)
  do j=1,n
     do p=1,q
     do i=1,n
        do k=1,n
           c(i,j) = c(i,j) + a(i,k)*b(k,j)   ! accumulate c = a*b
        end do
     end do
     end do
  end do
  !$OMP end parallel do
  call system_clock(count=ic,count_rate=ir)
  time1 = dfloat(ic)/dfloat(ir)
  time11 = rtc()
  print *,' Time ',time1-time0
  print *,' Time ',time11-time10
  print *,' Speed ',dfloat(2*n*n*n*q)/ &
           (time11-time10)/dfloat(1000000),' Mflops'
  stop
end program openmp
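
To build and run this example on a 4-CPU IBM node (a sketch combining the commands above): 
xlf90_r -qsmp=omp -o openmp openmp.f
export OMP_NUM_THREADS=4
./openmp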
    

MPI programs

Consider a simple program that uses the Message Passing Interface (MPI) standard to achieve parallel execution on a system of compute nodes connected by a communication device. (The source is reproduced at the end of this section.) 

It can be compiled on the IBM SP with 
mpxlf -o mpi mpi.f
It can then be executed by using the poe command or by running the program directly. Options for controlling the execution are passed as flags or as environment variables. 

The most important thing to do is create a file with hostnames. The default is host.list. You can specify another name in the variable MP_HOSTFILE or with the flag -hostfile. 
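A host.list file simply names one node per line, with a node repeated once for every task it should run (the node names below are hypothetical): 
node01.qtp.ufl.edu
node02.qtp.ufl.edu
node03.qtp.ufl.edu
node04.qtp.ufl.edu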

The number of tasks is set in MP_PROCS or with the flag -procs. 

The use of the fast switch on the SP can happen in two modes, called IP mode and User Space (us) mode. User Space mode gives the best performance. The mode can be specified in MP_EUILIB=us or ip, or with -euilib us or ip. 

The execution is then started with 
poe ./mpi -procs 4 -euilib us
or with 
./mpi -procs 4 -euilib us
The result is the same. 

For C and C++ programs, change mpxlf into mpxlc or mpxlC. 

To compile mixed mode programs using both MPI and OpenMP or POSIX Threads, use mpxlf_r, mpxlc_r and mpxlC_r. 
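For example, a mixed MPI/OpenMP code could be compiled like this (a sketch; hybrid.f is a hypothetical source file): 
mpxlf_r -qsmp=omp -o hybrid hybrid.f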

On the SUN system the public domain MPI implementation MPICH is installed. The compilation is done with 
mpif77 -o mpi mpi.f
and the execution is started with 
mpirun -machinefile host.list -np 4 ./mpi

program mpi
  integer ntasks, myid, ierr
  include 'mpif.h'
  call MPI_INIT(ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, ntasks, ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, myid, ierr)
  print *,' nr tasks =',ntasks,' my id =', myid
  call MPI_FINALIZE(ierr)
  stop
end program mpi
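
Once this runs, a natural next step is to make the tasks cooperate. The sketch below (mpisum.f is a hypothetical file; compile with mpxlf90 so the free-form continuation line is accepted) sums one value from every task onto task 0 with MPI_REDUCE: 

program mpisum
  include 'mpif.h'
  integer :: ntasks, myid, ierr
  integer :: myval, total
  call MPI_INIT(ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, ntasks, ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, myid, ierr)
  myval = myid + 1
  ! add the values 1..ntasks from all tasks and place the sum on task 0
  call MPI_REDUCE(myval, total, 1, MPI_INTEGER, MPI_SUM, 0, &
                  MPI_COMM_WORLD, ierr)
  if (myid == 0) print *,' total =',total,' expected =',ntasks*(ntasks+1)/2
  call MPI_FINALIZE(ierr)
end program mpisum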
    
