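Programming

Introduction

The systems at the John C. Slater Lab support several styles of programming: serial programs, shared-memory parallel programs using OpenMP, and distributed-memory parallel programs using MPI.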
Examples are given for Fortran 95 programs, but all options are the same for C and C++ programs; the compiler names are found by replacing f90 with c or C (for example, xlf90 becomes xlc or xlC). All aspects of parallel programming are covered in detail in the class offered by Erik Deumens; see http://www.qtp.ufl.edu/~deumens/adv-prog.html for details.

Serial programs
The IBM Fortran compiler is called xlf. Aliases are xlf77 and xlf90.
The IBM C compiler is called xlc and the C++ compiler is called xlC. All options discussed below hold equally for all languages. The SUN SPARC compilers are called f77, f90, f95, cc, and CC. For Solaris on Intel, the Portland Group compilers pgf77 and pgf90 are available. You can link with the BLAS, LAPACK and FFTW libraries (SUN and IBM) and the ESSL library (IBM only) by specifying the link flags -lblas, -llapack, -lfftw and -lessl, respectively. You can gain maximal optimization for each architecture with the following flags
NOTE: Within the IBM POWER architecture it is possible to cross-compile. The -qarch flag specifies the instruction set the compiler is allowed to use. Some sub-architectures have extra instructions, so a binary that uses such an instruction will crash with an illegal instruction error when run on a sub-architecture that lacks it. The hierarchy is as follows:

```
POWER pwr ----> POWER2 pwr2 ----> POWER2SC p2sc
  |
  +-----------> POWERPC ppc
  |
  +-----------> POWER3 pwr3 ----> POWER4 pwr4
```

The -qtune flag specifies which instruction timings and cache sizes the compiler assumes during optimization. Its value can make the binary run faster on some sub-architectures, but it will never cause a crash. It is therefore safe to compile with -O5 -qarch=pwr -qtune=pwr4: this binary will always run, and will run fastest on POWER4 machines such as the p690. It will not, however, run as fast as a binary compiled with -O5 -qarch=pwr4 -qtune=pwr4, because this last set allows the compiler to use an extra instruction that is very fast for floating-point operations (FMA, fused multiply-add).

NOTE: A binary compiled on a machine running AIX 5 will not run on a machine running AIX 4. However, a binary compiled on AIX 4 will run on AIX 5.

The libraries are optimized for different architectures and should be used together with the matching optimization flags, by specifying the library names as follows
OpenMP programs

Consider a simple program that uses OpenMP directives to achieve parallel execution on shared-memory machines. (The source is reproduced at the end of this section.) It can be compiled for the IBM with
To compile it for the SUN SPARC the command is
Optimization and linking with libraries work the same as for serial programs. To use POSIX Threads from C or C++, link with the flag -lpthread and use the reentrant versions of the compilers, xlc_r and xlC_r, on the IBM. The example also shows how to time parallel programs: measure real (wall-clock) time instead of CPU time, because CPU time is summed over all threads and therefore does not decrease as threads are added.

```fortran
program openmp
  implicit none
  integer, parameter :: n = 100, q = 100
  real*8, dimension(n,n) :: a, b, c
  integer :: i, j, k, p
  integer :: ic, ir
  real*8 :: time0 = 0.d0, time1 = 0.d0
  ! rtc() is an IBM XL Fortran service function returning wall-clock seconds
  real*8 :: rtc, time10 = 0.d0, time11 = 0.d0

  ! Fill the input matrices and clear the result
  do j = 1, n
     do i = 1, n
        a(i,j) = dble(i)*dble(j)
        b(i,j) = sqrt(dble(i+j))
        c(i,j) = 0.d0
     end do
  end do

  ! Start both wall-clock timers: standard system_clock and IBM rtc()
  call system_clock(count=ic, count_rate=ir)
  if (ir /= 0) then
     time0 = dble(ic)/dble(ir)
  else
     print *, 'No time available.'
  end if
  time10 = rtc()

  ! Columns j are distributed over the threads; each thread writes only
  ! its own columns of c, so c can safely be shared
!$OMP parallel do shared(a,b,c) private(i,j,k,p)
  do j = 1, n
     do p = 1, q            ! repeat the product q times for a measurable run time
        do i = 1, n
           do k = 1, n
              c(i,j) = c(i,j) + a(i,k)*b(k,j)
           end do
        end do
     end do
  end do
!$OMP end parallel do

  ! Stop both timers
  call system_clock(count=ic, count_rate=ir)
  if (ir /= 0) time1 = dble(ic)/dble(ir)
  time11 = rtc()

  print *, ' Time (system_clock) ', time1 - time0
  print *, ' Time (rtc)          ', time11 - time10
  ! 2*n**3 floating-point operations (multiply + add) per repetition
  print *, ' Speed ', 2.d0*dble(n)**3*dble(q)/(time11 - time10)/1.d6, ' Mflops'
  stop
end program openmp
```

MPI programs

Consider a simple program that uses the Message Passing Interface (MPI) standard to achieve parallel execution on a set of compute nodes connected by a communication network. (The source is reproduced at the end of this section.) It can be compiled on the IBM SP with
First, and most important, create a file that lists the host names of the nodes to run on. The default name of this host list file is host.list; you can specify another name in the environment variable MP_HOSTFILE or with the flag -hostfile. The number of tasks is set in MP_PROCS or with the flag -procs. The fast switch on the SP can be used in two modes, called IP mode (ip) and user mode (us); user mode gives the best performance. Select the mode by setting MP_EUILIB=us or MP_EUILIB=ip, or with the flag -euilib us or -euilib ip. The execution is then started with
For C and C++ programs, replace mpxlf with mpxlc or mpxlC. To compile mixed-mode programs that use MPI together with OpenMP or POSIX Threads, use mpxlf_r, mpxlc_r or mpxlC_r. On the SUN system the public domain MPI implementation MPICH is installed.
The compilation is done with
```fortran
program mpi
  implicit none
  integer :: ntasks, myid, ierr
  include 'mpif.h'

  call MPI_INIT(ierr)
  ! Total number of tasks and the rank (0 .. ntasks-1) of this task
  call MPI_COMM_SIZE(MPI_COMM_WORLD, ntasks, ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, myid, ierr)
  print *, ' nr tasks =', ntasks, ' my id =', myid
  call MPI_FINALIZE(ierr)
  stop
end program mpi
```
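The timing advice from the OpenMP section (measure wall-clock time, not CPU time) and the Mflops figure printed by that program carry over directly to C, which this page says uses the same options. The following is a minimal sketch, not part of the original examples, assuming a C11 compiler: it uses the standard timespec_get for wall-clock time in place of the IBM-specific rtc(), and the helper names wall_seconds, matmul_repeat and mflops are hypothetical. The OpenMP pragma only takes effect when the compiler's OpenMP option is enabled; otherwise the loop simply runs serially. In an MPI program, MPI_Wtime() would be the usual wall-clock timer.

```c
#include <assert.h>
#include <time.h>

/* Wall-clock (real) time in seconds from the C11 timespec_get function.
   clock() would instead report CPU time summed over all threads of the
   process, which does not drop when more threads share the work. */
double wall_seconds(void)
{
    struct timespec ts;
    if (timespec_get(&ts, TIME_UTC) == 0)
        return 0.0;                      /* clock unavailable */
    return (double)ts.tv_sec + 1.0e-9 * (double)ts.tv_nsec;
}

/* Same kernel as the Fortran OpenMP example: c(i,j) += a(i,k)*b(k,j),
   repeated q times. The n-by-n matrices are stored column-major, as in
   Fortran, so element (i,j) lives at index i + j*n. Each thread owns whole
   columns j of c, so no two threads write the same element. */
void matmul_repeat(int n, int q, const double *a, const double *b, double *c)
{
    assert(n > 0 && q > 0);
#pragma omp parallel for
    for (int j = 0; j < n; j++)
        for (int p = 0; p < q; p++)
            for (int i = 0; i < n; i++)
                for (int k = 0; k < n; k++)
                    c[i + j * n] += a[i + k * n] * b[k + j * n];
}

/* Mflops = (2 * n^3 * q operations) / seconds / 10^6: every innermost
   iteration performs one multiply and one add. */
double mflops(int n, int q, double seconds)
{
    return 2.0 * n * n * n * q / seconds / 1.0e6;
}
```

To time a run, call wall_seconds() before and after matmul_repeat and pass the difference to mflops. With n = q = 100 the kernel performs 2 × 100³ × 100 = 2 × 10⁸ operations, so a run of one second corresponds to 200 Mflops.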
Last Updated 12/15/07