Corvus user guide

Contents:

  • Hardware
  • Software
  • Access and accounts
  • Getting started
  • Porting programs to the cluster
  • Compiling programs
  • Running jobs
  • Data storage and back-up
  • Documentation
  • Contacts and help
Hardware

Corvus is an SGI Altix XE 1300 cluster with 70 nodes. A Voltaire InfiniBand DDR 4x (20Gbps) switch connects all of the nodes as well as the network attached storage (NAS) server. 68 of the nodes are provided by 34 SGI Altix XE310 chassis, each comprising two nodes with a shared power supply. Each of these nodes has two 2.66 GHz Intel Clovertown quad-core processors, 8GB RAM and a 250GB SATA drive. The remaining two nodes are SGI Altix XE240 servers, each with two 2.66 GHz Intel Clovertown quad-core processors, 16GB RAM and a 146GB SAS drive. One of these is the cluster head node.

Corvus has access to a pool of storage which is shared amongst several of eResearch SA's facilities. Some of this is provided from a dedicated storage node. Another node is used as the front end (or host) for the cluster, for compiling code, testing programs and submitting jobs.

In addition, each of the compute nodes on Corvus has 200GB of local disk space for storing data and other files while a job is running. These files will be removed once the job has terminated.


Software

System software

Compilers and parallel programming libraries

  • Intel Compiler Suite
  • GNU Compiler (GCC, GFortran)
  • Java, Python, Perl, Ruby
  • OpenMPI - library for MPI message passing parallel programming over Infiniband and Ethernet

Libraries

  • Please run the "module avail" command from your ssh console to view a list of the available libraries.

Application software

  • Gaussian09, Gaussian03, OpenFOAM, BLAST, NAMD, MATLAB and many more.
  • Please run the "module avail" command from your ssh console to view a list of available applications.


Access and accounts

Time on the machine is available to researchers at any of the South Australian universities through eResearch SA. Researchers at these universities who wish to use any of eResearch SA's facilities should complete the membership form.

Anyone else who is interested in using eResearch SA's facilities should consult the Conditions of Use page to determine how best to gain access to the machine.


Getting started

To get started on the cluster, please see the quick start guide here: http://www.ersa.edu.au/quickstart_guide

Please read all of this User Guide before you try to run any jobs on the cluster, particularly the sections on Compiling programs and Running jobs.

If you are unsure as to how to make changes to your default environment, please contact the eResearch SA Service Desk.

Modules

eResearch SA uses "modules" as the primary way to configure the user environment and provide access to software packages. This makes it much easier to access the packages installed on the system. Researchers who have used APAC's HPC systems will already have had some exposure to this dynamic mechanism for gaining access to software.

To see what modules are available to be loaded (which applications are available on the cluster), type

module avail

at the command prompt.

You can also see which modules you currently have loaded by typing

module list

To make an application available in your environment, load its module with, for example, module load gaussian. Similarly, you can unload modules using, for example, module unload gaussian to unload the Gaussian module, removing all references to the Gaussian executable and associated runtime libraries.

If you do not see a module listed for the application that you wish to run please contact the eResearch SA Service Desk.
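
For example, a typical session to set up a compile environment might look like the following. This is only a sketch; the exact module names (such as intel) may differ on Corvus, so check the output of module avail:

module avail              # list all modules available on the cluster
module load intel         # load the Intel compiler suite (module name assumed)
module load openmpi       # load the matching OpenMPI build
module list               # confirm which modules are now loaded
module unload openmpi     # unload a module you no longer need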


Porting programs to the cluster

Sequential programs

Sequential programs should run without change on a single processor of the cluster. You can therefore use the cluster without knowing how to write parallel programs, simply by submitting (multiple) sequential jobs.

Parallel programming

Alternatively, you can port or develop your programs using a standard parallel programming language. Programs written using Message Passing Interface (MPI), or OpenMP (shared memory directives) can be compiled and run on the cluster. OpenMP programs can only be run on one node (up to 8 processors) since they use shared memory. MPI jobs can be run on any number of processors up to the maximum available in the cluster at the time.

MPI

You can use MPI to parallelize programs written in Fortran, C or C++. This is more difficult to program than HPF or OpenMP, but typically gives better performance. For more information on MPI, you can look at this list of materials for learning MPI. There is a good online MPI Programming Course from Edinburgh Parallel Computing Centre. A standard reference book is Using MPI: Portable Parallel Programming with the Message-Passing Interface, by William Gropp, Ewing Lusk and Anthony Skjellum, MIT Press, 1994. More information is available in the Documentation section of this User Guide.

Parallel scientific software libraries

For some programs, the majority of the time is taken up in standard routines such as matrix solve, FFT, or computing eigenvalues. In that case, it is possible to use libraries containing parallel versions of these routines, which should speed up your program without requiring you to write any parallel code.

Corvus has the Intel Cluster Math Kernel Library (which includes an optimised LAPACK library) installed, as well as FFTW. Other open source libraries may be installed on request.
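
As a sketch only, for a program that calls FFTW: load the relevant module (the module name, and whether it adds the library to the compiler's default search paths, are assumptions; check module avail and the module's help text) and link against the library when compiling:

module load fftw
icc -O2 -o my_fft my_fft.c -lfftw3 -lm     # file names are placeholders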

Standard software packages

Many standard software packages have parallel versions of the software available. The Software section of this User Guide lists some parallel programs that have been installed. Please contact the eResearch SA Service Desk if you would like other packages installed.

Help with parallel program development

If you would like assistance with parallel programming or with porting a program to the cluster, please contact the eResearch SA Service Desk.

Compiling programs

The following compilers are available on Corvus. They are easily accessible once you have loaded the correct module (refer to earlier section for description of modules).

GNU compilers

  • gcc (also aliased to cc) for C programs, and g++ for C++ programs.
  • gfortran for Fortran 95 programs (with substantial support for Fortran 77 and Fortran 90 code).

Intel compilers

  • icc and icpc for C and C++ programs respectively.
  • ifort for Fortran programs.

Note: You may find that some programs will only compile, or will run faster, using certain compilers, so you may want to try them all.

Check the man pages and the Documentation section of this User Guide for details on usage and options for each compiler.
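
By way of illustration, the following shows each suite compiling a simple source file (file and executable names are placeholders; remember to load the appropriate compiler module first):

# GNU compilers
gcc -o hello hello.c
g++ -o hello hello.cpp
gfortran -o hello hello.f90

# Intel compilers
icc -o hello hello.c
icpc -o hello hello.cpp
ifort -o hello hello.f90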

MPI programs

MPI programs should be compiled using mpicc (for C programs), mpiCC (C++), mpif77 (Fortran 77) or mpif90 (Fortran 90). To enable these commands you need to load the OpenMPI module (module load openmpi). The OpenMPI package selected depends on the compiler suite (GNU or Intel) that is already loaded, i.e. you need to load your compiler suite before you load the OpenMPI module.

Use the which command to check you are getting the right version of the MPI compilers. For example, when using the Intel compiler Suite:

which mpicc

should return something like:

/opt/shared/openmpi/1.2.6/intel/bin/mpicc

If you have Fortran code that makes use of temporary arrays you may find that it exceeds the stack space available and will cause your job to fail. If this happens, try using the -heap-arrays option during the compile (Intel compiler specific).
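
Putting this together, building an MPI program with the Intel suite might look like the following sketch (the intel module name and the program name are assumptions):

module load intel
module load openmpi
which mpicc                                   # should point at the Intel build of OpenMPI
mpicc -O2 -o my_mpi_program my_mpi_program.c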

OpenMP programs

OpenMP directives for shared memory parallel programming are supported. These programs will only be able to run on a single node of Corvus (i.e. up to 8 processors).
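
A minimal sketch of compiling and running an OpenMP program with the GNU compilers (the Intel compilers use their own OpenMP flag; check the icc/ifort man pages). The program name is a placeholder, and the setenv syntax matches the csh used in the sample jobscripts below:

gcc -fopenmp -o my_omp_program my_omp_program.c
setenv OMP_NUM_THREADS 8      # use all 8 cores of one node
./my_omp_program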

General tips and information

All of the compilers will produce much faster code if you use compiler optimisation flags. Check the Documentation or man pages of the compiler you are using to find the appropriate optimisation flags (normally -O1, -O2, etc.).

Note: You should NOT use these optimisation flags when developing and debugging programs. They should only be used once you have checked that the program works, and you want to make it run faster. This is because it may take substantially longer to compile the program at a higher optimisation level.

Also, there is a greater chance of finding compiler problems or bugs at higher optimisation levels. The compiler may not be able to compile the program, or the output of the program may be incorrect. It is a good idea to check that the results of your programs compiled with a high optimisation level are the same as those with the default optimisation. If you detect an error when using a high optimisation level, try compiling that routine or program again at a lower optimisation level.
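
A sketch of the workflow described above, using the GNU C compiler and placeholder file names:

gcc -o myprog myprog.c              # default optimisation while developing
gcc -O2 -o myprog_opt myprog.c      # optimised build once the program works
./myprog > default.out
./myprog_opt > optimised.out
diff default.out optimised.out      # the results should agree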

Note: Programs should only be compiled on the front end (the head node) of the cluster.


Running jobs

Jobs are run on Corvus by submitting a jobscript to the queuing system. Corvus uses the ANU implementation of the Torque queueing system.

Jobs are submitted to the queue by issuing the command:

qsub myscript

where myscript contains relevant Torque commands and shell script commands.

Below are some generic example scripts with brief descriptions of the various Torque components. These may be adapted to suit your needs. Please note that you only need to change the placeholder values (the job name, email address, resource limits and the program to run) in order to get a functioning jobscript for Torque:

Note: Corvus has been upgraded to SUSE Linux Enterprise 11. Please see the Corvus Upgrade Notes for changes.

Sample Torque jobscript for a sequential job

#!/bin/csh

### Job name
#PBS -N MyJobName

### Join queuing system output and error files into a single output file
#PBS -j oe

### Send email to user when job ends or aborts
#PBS -m ae

### email address for user
#PBS -M Your-email-Address

### Queue name that job is submitted to
#PBS -q corvus

### Request nodes, memory, walltime. NB THESE ARE REQUIRED
#PBS -l nodes=1:ppn=1
#PBS -l mem=Xmb,vmem=Ymb
#PBS -l walltime=HH:MM:SS

# This job's working directory
echo Working directory is $PBS_O_WORKDIR
cd $PBS_O_WORKDIR
echo Running on host `hostname`
echo Time is `date`

# Load module(s) if required
module load application
# Run the executable
ApplicationExe+Arguments

Note:

  • All lines beginning with #PBS are interpreted as Torque commands directly to the queuing system.
  • Output and error messages will be joined into a file that will be called something like MyJobName.oXXXXX in the directory from which the job is submitted (XXXXX will be the number component of the job id which is allocated when you submit the job with qsub).
  • MyJobName should be a concise but identifiable alphanumeric name for the job (starting with a letter, NOT a number).
  • mem=Xmb,vmem=Ymb states that the program will use at most X MB of physical memory and Y MB of virtual memory during its runtime.
  • module load application is required if you don't automatically load this module in this shell's environment.
  • ApplicationExe+Arguments is the name of the program you want to run and all of the command line arguments you need. It may also include redirection of input and output streams.

A copy of this sample script can be downloaded here.
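
As a concrete illustration only, the resource-request section of a sequential jobscript for a hypothetical job needing about 2 GB of memory and 10 hours of walltime might read as follows (the job name, email address and limits are examples, not recommendations):

#PBS -N MySimulation
#PBS -j oe
#PBS -m ae
#PBS -M your.name@example.edu.au
#PBS -q corvus
#PBS -l nodes=1:ppn=1
#PBS -l mem=2000mb,vmem=2000mb
#PBS -l walltime=10:00:00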

Sample Torque jobscript for an OpenMPI job

#!/bin/csh

### Job name
#PBS -N MyMPIJobName

### Output files
#PBS -j oe

### Mail to user when job ends or aborts
#PBS -m ae

### Mail address for user
#PBS -M Your-email-Address

### Queue name
#PBS -q corvus

### Request nodes, memory, walltime. NB THESE ARE REQUIRED
#PBS -l nodes=N:ppn=8
#PBS -l mem=Xmb,vmem=Xmb
#PBS -l walltime=HH:MM:SS
# This job's working directory
echo Working directory is $PBS_O_WORKDIR
cd $PBS_O_WORKDIR
echo Running on host `hostname`
echo Time is `date`
echo Using nodes
cat $PBS_NODEFILE

# Load module(s) if required
module load Compiler
module load openmpi
module load application
# Run the executable
mpirun ApplicationExe+Arguments

Note:

  • All lines beginning with #PBS are interpreted as Torque commands directly to the queuing system.
  • Output and error messages will be joined into a file that will be called something like MyMPIJobName.oXXXXX in the directory from which the job is submitted (XXXXX will be the number component of the job id which is allocated when you submit the job with qsub).
  • MyMPIJobName should be a concise but identifiable alphanumeric name for the job (starting with a letter, NOT a number).
  • nodes=N:ppn=8 requests N nodes with 8 processors per node. By default, if you don't specify ppn=8, Torque only assigns you 1 CPU per node, so ppn=8 is very important if you want all the CPUs on a node!
  • If you require 16 CPUs you would use nodes=2:ppn=8.
  • mem=Xmb,vmem=Xmb states that the program will use at most X MB of memory and virtual memory in total, across all CPUs, during its runtime (example: 8000mb).
  • module load Compiler is required so that your job will use the correct openmpi. Set Compiler to the name of the compiler suite used to build the application (probably intel or gnu).
  • module load application is required if you don't automatically load this module in this shell's environment.
  • ApplicationExe+Arguments is the name of the program you want to run and all of the command line arguments you need. It may also include redirection of input and output streams.

A copy of this sample script can be downloaded here.
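
For example, a hypothetical 16-CPU job expecting to use up to 16 GB of memory in total for 24 hours would request the following (values are illustrative only), and then launch the program with mpirun as in the script above:

#PBS -l nodes=2:ppn=8
#PBS -l mem=16000mb,vmem=16000mb
#PBS -l walltime=24:00:00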

Checking a job's status in the queue

Once a job has been submitted to Torque using qsub, it will print out a Job ID of the form XXXXX.corvus.ersa.edu.au, where XXXXX is a decimal number. You can use this number to check the job's status with the qstat command. Here is some sample output:

qstat -a 7674
corvus.ersa.edu.au:
                                                                         Req'd  Req'd   Elap
Job ID               Username Queue    Jobname          SessID NDS   TSK Memory Time  S Time
-------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----
7674.corvus.ersa     auser    corvus   testjob          24676    2    2  16000m 300:0 R 264:0

To see all running jobs (including other users'), run the command qstat -r (the S column is the state, R means running and S indicates the job has been suspended to allow another job to run). A qstat -a will show these and also the jobs that are queued (not running, signified by state Q).
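
To list only your own jobs, you can also filter by username, for example (replace auser with your own username):

qstat -u auser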

Deleting a queued job

To delete a queued or running job type

qdel job.id

where job.id is the numerical portion of the Job ID reported by qsub (and shown in the output of qstat). For example, qdel 7674 would delete the job shown above.

Note: You will only be able to delete your own jobs.

How much memory and virtual memory will I need?

  • Please refer to your software documentation on how much memory it will require. The amount of memory you need may depend on how many CPU cores your software uses.
  • If needed you can run a test job in the short queue on Corvus (max 5 hours walltime) and check resources_used.mem and resources_used.vmem in the output file that Torque generates (see the example below).
  • This file is located in the same folder as your qsub submission script and is named YourJobName.oJobID. For example: MyFirstJob.o128099
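
For example, assuming the job name and job ID shown in the file name above, you could extract the usage figures with:

grep resources_used MyFirstJob.o128099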

Short queue

Corvus has a short queue for running short jobs that do not require much CPU, memory or walltime. These jobs run on dedicated nodes, so your job may not have to wait behind jobs in the main queue.

These are the maximum resources you are able to use on the Corvus short queue:

  • 4GB Physical Memory
  • Maximum 2 CPUs
  • 5 hours walltime
  • Maximum 20 jobs per user

To use the short queue change this line in your qsub file from this:

#PBS -q corvus

to this:

#PBS -q short

Quick test queue

Corvus also has a quick test queue for running test jobs. These jobs run on the same nodes as the main queue jobs; however, they have a higher priority, so your jobs will start sooner.

These are the maximum resources you are able to use on the Corvus quick test queue:

  • 1GB Physical Memory
  • 1GB Virtual Memory
  • Maximum 1 CPU
  • Maximum 20 jobs per user

To use the quick test queue change this line in your qsub file from this:

#PBS -q corvus

to this:

#PBS -q quicktest

For further information on the Torque commands it is highly recommended that you read the manual pages for qsub, qstat, qdel.


Data storage and back-up

Temporary storage during computation

If your job requires local temporary space during execution, it is recommended that you use the jobfs directory on the nodes or the shared scratch area. Please contact the Service Desk if you need assistance accessing the /jobfs or /scratch directories.
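
The following csh fragment is a sketch of how a jobscript might stage files through node-local space (the /jobfs path layout and the file names are assumptions; confirm the correct location with the Service Desk):

# create a per-job directory on the node-local disk (path is an assumption)
set SCRATCH=/jobfs/$PBS_JOBID
mkdir -p $SCRATCH

# copy input from the submission directory and run in the local directory
cp $PBS_O_WORKDIR/input.dat $SCRATCH
cd $SCRATCH
ApplicationExe+Arguments

# copy results back before the job ends (local files are removed afterwards)
cp results.out $PBS_O_WORKDIR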

Long term storage

Please see the storage FAQ for details here


Documentation

Compilers

Message passing interface (MPI)

Fortran and High Performance Fortran (HPF)


Contacts and help

For more information on eResearch SA's facilities, for systems support, for assistance with parallel programming and performance optimisation, or to report any problems, contact the eResearch SA Service Desk.

When reporting problems, please give as much information as you can to help us in diagnosis, for example:

  • When the problem occurred
  • What commands or programs you were trying to execute at the time
  • A copy of any error messages
  • A pointer to the program you were trying to run or compile
  • What compiler or Makefile you were using
