Bigmem User Guide

Contents:

  • Hardware
  • Software
  • Access and accounts
  • Getting started
  • Porting programs to the cluster
  • Compiling programs
  • Running jobs
  • Data storage and back-up
  • Documentation
  • Contacts and help

Hardware

Bigmem is a two-node cluster of big-memory servers, designed to support programs that require large amounts of memory (at least 16GB and up to 1000GB). These are the first nodes to be made available in the new Tizard supercomputer.

  • bigmem512-1: Dell R810 with 32 processing cores (four 8-core Intel Xeon E7-4830 CPUs @ 2.13GHz), 512GB RAM and 1.7TB of local scratch disk
  • bigmem1024-1: Dell R910 with 32 processing cores (four 8-core Intel Xeon E7-8837 CPUs @ 2.67GHz), 1024GB RAM and 3TB of local scratch disk

User home directories are mounted via a network attached storage (NAS) server.


Software

System software

  • Red Hat Enterprise Linux Server 6.2 (Santiago) - operating system on bigmem512-1 (Dell R810)
  • Ubuntu Linux 10.04 - operating system on bigmem1024-1 (Dell R910)
  • Torque queue manager and Maui Scheduler - http://www.clusterresources.com/

Compilers and parallel programming libraries

  • Intel Compiler Suite
  • GNU Compiler (GCC, GFortran)
  • Java, Python, Perl
  • Native OpenMPI - library for MPI message passing parallel programming (not currently available)

Application software

  • Geneious Server for bioinformatics applications is available on the Dell R810
  • Most of the applications available on the other eRSA supercomputers are installed on the Dell R910
  • Please run the "module avail" command from your ssh console to view a list of available applications.


Access and accounts

Time on the machine is available to researchers at any of the South Australian universities through eResearch SA. Researchers at these universities who wish to use any of eResearch SA's facilities should complete the membership form.

Anyone else who is interested in using eResearch SA's facilities should consult the Conditions of Use page to determine how best to gain access to the machine.

To log in to the big memory nodes, first log in to the Tizard head node, using an ssh client to connect to the following hostname:

tizard1.ersa.edu.au
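For example, if your eRSA username were jbloggs (a hypothetical name used only for illustration), you would connect with:

ssh jbloggs@tizard1.ersa.edu.au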

Then log in to one of the big memory nodes using ssh from the command line:

ssh bigmem1024-1

or

ssh bigmem512-1


Getting started

To use the cluster, please read the quick start guide here: http://www.ersa.edu.au/quickstart_guide

Please read all of this User Guide before you try to run any jobs on the cluster, particularly the sections on Compiling programs and Running jobs.

If you are unsure as to how to make changes to your default environment, please contact the eResearch SA Service Desk.

Modules

eResearch SA uses "modules" as the primary way to configure the user environment to provide access to software packages. This provides much easier access to the packages on the system. The same system is used on the NCI National Facility and other supercomputer centres around Australia.

To see what modules are available to be loaded (which applications are available on the cluster), type

module avail

at the command prompt.

You can also see which modules you currently have loaded by typing

module list

To see what other modules are required to use a particular application, type

module whatis modulename

To load a module, making that application and its associated libraries available in your environment, type

module load modulename

Similarly, you can unload a module with

module unload modulename

for example, module unload gaussian unloads the Gaussian module, removing all references to the Gaussian executables and associated runtime libraries from your environment.
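Putting these commands together, a typical session might look like the following sketch (the module name intel is only an example; use module avail to see the exact names on each node):

module avail            # list all available modules
module load intel       # load the Intel compiler suite (example module name)
module list             # confirm which modules are now loaded
module whatis intel     # show a brief description and any required modules
module unload intel     # remove it from your environment again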

 

If you do not see a module listed for the application that you wish to run please contact the eResearch SA Service Desk.


Porting programs to the cluster

Sequential programs

Sequential programs should run without change on a single processor of the cluster. You can therefore use the cluster without knowing how to write parallel programs, simply by submitting (multiple) sequential jobs.

Parallel programming

Alternatively, you can port or develop your programs using a standard parallel programming language. Programs written using Message Passing Interface (MPI), OpenMP (shared memory directives) or multiple threads (e.g. Java threads) can be compiled and run on each of the bigmem compute nodes.

MPI

You can use MPI to parallelize programs written in Fortran, C or C++. This is more difficult to program than HPF or OpenMP, but typically gives better performance. Note you can run MPI jobs only on a single node of the bigmem cluster, i.e. on a maximum of 32 cores.

For more information on MPI, you can look at this list of materials for learning MPI. There is a good online MPI Programming Course from Edinburgh Parallel Computing Centre. A standard reference book is Using MPI: Portable Parallel Programming with the Message-Passing Interface, by William Gropp, Ewing Lusk and Anthony Skjellum, MIT Press, 1994. More information is available in the Documentation section of this User Guide.

Parallel scientific software libraries

For some programs, the majority of the time is taken up in standard routines such as matrix solve, FFT, or computing eigenvalues. In that case, it is possible to use libraries containing parallel versions of these routines, which should speed up your program without requiring you to write any parallel code.

The bigmem cluster has the Intel Cluster Math Kernel Library (which includes an optimised LAPACK library) installed, as well as FFTW. Other open-source libraries may be installed on request.
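As a rough sketch, a Fortran program that calls LAPACK routines can be linked against MKL using the Intel compiler's -mkl convenience flag, and a C program using FFTW can be linked with -lfftw3 (the source file names and the fftw module name below are assumptions; check module avail and the library documentation for the exact settings):

module load intel                         # Intel compiler suite (example module name)
ifort -O2 -mkl mysolver.f90 -o mysolver   # link against Intel MKL
module load fftw                          # assumed module that sets up FFTW paths
gcc -O2 myfft.c -lfftw3 -lm -o myfft      # link against FFTW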

Standard software packages

Many standard software packages have parallel versions of the software available. The Software section of this User Guide lists some parallel programs that have been installed. Please contact the eResearch SA Service Desk if you would like other packages installed.

Help with parallel program development

For assistance with parallel program development and performance optimisation, please contact the eResearch SA Service Desk.

Compiling programs

The following compilers are available. They are easily accessible once you have loaded the correct module (refer to earlier section for description of modules).

GNU compilers

  • gcc (also aliased to cc) for C and C++ programs.
  • gfortran for Fortran 95 programs (with backwards compatibility for Fortran 77 and Fortran 90 code).

Intel compilers

  • icc and icpc for C and C++ programs respectively.
  • ifort for Fortran programs.

Note: You may find that some programs will only compile, or will run faster, using certain compilers, so you may want to try them all.

Check the man pages and the Documentation section of this User Guide for details on usage and options for each compiler.

MPI programs

MPI programs should be compiled using mpicc (for C programs), mpiCC (C++), mpif77 (Fortran 77) or mpif90 (Fortran 90). To enable these commands you need to load the OpenMPI module (module load openmpi). This uses the compiler suite loaded earlier (gnu or intel) to select the appropriate OpenMPI package, i.e. you need to select your compiler suite before you load the OpenMPI module.
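For example, to build an MPI C program with the Intel compiler suite (the intel module name and the hello.c file name are placeholders):

module load intel       # select the compiler suite first
module load openmpi     # then load the matching OpenMPI build
mpicc -O2 hello.c -o hello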

Use the which command to check you are getting the right version of the MPI compilers. For example, when using the Intel compiler Suite:

which mpicc

should return something like:

 /usr/bin/mpicc

If you have Fortran code that makes use of temporary arrays you may find that it exceeds the stack space available and will cause your job to fail. If this happens, try using the -heap-arrays option during the compile (Intel compiler specific).
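For example (myprog.f90 is a placeholder file name):

ifort -O2 -heap-arrays myprog.f90 -o myprog   # place temporary arrays on the heap instead of the stack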

OpenMP programs

OpenMP directives for shared memory parallel programming are supported. These programs are only able to run on a single compute node (i.e. up to 32 processing cores on the bigmem nodes).
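As a sketch, an OpenMP program can be compiled with either compiler suite and the thread count set at run time through OMP_NUM_THREADS (myomp.c is a placeholder file name; -openmp is the Intel flag for this compiler generation):

gcc -fopenmp -O2 myomp.c -o myomp    # GNU compiler
icc -openmp -O2 myomp.c -o myomp     # Intel compiler
setenv OMP_NUM_THREADS 16            # csh syntax; under bash use: export OMP_NUM_THREADS=16
./myomp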

General tips and information

All of the compilers will produce much faster code if you use compiler optimisation flags. Check the Documentation or man pages of the compiler you are using to find the appropriate optimisation flags (normally -O1, -O2, etc.).
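For example (myprog.c is a placeholder):

gcc -o myprog myprog.c        # no optimisation, for development and debugging
gcc -O2 -o myprog myprog.c    # optimised build once the program is known to work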

Note: You should NOT use these optimisation flags when developing and debugging programs. They should only be used once you have checked that the program works, and you want to make it run faster. This is because it may take substantially longer to compile the program at a higher optimisation level.

Also, there is a greater chance of finding compiler problems or bugs at higher optimisation levels. The compiler may not be able to compile the program, or the output of the program may be incorrect. It is a good idea to check that the results of your programs compiled with a high optimisation level are the same as those with the default optimisation. If you detect an error when using a high optimisation level, try compiling that routine or program again at a lower optimisation level.

Note: Programs should only be compiled on the front end (the head node) of the cluster.


Running jobs

Jobs on the bigmem machines are run by submitting a jobscript to the queuing system. The same Torque queueing system is used on all eRSA supercomputers.

First you need to log in to the particular machine you want to run the job on: log in to the Tizard head node, using an ssh client to connect to the following hostname:

tizard1.ersa.edu.au

Then log in to one of the big memory nodes using ssh from the command line:

ssh bigmem1024-1

or

ssh bigmem512-1

Jobs are submitted to the queue by issuing the command:

qsub myscript

where myscript contains relevant Torque commands and shell script commands.

Below are some generic examples of scripts with brief descriptions of each of the various Torque components. These may be adapted to suit your needs. Please note that you only need to change the placeholder values (such as MyJobName, Your-email-Address, P, X, Y and HH:MM:SS) in order to get a functioning jobscript for Torque:

Example Torque PBS jobscript for a bigmem job

#!/bin/csh

### Job name
#PBS -N MyJobName

### Join queuing system output and error files into a single output file
#PBS -j oe

### Send email to user when job ends or aborts
#PBS -m ae

### email address for user
#PBS -M Your-email-Address

### Queue name that job is submitted to
#PBS -q bigmem

### Request nodes, memory, walltime. NB THESE ARE REQUIRED
#PBS -l nodes=1:ppn=P
#PBS -l mem=Xmb,vmem=Ymb
#PBS -l walltime=HH:MM:SS

# This job's working directory
echo Working directory is $PBS_O_WORKDIR
cd $PBS_O_WORKDIR
echo Running on host `hostname`
echo Time is `date`

# Load module(s) if required
module load application
# Run the executable
ApplicationExe+Arguments

Note:

  • All lines beginning with #PBS are interpreted as Torque commands directly to the queuing system.
  • Output and error messages will be joined into a file that will be called something like MyJobName.oXXXXX in the directory from which the job is submitted (XXXXX will be the number component of the job id which is allocated when you submit the job with qsub).
  • MyJobName should be a concise but identifiable alphanumeric name for the job (starting with a letter, NOT a number).
  • mem=Xmb,vmem=Ymb states that the program will use at most X MB of memory and Y MB of virtual memory across all CPUs during its runtime. The memory requested must be less than 512GB (on bigmem512-1) or 1024GB (on bigmem1024-1). Normally you can set vmem to about twice mem, but it may need to be more than that in some cases.
  • P is the number of processors (or CPUs, or cores) on the node that you want to execute your job on. It should be at most 32 (the number of cores on each bigmem node), and preferably less, otherwise you are requesting the entire node and no one else can use it.
  • module load application is required if you don't automatically load this module in this shell's environment.
  • ApplicationExe+Arguments is the name of the program you want to run and all of the command line arguments you need. It may also include redirection of input and output streams.

A copy of this sample script can be downloaded here.
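For reference, a filled-in version of the template above might look like the following (the job name, email address, module name, program name and resource requests are all hypothetical examples; adjust them to suit your own job):

#!/bin/csh
# All names and values below are hypothetical examples.
### Job name
#PBS -N MatrixTest
### Join queuing system output and error files into a single output file
#PBS -j oe
### Send email to user when job ends or aborts
#PBS -m ae
#PBS -M jbloggs@example.edu.au
### Queue name that job is submitted to
#PBS -q bigmem
### Request 1 node, 8 cores, 100GB memory, 200GB virtual memory, 24 hours walltime
#PBS -l nodes=1:ppn=8
#PBS -l mem=100000mb,vmem=200000mb
#PBS -l walltime=24:00:00

cd $PBS_O_WORKDIR
module load myapplication
./myprogram input.dat > output.log

This would be submitted with qsub myscript as described above.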

Checking a job's status in the queue

Once a job has been submitted to Torque using qsub, it will print out a Job ID of the form XXXXX.bigmem1024-1.ersa.edu.au where XXXXX is a decimal number. This number is helpful to make checks on the job's status using the qstat command. Here is some sample output:

qstat -a 109

bigmem1024-1.tizard.ersa.edu.au:
                                                                         Req'd  Req'd   Elap
Job ID               Username Queue    Jobname          SessID NDS   TSK Memory Time  S Time
-------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----
109.bigmem1024-1     auser    bigmem   testjob          24676    2    2  16000m 300:0 R 264:0


To see all running jobs (including other users'), run the command qstat -r (the S column is the state, R means running and S indicates the job has been suspended to allow another job to run). A qstat -a will show these and also the jobs that are queued (not running, signified by state Q).
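If you need the full details of a single job (resource usage, the execution host, and so on), you can ask qstat for its full listing, e.g. for the job shown above:

qstat -f 109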

Deleting a queued job

To delete a queued or running job type

qdel job.id

where job.id is the numerical portion of the Job ID reported by qstat.
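For example, to delete the job shown in the qstat output above:

qdel 109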

Note: You will only be able to delete your own jobs.

For further information on the Torque commands it is highly recommended that you read the manual pages for qsub, qstat, qdel.


Data storage and back-up

Temporary storage during computation

If your job requires local temporary space during execution, it is recommended that you use the shared scratch area /scratch.
Please contact the Service Desk if you need assistance accessing the /scratch directory.
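As a sketch of how /scratch might be used from within a jobscript (the directory layout under /scratch and the program and file names are assumptions; confirm the recommended layout with the Service Desk):

# csh syntax, matching the example jobscript above
# myprogram and results.dat are placeholder names
set SCRATCHDIR=/scratch/$USER/$PBS_JOBID
mkdir -p $SCRATCHDIR
cd $SCRATCHDIR
$PBS_O_WORKDIR/myprogram              # run the program with temporary files written here
cp results.dat $PBS_O_WORKDIR/        # copy results back to the submission directory
rm -rf $SCRATCHDIR                    # clean up the scratch space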

Long term storage

Please see the storage FAQ for details here


Documentation

Compilers

Message passing interface (MPI)

Fortran and High Performance Fortran (HPF)


Contacts and help

For more information on eResearch SA's facilities, systems support, assistance with parallel programming and performance optimisation and to report any problems, contact the eResearch SA Service Desk.

When reporting problems, please give as much information as you can to help us in diagnosis, for example:

  • When the problem occurred
  • What commands or programs you were trying to execute at the time
  • A copy of any error messages
  • A pointer to the program you were trying to run or compile
  • What compiler or Makefile you were using
