Getting Started on Emerald
Introduction
This document describes how to use the Research Computing cluster called Emerald. The intended use of this cluster is for UNC-CH affiliated researchers to do research-related computing.
System Information
Research Computing manages a heterogeneous cluster of multi-core CPU hosts, collectively known as the Emerald cluster, for campus researchers. Most nodes are based on Red Hat Enterprise Linux 4.0 (32-bit) or Red Hat Linux v5 (64-bit), but the cluster also includes four Power5-based large memory hosts, which run the AIX operating system. The Linux compute nodes are Intel Xeon IBM BladeCenter nodes (1.8, 2.0, 2.4, 2.8, and 3.2 GHz), and they communicate via a 10-Gigabit Ethernet network. Job management is handled by LSF (Load Sharing Facility). While working on Emerald, you will have access to several shared scratch file systems described later in this document.
Logging In
To obtain or manage your account on our servers, please visit the Onyen Services page, click on the Subscribe to Services button and select Emerald Cluster. Once you have an Emerald account, you can log in to Emerald using Secure Shell (ssh):
ssh emerald.unc.edu
Telnet access is not allowed. Even though the cluster has many compute nodes, you never log in to any of them directly. Instead, you log in to the cluster as shown above, which takes you to "login node" resources that have been set aside for user access. From there you edit and compile your code, then use the LSF job scheduler to submit it to the compute nodes for processing. When you log in to Emerald, your home directory is in AFS space rather than local to Emerald.
LSF jobs that you run on Emerald will not have access to files in your AFS home directory nor to any other AFS file space that requires an AFS token. It is suggested that you use scratch space, described below, for your work files.
Work/Scratch Space
1. GPFS (General Parallel File System) scratch space
Scratch space (temporary storage for files associated with jobs you are currently running) is provided as a shared resource for all users. As of December 2008, additional scratch space, based on GPFS, is available on all Emerald Linux and AIX nodes. This space should be used as your primary working directory now. The scratch space is shared temporary storage intended for work files used in job processes. It uses standard UNIX permissions to control access to files and directories. (AFS is a separate file system which is also mounted on Emerald for legacy purposes, and access permissions there are governed by AFS permissions, known as ACLs). By default other users in your Unix group (graduate, faculty, etc) have read access to your scratch directory. You can easily remove their read access with the “chmod” command.
There are two directories in the GPFS-based scratch space: /smallfs and /largefs.
/smallfs has 15 TB of scratch space intended for research data files smaller than 1 megabyte, and /largefs has 18 TB of scratch space intended for research data files larger than 1 megabyte. These GPFS file systems are not available on other Research Computing clusters or systems.
To access your directory on GPFS space, use the commands:
cd /smallfs/[onyen]
or
cd /largefs/[onyen]
You can access this space from Emerald, and your jobs running on compute nodes will also be able to access this space. This is the directory where you do your work.
By default other users in your group (graduate, faculty, etc) have read access to your GPFS scratch directory. You can remove this access with the “chmod” command. To see how much scratch file space Emerald has access to, use the “df” command:
df -h /smallfs/
or
df -h /largefs/
To see how much space your files are taking up in your GPFS directory use the “du” command:
du -h /smallfs/[onyen]
or
du -h /largefs/[onyen]
If you have multiple subdirectories and you just want to see a summarization use the “-s” option:
du -hs /smallfs/[onyen]
or
du -hs /largefs/[onyen]
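The group read access mentioned above can be removed with a single "chmod" call. The following is only a sketch: it works on a temporary directory (demo_dir, a hypothetical stand-in) so that it can be tried anywhere. On Emerald you would apply the second "chmod" to /smallfs/[onyen] or /largefs/[onyen] instead.

```shell
# demo_dir is a hypothetical stand-in for /smallfs/[onyen] or /largefs/[onyen].
demo_dir=$(mktemp -d)

# Mimic the default described above: group members can read and enter the directory.
chmod 750 "$demo_dir"

# Remove all group and other access, leaving only the owner's permissions.
chmod g-rwx,o-rwx "$demo_dir"

# Show the resulting mode; the permission string should begin with drwx------.
ls -ld "$demo_dir"
```

Files and subdirectories keep their own permissions, but other users cannot reach them once the top-level directory is closed to them.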
Please note that scratch space is not backed up and is, therefore, not intended for permanent data storage. See the "Mass Storage" section below about archiving permanent data.
Since this storage space is shared by many users, please remove any files there that are not associated with currently running jobs. An automated cleanup policy applies to /smallfs and /largefs: since January 12, 2009, any file not used or modified in the last 21 days is deleted. This automated process is necessary to ensure that this limited, shared resource remains available for all to use. Scratch space is not intended for long-term storage. Without automated cleanup, the file system would routinely fill up, causing many users' jobs to suspend or fail.
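To see which of your files may be approaching the 21-day limit, you can list files by age with "find". The sketch below assumes that "used or modified" corresponds to a file's access and modification times; this document does not say how the cleanup process measures age, so treat the result as an estimate. The function name list_stale_files and the demo directory are hypothetical.

```shell
# list_stale_files DIR [DAYS]
# Print regular files under DIR whose access time AND modification time are
# both more than DAYS days old (default 21). find rounds ages down to whole days.
list_stale_files() {
    dir="$1"
    days="${2:-21}"
    find "$dir" -type f -atime +"$days" -mtime +"$days"
}

# Demo in a temporary directory (hypothetical stand-in for /smallfs/[onyen]).
demo_scratch=$(mktemp -d)
touch "$demo_scratch/new.dat"                  # created just now
touch -t 200001010000 "$demo_scratch/old.dat"  # both timestamps set to January 2000
list_stale_files "$demo_scratch"               # lists only old.dat
```

On Emerald the call would look like: list_stale_files /smallfs/[onyen]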
2. Netscratch space
For many years, the “/netscr” file system served as the working scratch directory for Emerald. With the implementation of the GPFS file systems for scratch, you should routinely use “/smallfs” or “/largefs” for your work – they have more capacity and job performance will be better. For the near term, “/netscr” will continue to be mounted and will remain the default LSF work space.
Netscratch space is a shared temporary work directory space intended for work files used in job processes. Scratch space uses standard UNIX permissions to control access to files and directories. (AFS is a separate file system which is also mounted on Emerald and access permissions are mostly controlled by AFS permissions, known as ACLs). By default other users in your group (graduate students, faculty, employees) have read access to your Netscratch directory. You can easily remove this read permission with the “chmod” command. To see how much scratch file space Emerald has access to, use the “df” command:
df -h /netscr/
To see how much space your files are taking up in your NetScratch directory use the “du” command:
du -h /netscr/[onyen]
If you have multiple subdirectories and you just want to see a summarization use the “-s” option:
du -hs /netscr/[onyen]
Please note that scratch space is not backed up and is, therefore, not intended for permanent data storage. See the "Mass Storage" section below about archiving permanent data.
Scratch space is an NFS-mounted file system and thus shared by all Emerald users as well as the users of other Research Computing systems.
The first time you ssh into Emerald after your account has been created, using any ssh client other than X-Win 32's StarNet SSH client, your scratch directory will be created:
/netscr/[onyen]
For example:
/netscr/mason
is the scratch directory for the user whose Onyen is “mason”. You can access this space from Emerald, and your jobs running on compute nodes will also be able to access it. This is the directory where you do your work.
Since this storage is shared with many other users, please remove any files there that are not associated with currently running jobs. An automated cleanup policy applies to /netscr: since January 12, 2009, any file not used or modified in the last 21 days is deleted. This policy is necessary to ensure that all users have access to this limited and shared storage resource, which is not intended for long-term storage. Without automated cleanup, the file system would routinely fill up, causing users' jobs to suspend or fail.
Mass Storage
The Mass Storage system (also known as SAM-FS or /ms) is intended for archiving files and storing very large files, files that are too large to fit in your AFS quota. Files located in mass storage are not accessible to jobs running in LSF. Mass storage is not to be used as a work directory or as a backup location for local disk drives, operating systems, or software. In general, files that change often or directories with more than a thousand files in them will cause performance problems and consume tape resources. The Iron Mountain PC backup software provided by UNC might be an alternative solution rather than having to copy your PC files to mass storage.
Mass Storage is similar to an ordinary disk file system in that it keeps an inode (for recording data location, etc.) and data blocks for each file. For the user of mass storage, this file system appears to be a subdirectory of the user's AFS home directory. Files can be moved in and out of mass storage by using simple UNIX commands such as “cp” and “mv” or by using sftp/scp. As the Mass Storage system is optimized for archiving data, your programs should not directly read or write from the Mass Storage system. Instead copy your data from “~/ms” to “/largefs/[onyen]” or “/smallfs/[onyen]”.
If you are routinely storing large numbers of small files (more than several hundred files at a time) in mass storage, you should “tar” or “zip” those smaller files into one tarball or zip file outside of mass storage and then move that tarball or zip file to mass storage. You are not required to compress the tarball or zip file since the mass storage tape drive hardware will compress your data. Reducing the number of individual small files will help the overall performance of the SAM-FS Mass Storage system. See the more detailed list of things to avoid.
To access Mass Storage from Emerald, type:
cd ~/ms
Any files in the scratch space that you wish to save can be moved to mass storage, preferably in tar or zip format.
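The bundling step recommended above can be sketched as follows. The two temporary directories are hypothetical stand-ins (work_dir for /largefs/[onyen], archive_dir for ~/ms) so that the example runs anywhere, and the small text files are illustrative data only. The tarball is left uncompressed, since the mass storage tape hardware compresses data itself.

```shell
# work_dir stands in for /largefs/[onyen]; archive_dir stands in for ~/ms.
work_dir=$(mktemp -d)
archive_dir=$(mktemp -d)

# Create a directory holding several small files (illustrative data only).
mkdir "$work_dir/results"
for i in 1 2 3 4 5; do
    echo "sample $i" > "$work_dir/results/run_$i.txt"
done

# Bundle the directory into one uncompressed tarball outside mass storage...
tar -cf "$work_dir/results.tar" -C "$work_dir" results

# ...then move the single tarball into the archive location.
mv "$work_dir/results.tar" "$archive_dir/"

# List the archive contents to confirm what was stored.
tar -tf "$archive_dir/results.tar"
```

Building the tarball in scratch space first means mass storage receives one file instead of many.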
If you are doing any large moves or copies of data (for example, to or from mass storage), please submit them through LSF with a command such as:
bsub -R ms cp /netscr/myonyen/output/* /ms/home/m/y/myonyen/saved_output
This bsub command, issued with the "-R ms" parameters, will submit your copy or move job to a host with very good connectivity to the mass storage system. We expect these hosts to handle multiple data moves well, removing this burden from the login nodes.
Software
The Emerald Linux cluster mounts software applications from the AFS file system. This provides you access to many scientific, statistical and mathematical software packages. Among the more popular applications are SAS, NWChem, Amber, and Matlab. Several compilers are also available for use on the cluster, including Fortran compilers from Intel, Absoft and the Portland Group.
Though software applications are made available from AFS space, your AFS home directory will not be available to either read or write from a job you submit via LSF, even interactive LSF jobs. Any files you want your job to read or write should be in “/largefs/[onyen]” or “/smallfs/[onyen]”.
Many software applications are installed in AFS, but most are not part of your default working environment. To access a particular software application, you will need to add it to your environment with the ipm command. For example, to add Portland Group compilers to your environment, you would execute the command:
ipm add pgi
A subset of the most frequently used utilities and applications has already been added to your working environment. This default set of tools includes a number of editors, including vi, ne, pico, nedit and emacs.
One note of caution as you add applications: each package you add increases the length of your $PATH environment variable. If it gets too long, parts of it will be lost and some commands will fail to execute as you expect. If this situation arises, you will need to remove some packages from your environment:
ipm remove package_name
We would recommend that you remove some of the less frequently used packages. The command:
ipm query
will list the packages that are currently in your environment at the very end of the list. As noted above, you can read about ipm in more detail.
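To keep an eye on how long your $PATH has become, standard shell tools are enough. This is a minimal sketch; the function name count_path_entries is hypothetical, and this document does not state the length at which $PATH starts to be truncated, so the numbers are only useful for comparison before and after an "ipm add".

```shell
# count_path_entries PATHSTRING
# Print the number of non-empty, colon-separated entries in PATHSTRING.
count_path_entries() {
    printf '%s\n' "$1" | tr ':' '\n' | grep -c .
}

# Report on the current session's PATH.
echo "PATH length in characters: ${#PATH}"
echo "PATH entries: $(count_path_entries "$PATH")"
```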
To use an X-Win 32 StarNetSSH session to connect to emerald.unc.edu you need to set the location of your ".Xauthority" file in the Command window during the configuration of the StarNetSSH session. Read the section "Creating sessions for Research Computing hosts using the Session Wizard" on the help page for X-Win 32.
Compiling Serial Codes
There are four commonly used compilers: GNU, Intel, Portland Group and Absoft Profortran. The following table lists the corresponding package names and compiler commands for the FORTRAN 77/90 and C/C++ programming languages:
Table 1. Available Compilers
| Compiler | Package Name | FORTRAN 77 command | FORTRAN 90/95 command | C command | C++ command |
|---|---|---|---|---|---|
| GNU | gcc | g77 | --- | gcc | g++ |
| Intel | intel_fortran intel_CC | ifc | ifc | icc | icc |
| PGI | pgi | pgf77 | pgf90 | pgcc | pgCC |
| Absoft | profortran | f77 | f90 | --- | --- |
To subscribe to one compiler package such as pgi, type:
ipm add pgi
After the compiler package has been added, to compile your serial FORTRAN 77 code, for example, “source.f”, type:
pgf77 -O -o source.x source.f
An executable, “source.x”, is then generated.
In our experience, the Intel compilers generally give the best results of the four on Emerald. Compiler performance is code-dependent, so results for your code may differ, but we encourage use of the Intel compilers on Emerald.
Compiling Parallel Codes
MPI parallel codes in FORTRAN 77/90/95 and C/C++ can be run in the distributed-memory environment of the cluster. To compile your MPI codes, you need to pick a compiler (Intel, PGI, Absoft or GNU) and the kind of machine/CPU on which your code will run. Possible combinations are tabulated in Table 2 below:
Table 2. Available Compilers and Packages to be added for each kind of compiler and CPU
| CPU | Compiler | Fortran77 | FORTRAN90 | C | C++ |
|---|---|---|---|---|---|
| MPI Command | | mpif77 | mpif90 | mpicc | mpiCC |
| Intel Blade CPU | Intel | intel_fortran mpich | intel_fortran mpich | intel_CC mpich | intel_CC mpich |
| Intel Blade CPU | PGI | pgi mpich | pgi mpich | pgi mpich | pgi mpich |
| Intel Blade CPU | GNU | gcc mpich | --- | gcc mpich_gm | gcc mpich_gm |
| Intel Blade CPU | Absoft | profortran mpich | profortran mpich | --- | --- |
Note that the order in which packages are added with ipm is important. Add the compiler first and then the MPICH package. For example, to compile MPI FORTRAN 77 codes with the Intel compiler for IBM Blade CPUs, type:
ipm add intel_fortran mpich
To compile your code, say, “mpi_source.f”, type:
mpif77 -O -o mpi_source.x mpi_source.f
An executable named “mpi_source.x” is generated after compilation.
Submitting Jobs
Once you have decided what software you need to use, added those packages to your environment using ipm (if needed), and you have successfully compiled your serial or MPI parallel code, you can then submit your jobs to run on Emerald. We use LSF (Load Sharing Facility) software to schedule and manage jobs that are submitted to run on Emerald. Emerald has 4 types of CPUs and many processors, and each processor (or “core”) is known as a job slot in LSF. A job slot is the basic unit of processor allocation in LSF. A serial job uses one job slot; a parallel job requesting N processors would use N job slots. Each user can have up to 60 job slots in use at any one time. If you are already using 60 job slots and you submit a job to run, that job will PEND until job slots are freed as your running jobs finish. Similarly, if all the job slots in the cluster are in use when you submit a job, even if you are not using any job slots yourself, your job will PEND.
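The slot arithmetic above can be checked with a one-line calculation. This is only an illustration of the 60-slot per-user figure quoted above; the function name max_concurrent_jobs is hypothetical, and individual queues may impose their own, stricter limits.

```shell
# max_concurrent_jobs SLOTS_PER_JOB [USER_SLOT_CAP]
# Print how many jobs of the given size fit under the per-user slot cap
# (default 60, the figure quoted above). Integer division rounds down.
max_concurrent_jobs() {
    slots_per_job="$1"
    cap="${2:-60}"
    echo $(( cap / slots_per_job ))
}

max_concurrent_jobs 1    # serial jobs: prints 60
max_concurrent_jobs 4    # 4-slot parallel jobs: prints 15
max_concurrent_jobs 8    # 8-slot parallel jobs: prints 7
```

Any job submitted beyond that count would PEND until some of your running jobs finish.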
To submit a job to run, you will need to use the LSF “bsub” command as shown below. LSF submits jobs to particular job queues you specify. So in your “bsub” command, you will need to specify the queue in which the job is to run and the kind of machines/CPUs on which it will run. Different queues have different run time limits including, in some cases, limits on the total slots per user. See Table 3 for details.
Table 3. Available Queues on Emerald
| Queue Name | Run time limit | Slot limit | Preemption |
|---|---|---|---|
| now | 5 minutes | 2 per user | Preempts month |
| int | 10 hours | 2 per user, 25 total | No |
| week (default queue) | 7 days | 32 per user | No |
| month | 30 days | 4 per user, 32 total | By the now queue |
| patrons (restricted to patrons only) | unlimited | Depends on group | Preempts idle queue |
| idle | unlimited | unlimited | By the patrons queue |
Patrons can run interactive jobs in the patrons queue, but the “bsub” option “-Ip” needs to be used:
bsub -Ip -q patrons some_executable
Several kinds of CPUs run Linux in the cluster: Xeon 1.8 GHz (xeon18), Xeon 2.0 GHz (xeon20), Xeon 2.4 GHz (xeon24), Xeon 2.8 GHz (xeon28), and Xeon 3.2 GHz (xeon32). The Xeon CPUs are in IBM Blades (blade) connected with Gigabit Ethernet. Use “-R” to select the kind of CPU your job will run on.
A list of resources defined for a given node can be seen in the last column of output of the following command:
lshosts | more
The basic syntax for submitting a serial job is:
bsub -q queuename -R resources executable options_for_job
For example:
bsub -q week -R RH4 my_executable
There are both 32-bit and 64-bit machines running on Emerald. Some applications need to be run on either a 32-bit or 64-bit machine. "RH4" specifies that your job will be submitted to a 32-bit machine. Likewise, use the resource name "RH5" to submit your job to a 64-bit machine, and use "blade" to specify that you do not want your job submitted to a large memory resource (IBM P575).
Since the “week” queue is the default queue, it does not have to be specified. The following “bsub” submission, which requests the "blade" resource, therefore also runs in the “week” queue:
bsub -R blade my_executable
You can raise the priority of your week queue job by estimating how long it will really run. This is especially beneficial if you expect the job to take a day or less, or perhaps an hour or less. The “-W” option in effect lets you create your own day queue, hour queue, or any other time frame shorter than a week:
# to run a job with a run time limit of 24 hours and 0 minutes:
bsub -R blade -W 24:0 my_executable
# to run a job with a run time limit of 30 minutes:
bsub -R blade -W 30 my_executable
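When you submit many similar jobs, it can help to assemble the "bsub" command line in one place and inspect it before running it. The helper below is a hypothetical sketch (build_bsub is not part of LSF): it only prints a command line built from the "-q", "-R" and "-W" options described in this section and does not submit anything.

```shell
# build_bsub QUEUE RESOURCE RUNLIMIT COMMAND [ARGS...]
# Print (do not run) a bsub command line. Pass "" as RUNLIMIT to omit -W.
build_bsub() {
    queue="$1"
    resource="$2"
    runlimit="$3"
    shift 3
    cmd="bsub -q $queue -R $resource"
    if [ -n "$runlimit" ]; then
        cmd="$cmd -W $runlimit"
    fi
    echo "$cmd $*"
}

build_bsub week blade 24:0 my_executable   # 24-hour run time limit
build_bsub week blade 30 my_executable     # 30-minute run time limit
build_bsub week RH4 "" my_executable       # no -W option
```

Once the printed line looks right, it can be copied to the prompt and run on a login node.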
Note that jobs submitted to the interactive queue “int” will not run on Emerald login nodes, but run on a compute node instead. You will not be able to read or write to your AFS home directory during an interactive job so “cd” to “/largefs/[onyen]” or “/smallfs/[onyen]” before starting an interactive job. Submitting jobs to the interactive queue requires one additional parameter in the “bsub” command, “-Ip”, as shown below:
bsub -q int -Ip -R blade my_interactive_job
To run a parallel job using four CPUs across IBM BladeCenter nodes:
bsub -q week -n 4 -R blade -a mpichp4 mpirun.lsf my_par_job
or:
bsub -q idle -n 4 -R blade13 -a lammpi mpirun.lsf my_par_job
To run a parallel job on IBM xeon 3.2 GHz machines with, for example, 4 CPUs:
bsub -q patrons -n 4 -R xeon32 -a lammpi mpirun.lsf my_par_job
or:
bsub -q idle -n 4 -R xeon32 -a mpichp4 mpirun.lsf my_par_job
LSF will send email to your email address when the job finishes, whether it completes successfully or not (unless you are running in the interactive queue of course). You can check the status of your submitted LSF jobs with the command “bjobs”. The output of that command will include a Job ID, the status of your job (typically “PEND” or “RUN”), the queue to which you submitted the job, the job name, and other information. Additional details can be obtained with:
bjobs -l [JobID]
If you need to kill/end a running job, use the “bkill” command:
bkill [JobID]
Where JobID is the LSF job ID displayed with the “bjobs” command.
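If you have many jobs, the “bjobs” output can be filtered with standard tools. The sketch below assumes the usual column layout in which the job ID is the first field and the status is the third; check the header line of your own “bjobs” output before relying on it. The function name pending_job_ids and the sample listing are hypothetical and for illustration only.

```shell
# pending_job_ids: read bjobs-style output on stdin and print the IDs of
# jobs whose status column (assumed to be the third field) is PEND.
pending_job_ids() {
    awk 'NR > 1 && $3 == "PEND" { print $1 }'
}

# Hypothetical sample listing, for illustration only.
sample_output='JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME
1001 mason RUN week login1 node12 my_executable Jan 12 10:00
1002 mason PEND week login1 my_par_job Jan 12 10:05
1003 mason PEND idle login1 my_par_job Jan 12 10:07'

# Print the IDs of the pending jobs in the sample listing.
printf '%s\n' "$sample_output" | pending_job_ids
```

On Emerald the same filter would be used as: bjobs | pending_job_ids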
Jobs running outside the LSF queues will be killed. The logon privileges of users who repeatedly run jobs outside of the LSF queues will be suspended.
High Memory/AIX Resources
You are able to run jobs on any of four P575 servers and the old yatta AIX server. The new servers also run the AIX 5.3 UNIX operating system. Three of the P575 machines have 32 gigabytes of memory and one has 64 gigabytes of memory (yatta has 128 gigabytes). Each of the four IBM P575 servers has sixteen 1.5 GHz POWER5+ processors.
If you have code that was compiled on another server, you will need to recompile the code on Emerald as the operating system is different. If you need to run your compiled code on any of the AIX servers, you need to do the compiling in an LSF job running on one of the AIX servers on which you plan to execute your job. The operating system on the Emerald login node is different than the operating system on these compute servers. You cannot login directly to these compute servers.
Table 4. Available compilers on AIX
| Compiler | C command | C++ command | Fortran77 command | Fortran90 command | Fortran95 command | Fortran2003 command |
|---|---|---|---|---|---|---|
| IBM XL C/C++ | cc, xlc | xlC | --- | --- | --- | --- |
| IBM XL Fortran | --- | --- | xlf, f77, fort77 | xlf90, f90 | xlf95, f95 | xlf2003, f2003 |
| (Parallel) | mpcc_r | mpCC_r | mpxlf_r | mpxlf90_r | mpxlf95_r | --- |
| GNU | gcc | g++ | --- | --- | --- | --- |
Jobs such as SAS or Stata runs that require large amounts of memory (more than an Emerald compute node provides, which is up to 3 GB of accessible memory) can be run on these new AIX servers.
You can submit jobs to these servers by using the “bsub -R” resource option like so:
bsub -q week -R p5aix sas -memsize 7G -sortsize 7136M -sysin my_large_memory_job.sas
The "p5aix" is the resource name for a POWER5+ processor server running AIX UNIX. LSF will choose one of the four p575 servers.
The "p5" resource name includes the "yatta" server in addition to the p575 servers.
To run a parallel job on AIX with, for example, 4 CPUs:
bsub -q week -R p5 -n 4 -a poe poejob my_par_job
"poe" stands for the Parallel Operating Environment on AIX.
| Machine names | Total Amount of Memory |
|---|---|
| p575-n00 | 32 GB |
| p575-n01 | 32 GB |
| p575-n02 | 32 GB |
| p575-n03 | 64 GB |
| yatta | 128 GB |
If your job needs 3 to 8 gigabytes of memory, run it on a 32 gigabyte machine. If your job requires more than 8 gigabytes of memory, submit it to the high memory machine. Remember that each machine's memory is shared among all of its processors, which other users may also be using.
If you need to run your job on the machine with 64GB of memory you can submit your job using the “bsub -R” resource option specifying "mem64":
bsub -Ip -q int -R mem64 stata -m20480
The above command submits an interactive Stata job using the Stata option “-m” to request 20,480 megabytes of memory which is 20GB.
You can also submit jobs to the machine with 64GB of memory using the “bsub -m” machine name option specifying "p575-n03":
bsub -Ip -q int -m p575-n03 stata -m10240
The above command submits an interactive Stata job using the Stata option “-m” to request 10,240 megabytes of memory which is 10GB.
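The memory guidance in this section can be summarized as a small lookup. This is a sketch under stated assumptions, not site policy: the function name pick_resource is hypothetical, it accepts whole gigabytes only, and mapping "the high memory machine" to the "mem64" resource is an assumption (the "p5aix" resource lets LSF choose any of the four P575 servers).

```shell
# pick_resource MEMORY_GB
# Print a suggested LSF resource name for a job needing MEMORY_GB gigabytes
# (whole numbers only). Thresholds follow the guidance in this section;
# using mem64 for jobs above 8 GB is an assumption.
pick_resource() {
    mem_gb="$1"
    if [ "$mem_gb" -lt 3 ]; then
        echo blade      # fits on a Linux compute node
    elif [ "$mem_gb" -le 8 ]; then
        echo p5aix      # any P575 server
    else
        echo mem64      # the 64 GB P575 server
    fi
}

pick_resource 2     # prints blade
pick_resource 5     # prints p5aix
pick_resource 20    # prints mem64
```

The printed name is what you would pass to the “bsub -R” option.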
The following example shows how to create an LSF script file that you can redirect to the “bsub” command to submit a NWChem job. First create this text file and name the file “nwchem.lsf”:
#!/bin/sh
#BSUB -J NWChem_Job
#BSUB -q week
#BSUB -n 4
#BSUB -R p5aix
#BSUB -o %J.out
mpirun -np 4 nwchem input.nw
and then submit your job to LSF like so:
bsub < nwchem.lsf
Transferring Files
It is likely that you will need to transfer files between your campus computer systems and Emerald. You will need to use the “sftp” or “scp” command to move your files. The command “sftp” works similarly to the popular “ftp” command but is more secure. From your host UNIX/Linux or Mac computer terminal window, type:
sftp [onyen]@emerald.unc.edu
Enter your Onyen password and you will be presented with the sftp prompt. Use the “put” and “get” commands to transfer files, as you would do with standard ftp.
To use the “scp” command, follow this example to get a file, named “temp.txt”, from Emerald and store it in your local computer's “/tmp” directory with the same file name. You will be prompted for your password.
scp [onyen]@emerald.unc.edu:/largefs/[onyen]/temp.txt /tmp/
You can also copy a whole directory. The following command will recursively copy the whole directory “/tmp/temp_dir/” from your local computer to Emerald, and place it in the “/largefs/[onyen]/” directory with the name “temp_dir”:
scp -r /tmp/temp_dir/ [onyen]@emerald.unc.edu:/largefs/[onyen]/
If you need to copy a large file or a large number of files, use the LSF “bsub” command. For example, instead of executing the “cp” command directly on the login node:
cp /netscr/[onyen]/text.txt /ms/home/o/n/[onyen]/
it will be more efficient for you and other users if you submit your job using the LSF “bsub” command:
bsub -R ms cp /netscr/[onyen]/text.txt /ms/home/o/n/[onyen]/
The above “bsub” command, issued with the "-R ms" parameters, will submit your copy or move job to a host with very good connectivity to the mass storage system rather than using the slender resources of the login node you happen to be on.


