Getting Started on KillDevil

Table of Contents

Introduction

System Information

Getting an Account

Logging In

Cluster Usage Charges

Directory Spaces

Mass Storage

Development and Application environment

Applications available

Software Development Tools

Compiling

Job Submission

Transferring Files

Using Tar to Archive

FAQs

Additional Help

Introduction

  • The KillDevil cluster is a Linux-based computing system available to researchers across the campus. With more than 9,500 computing cores across 774 servers and a large scratch disk space, it provides an environment that can accommodate many types of computational problems. The compute nodes are interconnected with a high-speed Infiniband network, making the cluster especially appropriate for large parallel jobs. KillDevil is a heterogeneous cluster with at least 48 GB of memory per node. In addition, there are nodes with extended memory, extremely large memory, and GPGPU computing. (Note: “KillDevil” is named after the North Carolina coastal town.)

System Information

  • Login node:
    • KillDevil.unc.edu: 12 cores @ 2.67 GHz Intel with 12M L3 cache (Model X5650), 48 GB memory
  • Compute nodes:
    • 119 Dell C6100 servers or 476 compute nodes, each with 12-core, 2.93 GHz Intel processors, 12M L3 cache (Model X5670), and 48 GB memory for a total of 5712 processing cores at 2:1 ratio IB interconnect.
    • 17 Dell C6100 servers or 68 compute nodes, each with 12-core, 2.93 GHz Intel processors, 12M L3 cache (Model X5670), and 96 GB memory for a total of 816 processing cores at 2:1 ratio IB interconnect.
    • 17 Dell C6220 servers or 68 compute nodes, each with 16-core, 2.60 GHz Intel processors, 20M L3 cache (Model E5-2670), and 64 GB memory for a total of 1088 processing cores at 2:1 ratio IB interconnect.
    • 32 Dell C6100 servers or 128 compute nodes, each with 12-core, 2.93 GHz Intel processors, 12M L3 cache (Model X5670), and 48 GB memory for a total of 1536 processing cores at FBB (full blocking factor) or 1:1 ratio IB interconnect.
  • Large Memory Compute nodes:
    • 2 Dell R910 servers or 2 compute nodes, each with 32-core, 2.00 GHz Intel processors, 18M L3 cache (Model X7550) with 1 TB memory for a total of 64 processing cores.
  • GPU Compute nodes:
    • 8 Dell C6100 servers or 32 compute nodes, each with 12-core, 2.67 GHz Intel processors, 12M L3 cache (Model X5650), and 48 GB memory for a total of 384 processing cores.
    • 4 Dell C410X servers, each with 16 Nvidia M2070 GPUs for a total of 64 GPU units.
  • Operating System:
    • RHEL 5.6 (Tikanga)
  • Shared Filesystems:
    • 125 TB “/lustre/scr” Lustre File System intended for large files (>1MB)
    • 85 TB “/nas02” NetApp NFS for home directories, depts space, and apps
    • 42 TB “/netscr” storage intended for smaller files (<1MB)
  • Interconnect:
    • Infiniband 4x QDR (see compute nodes above)
  • Resource management:
    • Handled by LSF, through which all jobs are submitted for processing

Getting an Account

You can visit the Onyen Services page, then click on the Subscribe to Services button and select KillDevil Cluster.

If you attempt to subscribe for a KillDevil account through Onyen Services, you may see the following message:

You are ineligible for the following services for the reasons cited:
 KillDevil Cluster: missing '(LIVE|EXCH)' prerequsite.
 Kure Cluster: missing '(LIVE|EXCH)' prerequsite.

In this case, you should send an email to research@unc.edu requesting an account on KillDevil. In the email, please indicate that you were not able to subscribe for a KillDevil account through Onyen Services and include the following information in your request:

  • Onyen
  • Your “@email.unc.edu” email address
  • Full name
  • Campus address
  • Campus phone number (if any) and number where you can be reached while running jobs
  • Department you are affiliated with (the one relevant to the work you will do on KillDevil)
  • Faculty sponsor’s (PI) name (and onyen if known) if you are not a faculty member
  • A description of the work you expect to do on KillDevil

You will receive an email notification once your account has been created. When requesting a KillDevil account, do not request to be added to a group. If you know you need to be added to a group, request that in an email to research@unc.edu, not in the KillDevil account form.

Note that the phone number you provide needs to be one where we can reliably and routinely reach you, and the email address onyen@email.unc.edu must work. If you are running jobs on the cluster, you must check your UNC email of record (onyen@email.unc.edu) periodically while your jobs are running. If your job creates problems for other users, we will attempt to contact you via phone and email. Depending on the situation, if we are not able to reach you immediately, it is likely that we will kill your jobs. Remember that these are shared resources, and we do our best to preserve a working environment for everyone. If your jobs repeatedly affect others and we are unable to reach you, your account on the cluster may be suspended until you contact us to reach a resolution. The best way to ensure ongoing accurate contact information is to keep your email and phone number up to date in the UNC directory, http://dir.unc.edu.

Logging In

Once you have an account on KillDevil, use ssh to connect to killdevil.unc.edu and log in with your Onyen:

ssh killdevil.unc.edu

In order to connect to KillDevil from an off-campus location, a connection to the campus network through a VPN client is required. Your very first login will begin normally with your onyen and password. Then your session will run ssh-keygen for you. Accept the defaults by pressing Enter at each prompt. No password is necessary for this key generation. If this environment is not established correctly, jobs will fail with “permission denied” messages. (See the KillDevil FAQ for details.)

Even though the KillDevil cluster has many compute nodes, you never actually log in to any of them. Instead, you log in as above to the cluster. A successful login takes you to “login node” resources that have been set aside for user access. The login node is where you will edit and compile your code, and then you will use the LSF job scheduler to submit your code to the compute nodes for processing. The login node can also be used for basic file operations such as copying, moving, and zipping files. If you need to zip large files on the login node, please do only one at a time. Interactive use on the login node must be restricted to compiling and debugging. Other processes running on the login node are subject to immediate termination by the system administrators.

Cluster Usage Charges

All KillDevil cluster users are members of a PI group, as indicated by users when requesting accounts. Please note that PI groups exceeding 200,000 CPU hours per fiscal year are subject to a nominal charge for additional hours. Details about KillDevil cluster charges are available at the ITS Research computing web site.

Directory Spaces

NAS home space

Your home directory will be in /nas02/home/o/n/, where “o” and “n” are the first two letters of your onyen. Your home directory has a 10 GB soft limit and a 15 GB hard limit.

Work/Scratch Space

  • Lustre scratch space:

Your /lustre directory will be in /lustre/scr/o/n/, where “o” and “n” are the first two letters of your onyen.

The Lustre-based scratch space, /lustre, should be used as your primary scratch working directory. The /lustre scratch file system has a capacity of 125 TB and is intended and optimized for research data files larger than 1 MB and/or parallel workloads. Smaller and more serial workloads will likely be better candidates for the /netscr scratch space file system.

Please see the help documents on Lustre filesystem basics and striping best practices for more information on optimally using Lustre.

  • Netscratch scratch space:

For Netscratch scratch space, /netscr, job performance is best when users have many small data files less than 64K in size. Netscratch is an NFS-mounted file system and is thus shared by all KillDevil users as well as the users of other Research Computing systems.

The following apply to all scratch spaces:

  • Scratch space uses standard UNIX permissions to control access to files and directories. By default, other users in your group (graduate students, faculty, employees) have read access to your Netscratch directory. You can easily remove this read permission with the “chmod” command, as shown in the sketch after this list.
  • A file-cleanup policy is enforced on scratch space: any file not used or modified in the last 21 days will be deleted.
  • Scratch space is a shared, temporary work space. Please note that scratch space is not backed up and is, therefore, not intended for permanent data storage. See the “Mass Storage” section below for how to store permanent data.
  • Note that it is a violation of Research Computing policy to use artificial means, such as the “touch” command, to maintain unused files in the scratch directory beyond their natural lifetime. Violators will be warned, and repeat violators are subject to loss of privileges and access. This is a shared resource; please be courteous to other users.
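
As a minimal sketch of removing the default read access (replace “onyen” with your own onyen; the path assumes your Netscratch directory sits directly under /netscr):

chmod -R g-r,o-r /netscr/onyen    # remove group and other read permission, recursively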

What follows are suggested “best practices” to keep in mind when using scratch space on the KillDevil cluster:

  • Try to avoid using “ls -l” and use “ls” with no options instead.
  • Never have a large number of files (>1000) in a single directory.
  • Avoid small files (i.e. less than 1 MB) on the /lustre file system; use /netscr instead for such files.
  • Avoid submitting jobs in a way that will access the same file(s) at the same point(s) in time.
  • Limit the number of processes performing parallel I/O work, SAS work, or other highly intensive I/O jobs.

Mass Storage

The Mass Storage system (also known as StorNext or /ms) is intended for archiving files and storing very large files. Files located in mass storage are not accessible to jobs running in LSF. Mass storage is not to be used as a work directory or as a backup location for local disk drives, operating systems, or software. In general, files that change often or directories with more than a thousand files in them will cause performance problems and consume tape resources. The PC backup software provided by UNC may be a better alternative than copying your PC files to mass storage.

Mass Storage is similar to an ordinary disk file system in that it keeps an inode (for recording data location, etc.) and data blocks for each file. Files can be moved in and out of mass storage by using simple UNIX commands such as “cp” and “mv” or by using sftp/scp. As the Mass Storage system is optimized for archiving data, your programs should not directly read or write from the Mass Storage system. Instead copy your data from “~/ms” to scratch space such as “/netscr/”.

If you are routinely storing large numbers of small files (more than several hundred files at a time) in Mass Storage, you should “tar” or “zip” those smaller files into one tarball or zip file outside of mass storage and then move that tarball or zip file to mass storage. You are not required to compress the tarball or zip file since the mass storage tape drive hardware will compress your data. Reducing the number of individual small files will help the overall performance of the StorNext Mass Storage system. See the more detailed list of things to avoid.
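
As a minimal sketch (the directory and archive names are illustrative), the small files can be bundled into a single tarball in scratch space and that one file then moved into mass storage; for large archives, use the “ms” queue described below:

cd /netscr/onyen
tar -cvf myproject.tar myproject/     # bundle the small files; compression is not required
mv myproject.tar ~/ms/                # move the single tarball into mass storage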

To access Mass Storage from KillDevil, type:

cd ~/ms

Any files in the scratch space that you wish to save can be moved to mass storage, preferably in tar or zip format.

IMPORTANT NOTE:

If you are doing any large moves or copies of data (to or from mass storage), you should use the LSF “ms” queue:

bsub -q ms cp /netscr/onyen/text.txt /ms/home/o/n/onyen

This bsub command, issued with the “-q ms” parameter, will submit your copy or move job to a host with very good connectivity to the mass storage system. We expect these hosts to handle multiple data moves well, thereby removing this burden from the login node.

Development and Application environment

The development and application environment on KillDevil is managed through modules. The basic module commands are:

module [ add | avail | help | list | load | initadd | unload | initrm | show ]

When you first log in you should run

module list

And the response should be

1) null

To add a module for this session only, use “module add [application]” where “[application]” is the name given on the output of the “module avail” command.

To add a module for every time you log in, use “module initadd [application]”. This does not change your current session, only later logins.
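
For example, using one of the MPI modules that appears later in this document (confirm the exact name on your system with “module avail”):

module add mvapich_intel       # available in the current session only
module initadd mvapich_intel   # loaded automatically at future logins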

Please refer to the Help document on modules for further information.

Applications available

Applications used by many groups across campus have been compiled and made available on KillDevil. To see the full list of applications currently available run

module avail

To see the matrix of applications and clusters visit the Application Matrix.

Software Development Tools

  • Intel Compiler Suite
    • v. 11 for Fortran77, Fortran90, C and C++, Math Kernel Library
  • Portland Group (PGI) Compiler Suite
    • v. 10.3 for Fortran77, Fortran90, C and C++
  • GNU (GCC) Compiler Suite
    • v. 4.1.2 for Fortran77, Fortran90, C and C++
  • MPI for parallel programming via OFED v1.5.2
    • MVAPICH
    • OPENMPI
  • Totalview Debugger
  • Job scheduler
    • LSF v8.0

Note that one and only one of these MPI environments can be loaded at a given time. Loading the mvapich or openmpi environment for a compiler also loads that compiler’s standard environment. For example, you do not need to run “module load pgi” before “module load mvapich_pgi”; the latter command loads all of the PGI compiler commands, including the MPI ones. If you do not want the MPI compiler commands, just use “module load pgi”.
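
For example, to work with the PGI compilers together with MVAPICH, a single command is enough:

module load mvapich_pgi    # loads the PGI compiler commands, including the MPI wrappers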

Compiling

The available MPI compiler commands are:

mpicc
mpif77
mpif90
mpiCC
mpicxx

Once you have the default compile module added to your environment with the command

module initadd mvapich_[intel|pgi|gcc]   (specify which compiler to use)

or

module initadd openmpi_[intel|pgi|gcc]   (specify which compiler to use)

then both your compiles and your job submissions will have the appropriate environment available, including man pages, paths, libraries, include files, and any required environment variables.
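
For example, with one of these modules loaded, an MPI program can be compiled with the wrapper commands listed above (the source and executable names are illustrative):

mpicc -o mycode mycode.c       # C source
mpif90 -o mycode mycode.f90    # Fortran 90 source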

Job Submission

Once you have decided what software you need to use, added those packages to your environment using modules, and you have successfully compiled your serial or parallel code, you can then submit your jobs to run on KillDevil. We use LSF (Load Sharing Facility) software to schedule and manage jobs that are submitted to run on KillDevil.

To submit a job to run, you will need to use the LSF bsub command as shown below. LSF submits jobs to the particular job queue you specify, so in your “bsub” command you will need to specify the queue in which the job is to run by using the “-q” bsub option. If you don’t provide a queue explicitly, the job will be given to the “week” queue by default. If you do not add a “-n” to your bsub statement, it is assumed the job should run on 1 CPU. There is a default limit of 1024 CPUs per user across the cluster. If you are already using 1024 job slots and you submit a job to run, that job will PEND until job slots are freed as your running jobs finish. Similarly, if all the job slots in the cluster are in use when you submit a job, even if you are not using any job slots yourself, your job will PEND.

A short description of the queues available to users in the KillDevil cluster can be found below. You can also use the “bqueues” command to list the properties of a specific queue. For example, you could type "bqueues -l debug" (that’s a lower case “-l” for “long listing”) to find out more about the debug queue. Additional queues may be added as need dictates. All queues share a common fairshare allocation policy that governs which PENDing job will be dispatched next based on the recent runtime history of each user or group with jobs PENDing to run. The following queues are available on KillDevil:

 

Queue name            Job Duration  CPU Range/Job  Total # CPUs/User across all jobs in queue
day                   24 hrs.       512 max.       1024
debug                 30 min.       64 max.        64
hour                  60 min.       512 max.       1024
week (default queue)  7 days        512 max.       512
bigmem                7 days        32 max.        32
staff                 --            --             1024
Idle                  --            --             1024 (all others preempt)
adhoc                 --            --             Configured as required (closed by default)
gpu                   7 days        512 max.       512
ms                    7 days        1              4

A list of resources defined for a given node can be seen in the last column of output of the following command:

lshosts |  more

Submitting Batch Jobs:

The general form of syntax for submitting a serial batch job is:

bsub [- bsub options] executable [- executable options]
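
For example, a serial job could be submitted to the default “week” queue as follows (the executable name is illustrative; out.%J and err.%J capture the job’s standard output and error, with %J replaced by the job ID):

bsub -q week -o out.%J -e err.%J ./myprog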

To learn more about bsub and bsub options:

man bsub
Note. LSF itself has an overhead of 30-40 seconds per job. So, if you have thousands of short jobs to run please structure your work in such a way that those jobs are batched together by scripts and submitted as fewer jobs that will each take 20-30 minutes to run (so as to justify the overhead of scheduling the jobs). If you have a few dozen jobs under a minute then the system absorbs the overhead. However, when an exorbitant number of “very short” jobs (i.e. jobs that run for less than a few minutes) are pushed through the system the performance of the job scheduler degrades.

Important. If you are planning to submit a job that requires more than 4 GB of memory, be sure to read this.

Submitting Interactive Jobs:

Submitting interactive jobs requires the bsub option “-Ip” as in the “bsub” command shown below:

bsub -Ip my_interactive_job

For example, the command below will give you an interactive bash shell session:

bsub -Ip /bin/bash
Note. Interactive jobs will not run on KillDevil login nodes, but run on a compute node instead.

Submitting Parallel Jobs:

Parallel jobs that use shared memory (i.e. OpenMP jobs):

For this type of parallel job, it is necessary to submit the job in such a way that all of the requested job slots land on one (i.e. the same) compute node. In addition, before you submit your job you will need to set an environment variable, OMP_NUM_THREADS, equal to the number of requested job slots. For KillDevil, this number must be less than or equal to 12, since the compute nodes on KillDevil (in general) have 12 cores.

The general procedure for submitting an OpenMP batch job (referred to in this example as “mycode”) that uses, for instance, 10 threads is to first set the OMP_NUM_THREADS environment variable to 10.

For Bourne, bash, and related shells:

export OMP_NUM_THREADS=10

For csh and related shells:

setenv OMP_NUM_THREADS 10

Then to submit the job:

bsub -n 10 -R "span[hosts=1]" ./mycode

In the above job submission command, the -n 10 option asks for ten job slots (which intentionally matches the value set for OMP_NUM_THREADS) and the -R "span[hosts=1]" option requests that all ten job slots be placed on the same host.

Another way to submit the job is to create a file (for example, call it mycode.bsub) with the following lines in it

####   run_mycode    ####
#BSUB -n 10
#BSUB -R "span[hosts=1]"
./mycode
##### end of run_mycode ####

and then do the following command at the KillDevil prompt:

bsub < mycode.bsub

Parallel jobs that use distributed memory (i.e. MPI jobs):

The general form of syntax for submitting an MPI batch job, in this example “mycode,” is:

# be sure you have the appropriate module added for use at login
module initlist
# response: one of the MPI suites such as mvapich_intel, openmpi_intel, etc.
bsub -n "< number CPUs >" -o out.%J -e err.%J  \
      mpirun ./mycode

Another way to submit a parallel job is to create a file (for example, call it mycode.bsub) with the following lines in it

####   run_mycode    ####
#BSUB -n "< number CPUs >"
#BSUB -e err.%J
#BSUB -o out.%J
mpirun ./mycode
##### end of run_mycode ####

and then do the following command at the KillDevil prompt:

bsub < mycode.bsub

For more basic LSF commands refer to the Help document on LSF (Load Sharing Facility).

Recommendations for Large-scale job submissions:

When submitting hundreds of jobs concurrently on KillDevil, each individual job should run longer than 10 minutes; otherwise it puts additional strain on the LSF system and slows down job turnaround time. The reason is that, given the size of KillDevil, there is considerable overhead for LSF to schedule a single job (on the order of minutes). This overhead not only adds turnaround time for your own jobs but can also, if the cluster is busy, create a scheduling burden for the whole cluster.

An additional suggestion is that, instead of running hundreds of “short” jobs, users should obtain a CPU using the “day” or “week” queue and hold on to that CPU for as long as they can by running multiple tasks. When the cluster is busy, the challenge is in getting the CPU in the first place; it benefits you to get as much out of the CPU once you have it.

As a simple example, suppose you have 16800 jobs which each run for less than 3 minutes. With regard to choosing the queue for your job submission, you have four options:

1. Submit 16800 3-minute jobs to the hour queue.

2. Batch the 16800 jobs into batches of size 20, which will result in 840 batches (each containing 20 jobs) and then submit each of the 840 batch jobs to the hour queue.

3. Batch the 16800 jobs into batches of size 480, which will result in 35 batches (each containing 480 jobs) and then submit each of the 35 batch jobs to the day queue.

4. Batch the 16800 jobs into batches of size 3360, which will result in 5 batches (each containing 3360 jobs) and then submit each of the 5 batch jobs to the week queue.

Which of options 2 through 4 you choose will depend on factors such as how quickly you need your results and how busy the cluster is. Our point is that we strongly recommend any of options 2, 3, or 4 over option 1. We are glad to help users with their job submissions; please send questions to research@unc.edu.
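
As a minimal Bourne-shell sketch of option 2 (the task command “./mytask” and its index argument are placeholders for your own work), a wrapper file such as the following runs 20 of the short tasks back-to-back inside a single job submitted to the hour queue:

####   run_batch    ####
#BSUB -q hour
#BSUB -n 1
#BSUB -o out.%J
#BSUB -e err.%J
# run 20 short tasks sequentially within one LSF job slot
for i in $(seq 1 20); do
    ./mytask $i
done
##### end of run_batch ####

and then submit it with:

bsub < run_batch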

Monitoring and Controlling Jobs:

You can check the status of your submitted LSF jobs with the command “bjobs”. The output of that command will include a Job ID, the status of your job (typically “PEND” or “RUN”), the queue to which you submitted the job, the job name, and other information. Additional details can be obtained with:

bjobs -l [JobID]

If you need to kill/end a running job, use the “bkill” command:

bkill [JobID]

Where JobID is the LSF job ID displayed with the “bjobs” command.

Finally, if you don’t provide an output file to LSF (“-o filename” in the bsub command) LSF will send email to your email address when the job finishes, whether it completes successfully or not (unless you are running your job interactively of course). For any jobs that produce large amounts of output you should use the “-o filename” bsub option.

Note
Jobs running outside the LSF queues will be killed. The logon privileges of users who repeatedly run jobs outside of the LSF queues will be suspended.

Transferring Files

It is likely that you will need to transfer files between KillDevil and your campus computer systems that have AFS. You will need to use the “sftp” or “scp” command to move your files, since KillDevil does not have any /afs/isis file systems available. The command “sftp” works similarly to the popular “ftp” command but is more secure. From a terminal window on your UNIX/Linux or Mac computer, type:

sftp onyen@killdevil.unc.edu

Enter your Onyen password and you will be presented with the sftp prompt. Use the “put” and “get” commands to transfer files, as you would with standard ftp.
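
For example, at the sftp prompt (the file names and paths are illustrative):

put mydata.txt /netscr/onyen/mydata.txt
get /netscr/onyen/results.txt results.txt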

To use the “scp” command, follow this example to get a file named “temp.txt” from KillDevil and store it in your local computer’s “/tmp” directory with the same file name. You will be prompted for your password.

scp onyen@killdevil.unc.edu:/netscr/onyen/temp.txt /tmp/

You can also copy a whole directory. The following command will recursively copy the whole directory “/tmp/temp_dir/” from your local computer to KillDevil, and place it in the “/netscr/” directory with the name “temp_dir”:

scp -r /tmp/temp_dir/ onyen@killdevil.unc.edu:/netscr/onyen

Using Tar to Archive

Determine the size of the directory to be archived:

du -hs mydirectory/

Pipe the output of tar through split to create multiple tar files with a maximum size of 100 GB each (ideally in the 5 GB to 50 GB range, depending on the total data size):

tar -cvf - mydirectory/ | split --bytes=100g - myarchive.tar.

Verify the split archive:

cat myarchive.tar.* | tar tf -

If you need to restore the files to their original state, concatenate the pieces and pipe them through tar to extract the files:

cat myarchive.tar.* | tar xvf -

If a large tar file already exists, it may be split via the split command alone:

split --bytes=10g  myarchive.tar  myarchive.tar.

FAQs

Please read the KillDevil FAQ for answers to questions useful for users new to this service.

Additional Help

Computing with the GPU nodes on KillDevil

Be sure to check the Research Computing home page for information about other resources available to you.

We encourage you to attend a training session on Getting Started on KillDevil and other related topics. Please refer to the Research Computing Training site for further information.

If you have any questions, please feel free to call 962-HELP, email research@unc.edu, or submit an Online Web Ticket.