- The KillDevil cluster is a Linux-based computing system available to researchers across the campus. With more than 9500 computing cores across 774 compute nodes and a large scratch disk space, it provides an environment that can accommodate many types of computational problems. The compute nodes are interconnected with a high-speed InfiniBand network, making the cluster especially appropriate for large parallel jobs. KillDevil is a heterogeneous cluster with at least 48 GB of memory per node. In addition, there are nodes with extended memory, extremely large memory, and GPGPU computing. (Note: “KillDevil” is named after the North Carolina coastal town.)
- Login node:
- killdevil.unc.edu: 12 cores @ 2.67 GHz Intel (Model X5650) with 12 MB L3 cache, 48 GB memory
- Compute nodes:
- 119 Dell C6100 servers or 476 compute nodes, each with 12 cores, 2.93 GHz Intel processors, 12 MB L3 cache (Model X5670), and 48 GB memory, for a total of 5712 processing cores on a 2:1 ratio IB interconnect.
- 17 Dell C6100 servers or 68 compute nodes, each with 12 cores, 2.93 GHz Intel processors, 12 MB L3 cache (Model X5670), and 96 GB memory, for a total of 816 processing cores on a 2:1 ratio IB interconnect.
- 17 Dell C6220 servers or 68 compute nodes, each with 16 cores, 2.60 GHz Intel processors, 20 MB L3 cache (Model E5-2670), and 64 GB memory, for a total of 1088 processing cores on a 2:1 ratio IB interconnect.
- 32 Dell C6100 servers or 128 compute nodes, each with 12 cores, 2.93 GHz Intel processors, 12 MB L3 cache (Model X5670), and 48 GB memory, for a total of 1536 processing cores on a full bisection bandwidth (FBB), i.e. 1:1 ratio, IB interconnect.
- Large Memory Compute nodes:
- 2 Dell R910 servers or 2 compute nodes, each with 32 cores, 2.00 GHz Intel processors, 18 MB L3 cache (Model X7550), and 1 TB memory, for a total of 64 processing cores.
- GPU Compute nodes:
- 8 Dell C6100 servers or 32 compute nodes, each with 12 cores, 2.67 GHz Intel processors, 12 MB L3 cache (Model X5650), and 48 GB memory, for a total of 384 processing cores.
- 4 Dell C410X servers, each with 16 Nvidia M2070 GPUs for a total of 64 GPU units.
- Operating System:
- RHEL 5.6 (Tikanga)
- Shared Filesystems:
- 125 TB “/lustre/scr” Lustre File System intended for large files (>1MB)
- 85 TB “/nas02” NetApp NFS for home directories, depts space, and apps
- 42 TB “/netscr” storage intended for smaller files (<1MB)
- Interconnect:
- InfiniBand 4x QDR (see compute nodes above)
- Resource management:
- Handled by LSF, through which all jobs are submitted for processing
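The totals quoted in the overview can be cross-checked against the list above: for each partition, multiply the number of compute nodes by the cores per node and add the results. The short bash sketch below does exactly that with the figures copied from the list (the login node and the 64 GPU units are left out).

```shell
#!/bin/bash
# Node and core counts copied from the specification list above, one
# "nodes cores_per_node" pair per partition, in list order: C6100 48 GB,
# C6100 96 GB, C6220 64 GB, C6100 48 GB (1:1 IB), R910 1 TB, C6100 GPU hosts.
partitions=("476 12" "68 12" "68 16" "128 12" "2 32" "32 12")

total_nodes=0
total_cores=0
for p in "${partitions[@]}"; do
    read -r nodes cores <<< "$p"
    total_nodes=$(( total_nodes + nodes ))
    total_cores=$(( total_cores + nodes * cores ))
done
echo "compute nodes: $total_nodes, cores: $total_cores"
# prints: compute nodes: 774, cores: 9600
```

The result, 9600 cores, is consistent with the “more than 9500 computing cores” figure; the 774 in the overview counts compute nodes, not Dell server chassis.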
To request a KillDevil account, visit the Onyen Services page, click the Subscribe to Services button, and select KillDevil Cluster.
If you attempt to subscribe for a KillDevil account through Onyen Services, you may see the following message:
You are ineligible for the following services for the reasons cited:
KillDevil Cluster: missing '(LIVE|EXCH)' prerequsite.
Kure Cluster: missing '(LIVE|EXCH)' prerequsite.
In this case, send an email to email@example.com requesting an account on KillDevil. In the email, indicate that you were not able to subscribe for a KillDevil account through Onyen Services, and include the following information in your request:
- Your “@email.unc.edu” email address
- Full name
- Campus address
- Campus phone number (if any) and number where you can be reached while running jobs
- Department you are affiliated with (the one relevant to the work you will do on KillDevil)
- Faculty sponsor’s (PI) name (and onyen if known) if you are not a faculty member
- A description of the work you expect to do on KillDevil
You will receive an email notification once your account has been created. When requesting a KillDevil account, do not request to be added to a group. If you know you need to be added to a group, request that in a separate email to firstname.lastname@example.org, not in the KillDevil account form.
Note that the phone number you provide needs to be one where we can reliably and routinely reach you, and your “@email.unc.edu” email address must work. If you are running jobs on the cluster, you must check your UNC email of record (Onyen@email.unc.edu) periodically while your jobs are running. If your job creates problems for other users, we will attempt to contact you via phone and email. Depending on the situation, if we are not able to reach you immediately, we will likely kill your jobs. Remember that these are shared resources, and we do our best to preserve a working environment for everyone. If your jobs repeatedly affect others and we are unable to reach you, your account on the cluster may be suspended until you contact us to resolve the issue. The best way to ensure ongoing accurate contact information is to keep your email and phone number up to date in the UNC directory http://dir.unc.edu.
Once you have an account on KillDevil, use ssh to connect to killdevil.unc.edu and log in with your Onyen (replace “onyen” below with your own):
ssh onyen@killdevil.unc.edu
To connect to KillDevil from an off-campus location, you must first connect to the campus network through a VPN client. Your very first login will begin normally with your Onyen and password. Your session will then run ssh-keygen for you. Accept the defaults by pressing Enter at each prompt; no passphrase is necessary for this key generation. If this environment is not established correctly, jobs will fail with “permission denied” messages. (See the KillDevil FAQ for details.)
Even though the KillDevil cluster has many compute nodes, you never actually log in to any of them. Instead, you log in to the cluster as above. A successful login takes you to “login node” resources that have been set aside for user access. The login node is where you edit and compile your code; you then use the LSF job scheduler to submit your code to the compute nodes for processing. The login node can also be used for basic file operations such as copying, moving, and zipping files. If you need to zip large files on the login node, please do only one at a time. Interactive use of the login node must be restricted to compiling and debugging. Other processes running on the login node are subject to immediate termination by the system administrators.
- NAS home space:
Your home directory will be in /nas02/home/o/n/ (where “o/n/” are the first two letters of your Onyen). Your home directory has a 10 GB soft limit and a 15 GB hard limit.
- Lustre scratch space:
Your /lustre directory will be in /lustre/scr/o/n/ (where “o/n/” are the first two letters of your Onyen).
The Lustre-based scratch space, /lustre, should be your primary scratch working directory. It has a capacity of 125 TB and is intended and optimized for research data files larger than 1 MB and/or parallel workloads. Smaller files and more serial workloads are likely better suited to the /netscr scratch file system.
- Netscratch scratch space:
For the Netscratch scratch space, /netscr, job performance is best when users have many small data files less than 64 KB in size. This scratch space is an NFS-mounted file system and is thus shared by all KillDevil users as well as the users of other Research Computing systems.
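Because each directory location is derived from your Onyen, the paths can be computed rather than typed by hand. The bash function below is a hypothetical helper, not a Research Computing tool; it assumes that the last path component is the Onyen itself, as in the /ms/home/o/n/onyen and /netscr/onyen paths used elsewhere in this guide, and “jsmith” is a made-up example Onyen.

```shell
#!/bin/bash
# Hypothetical helper: print the home and scratch locations for an Onyen.
# Assumption: the final path component is the Onyen itself.
storage_paths() {
    if [ -z "$1" ]; then
        echo "usage: storage_paths ONYEN" >&2
        return 1
    fi
    local onyen=$1
    local a=${onyen:0:1} b=${onyen:1:1}   # first and second letters
    echo "home:   /nas02/home/$a/$b/$onyen"
    echo "lustre: /lustre/scr/$a/$b/$onyen"
    echo "netscr: /netscr/$onyen"
}

storage_paths jsmith   # "jsmith" is a made-up example Onyen
```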
The following apply to all scratch spaces:
- Scratch space uses standard UNIX permissions to control access to files and directories. By default other users in your group (graduate students, faculty, employees) have read access to your Netscratch directory. You can easily remove this read permission with the “chmod” command.
- A policy has been established for cleaning out scratch files: any file not used or modified in the last 21 days will be deleted.
- Scratch space is a shared, temporary work space. Please note that scratch space is not backed up and is therefore not intended for permanent data storage. See the “Mass Storage” section below about how to store permanent data.
- Note that it is a violation of Research Computing policy to use artificial means, such as the “touch” command, to keep unused files in the scratch directory beyond their natural lifetime. Violators will be warned, and repeat violators are subject to loss of privileges and access. This is a shared resource; please be courteous to other users.
- Try to avoid using “ls -l”; use “ls” with no options instead.
- Never have a large number of files (>1000) in a single directory.
- Avoid small files (i.e., less than 1 MB) on the /lustre file system; use /netscr instead for such files.
- Avoid submitting jobs in a way that will access the same file(s) at the same point(s) in time.
- Limit the number of processes performing parallel I/O work, SAS work, or other highly intensive I/O jobs.
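The permission and purge rules above can be checked from the command line. The two bash functions below are hypothetical helpers built on the standard chmod and find commands; the directory argument is a placeholder for your own scratch directory. Since find reads the metadata of every file it visits, run it sparingly on /lustre, in keeping with the advice above.

```shell
#!/bin/bash
# Hypothetical helpers; pass your own scratch directory as $1,
# e.g. /netscr/onyen or /lustre/scr/o/n/onyen.

# Remove group and other permissions from everything under the directory
# (by default, other users in your group can read your scratch files).
make_private() {
    chmod -R go-rwx "$1"
}

# List regular files that have been neither modified nor accessed for more
# than 21 days (age rounded down to whole days), i.e. files already past
# the scratch purge window. Copy anything you need to mass storage.
purge_candidates() {
    find "$1" -type f -mtime +21 -atime +21
}

# Usage:
#   make_private /netscr/onyen
#   purge_candidates /netscr/onyen
```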
The Mass Storage system (also known as StorNext or /ms) is intended for archiving files and storing very large files. Files located in mass storage are not accessible to jobs running in LSF. Mass storage is not to be used as a work directory or as a backup location for local disk drives, operating systems, or software. In general, files that change often or directories with more than a thousand files in them will cause performance problems and consume tape resources. The PC backup software provided by UNC might be an alternative solution rather than having to copy your PC files to mass storage.
Mass Storage is similar to an ordinary disk file system in that it keeps an inode (for recording data location, etc.) and data blocks for each file. Files can be moved in and out of mass storage using simple UNIX commands such as “cp” and “mv” or using sftp/scp. Because the Mass Storage system is optimized for archiving data, your programs should not read or write directly from the Mass Storage system. Instead, copy your data from “~/ms” to scratch space such as “/netscr/”.
If you are routinely storing large numbers of small files (more than several hundred files at a time) in Mass Storage, you should “tar” or “zip” those smaller files into one tarball or zip file outside of mass storage and then move that tarball or zip file to mass storage. You are not required to compress the tarball or zip file since the mass storage tape drive hardware will compress your data. Reducing the number of individual small files will help the overall performance of the StorNext Mass Storage system. See the more detailed list of things to avoid.
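The bundling step can be sketched as a small bash function. It is a hypothetical helper: the source directory, the scratch directory, and the mass storage directory are passed as arguments (for example /netscr/onyen/run42, /netscr/onyen and ~/ms, where run42 is a made-up name). It creates an uncompressed tarball on scratch, checks that tar can read it back, and only then moves it into mass storage.

```shell
#!/bin/bash
# Hypothetical helper: archive_to_ms SRC_DIR SCRATCH_DIR MS_DIR
archive_to_ms() {
    local src=$1 scratch=$2 ms_dir=$3
    local name
    name=$(basename "$src").tar
    # Build the tarball outside mass storage. No compression: the mass
    # storage tape hardware compresses the data itself.
    tar -cf "$scratch/$name" -C "$(dirname "$src")" "$(basename "$src")" || return 1
    # Make sure the archive is readable before moving it.
    tar -tf "$scratch/$name" > /dev/null || return 1
    # One large file instead of many small ones goes to mass storage.
    mv "$scratch/$name" "$ms_dir/"
}

# Usage (paths are examples):
#   archive_to_ms /netscr/onyen/run42 /netscr/onyen ~/ms
```

For a large tarball, the final move can instead be submitted to the ms queue, e.g. bsub -q ms mv /netscr/onyen/run42.tar /ms/home/o/n/onyen.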
To access Mass Storage from KillDevil, type:
Any files in scratch space that you wish to keep can be moved to mass storage, preferably in tar or zip format.
If you need to do any large moves or copies of data (to or from mass storage), use the LSF “ms” queue:
bsub -q ms cp /netscr/onyen/text.txt /ms/home/o/n/onyen
This bsub command, issued with the “-q ms” parameter, submits your copy or move job to a host with very good connectivity to the mass storage system. We expect these hosts to handle multiple data moves well, thereby removing this burden from the login node.
The environment on KillDevil is presented as modules. The basic module commands are
module [ add | avail | help | list | load | initadd | unload | initrm | show ]
When you first log in you should run
And the response should be
To add a module for this session only, use “module add [application]” where “[application]” is the name given on the output of the “module avail” command.
To add a module for every time you login, use “module initadd [application]”. This does not change your current session, only later logins.
Please refer to the Help document on modules for further information.
Applications used by many groups across campus have been compiled and made available on KillDevil. To see the full list of applications currently available, run:
module avail
To see the matrix of applications and clusters visit the Application Matrix.
- Intel Compiler Suite
- v11 for Fortran77, Fortran90, C and C++, plus the Math Kernel Library
- Portland Group (PGI) Compiler Suite
- v10.3 for Fortran77, Fortran90, C and C++
- GNU (GCC) Compiler Suite
- v4.1.2 for Fortran77, Fortran90, C and C++
- MPI for parallel programming via OFED v1.5.2
- Totalview Debugger
- Job scheduler
- LSF v8.0
Note that only one of these MPI environments can be loaded at a given time. Loading the mvapich or openmpi environment for a compiler also loads that compiler’s standard environment. For example, do not run “module load pgi” followed by “module load mvapich_pgi”: the latter command alone loads all the PGI compiler commands, including the MPI ones. If you do not want the MPI compiler commands, use just “module load pgi”.
The available MPI compiler commands are:
Once you have added the default compile module to your environment with one of the commands
module initadd mvapich_[intel|pgi|gcc]    # choose one compiler
module initadd openmpi_[intel|pgi|gcc]    # choose one compiler
both your compiles and your job submissions will have the appropriate environment available, including man pages, paths, libraries, include files, and any required environment variables.
Once you have decided what software you need to use, added those packages to your environment using modules, and you have successfully compiled your serial or parallel code, you can then submit your jobs to run on KillDevil. We use LSF (Load Sharing Facility) software to schedule and manage jobs that are submitted to run on KillDevil.
To submit a job, use the LSF “bsub” command as shown below. LSF submits jobs to the particular job queues you specify, so in your “bsub” command you need to specify the queue in which the job is to run using the “-q” bsub option. If you don’t provide a queue explicitly, the job will be given to the “week” queue by default. If you do not add a “-n” to your bsub command, the job is assumed to run on 1 CPU. By default, each user is limited to a total of 1024 CPUs (job slots) across the cluster. If you are already using 1024 job slots and you submit a job, that job will PEND until job slots are freed as your running jobs finish. Similarly, if all the job slots in the cluster are in use when you submit a job, your job will PEND even if you are not using any job slots yourself.
A short description of the queues available to users in the KillDevil cluster can be found below. You can also use the “bqueues” command to list the properties of a specific queue. For example, you could type "bqueues -l debug" (that’s a lower case “-l” for “long listing”) to find out more about the debug queue. Additional queues may be added as need dictates. All queues share a common fairshare allocation policy that governs which PENDing job will be dispatched next based on the recent runtime history of each user or group with jobs PENDing to run. The following queues are available on KillDevil:
|Queue name|Job duration|CPU range per job|Total CPUs per user (all jobs in queue)|
|---|---|---|---|
|day|24 hrs.|512 max.|1024|
|debug|30 min.|64 max.|64|
|hour|60 min.|512 max.|1024|
|week (default queue)|7 days|512 max.|512|
|bigmem|7 days|32 max.|32|
|Idle|--|--|1024 (preempted by all other queues)|
|adhoc|--|--|Configured as required; closed by default|
|gpu|7 days|512 max.|512|
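The general-purpose rows of the table can be read as a simple decision rule. The bash function below is a hypothetical helper, and the rule it encodes (take the queue with the shortest time limit that still covers the expected run time and the per-job CPU maximum) is a suggestion, not a stated scheduling policy. debug is excluded because it is meant for testing, and bigmem, gpu, Idle and adhoc are special-purpose queues.

```shell
#!/bin/bash
# Hypothetical helper: pick_queue MINUTES NCPUS
# Limits are taken from the queue table: hour 60 min, day 24 h, week 7 days,
# each with a 512-CPU maximum per job.
pick_queue() {
    local minutes=$1 ncpus=$2
    if ! [[ $minutes =~ ^[0-9]+$ && $ncpus =~ ^[0-9]+$ ]]; then
        echo "usage: pick_queue MINUTES NCPUS" >&2
        return 2
    fi
    if   (( minutes <= 60 && ncpus <= 512 )); then echo hour
    elif (( minutes <= 24 * 60 && ncpus <= 512 )); then echo day
    elif (( minutes <= 7 * 24 * 60 && ncpus <= 512 )); then echo week
    else
        echo "pick_queue: no standard queue fits $minutes min on $ncpus CPUs" >&2
        return 1
    fi
}

# Usage: bsub -q "$(pick_queue 600 16)" -n 16 ./mycode   # selects the day queue
```

Keep the per-user totals in mind as well: all your jobs in the week queue together may use at most 512 CPUs, versus 1024 in the day and hour queues.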
A list of resources defined for a given node can be seen in the last column of output of the following command:
lshosts | more
Submitting Batch Jobs:
The general form of syntax for submitting a serial batch job is:
bsub [bsub options] executable [executable options]
To learn more about bsub and bsub options:
Important. If you are planning to submit a job that requires more than 4 GB of memory, be sure to read this.
Submitting Interactive Jobs:
Submitting interactive jobs requires the bsub option “-Ip” as in the “bsub” command shown below:
bsub -Ip my_interactive_job
For example, the command below will give you an interactive bash shell session:
bsub -Ip /bin/bash
Submitting Parallel Jobs:
Parallel jobs that use shared memory (i.e. OpenMP jobs):
For this type of parallel job, you must submit the job so that all of the requested job slots land on one (i.e., the same) compute node. In addition, before you submit your job you need to set the environment variable OMP_NUM_THREADS equal to the number of requested job slots. On KillDevil, this number must be less than or equal to 12, since the compute nodes on KillDevil (in general) have 12 cores.
To submit an OpenMP batch job, referred to as “mycode” in this example, that uses (for instance) 10 threads, first set the OMP_NUM_THREADS environment variable to 10.
For Bourne, bash, and related shells:
export OMP_NUM_THREADS=10
For csh and related shells:
setenv OMP_NUM_THREADS 10
Then to submit the job:
bsub -n 10 -R "span[hosts=1]" ./mycode
In the above job submission command, -n 10 asks for ten job slots (which intentionally matches the value set for OMP_NUM_THREADS) and -R "span[hosts=1]" requests that all ten job slots be placed on the same host.
Another way to submit the job is to create a file (for example, call it mycode.bsub) with the following lines in it:
#### run_mycode ####
#BSUB -n 10
#BSUB -R "span[hosts=1]"
./mycode
##### end of run_mycode ####
and then run the following command at the KillDevil prompt:
bsub < mycode.bsub
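Because the slot count and OMP_NUM_THREADS must always match, the two steps can be wrapped in one shell function. This is a hypothetical bash helper, not an LSF command. Its 12-thread ceiling comes from the 12-core nodes described above; the MAX_THREADS override is an assumption of this sketch, meant for jobs that target the 16-core or 32-core nodes.

```shell
#!/bin/bash
# Hypothetical wrapper: submit_omp THREADS EXECUTABLE [ARGS...]
# Sets OMP_NUM_THREADS and requests the same number of slots on one host.
submit_omp() {
    local threads=$1
    shift
    local max=${MAX_THREADS:-12}   # 12 cores on most KillDevil compute nodes
    if ! [[ $threads =~ ^[0-9]+$ ]] || (( threads < 1 || threads > max )); then
        echo "submit_omp: thread count must be between 1 and $max" >&2
        return 1
    fi
    export OMP_NUM_THREADS=$threads          # set before submitting, as required
    bsub -n "$threads" -R "span[hosts=1]" "$@"
}

# Usage: equivalent to setting OMP_NUM_THREADS=10 and then running
#   bsub -n 10 -R "span[hosts=1]" ./mycode
#   submit_omp 10 ./mycode
```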
Parallel jobs that use distributed memory (i.e. MPI jobs):
The general form of syntax for submitting an MPI batch job, in this example “mycode,” is:
# be sure you have the appropriate module added for use at login
# response: one of the MPI suites such as mvapich_intel, openmpi_intel, etc.
bsub -n "< number CPUs >" -o out.%J -e err.%J \
Another way to submit a parallel job is to create a file (for example, call it mycode.bsub) with the following lines in it:
#### run_mycode ####
#BSUB -n "< number CPUs >"
#BSUB -e err.%J
#BSUB -o out.%J
##### end of run_mycode ####
and then run the following command at the KillDevil prompt:
bsub < mycode.bsub
For more basic LSF commands refer to the Help document on LSF (Load Sharing Facility).
Recommendations for Large-scale job submissions:
When submitting hundreds of jobs at once on KillDevil, each individual job should run for longer than 10 minutes; otherwise the jobs put additional strain on the LSF system and slow down job turnaround. The reason is that, given the size of KillDevil, LSF has considerable overhead (on the order of minutes) in scheduling a single job. This overhead not only adds to the turnaround time of your own jobs but can also, if the cluster is busy, create a scheduling burden for the whole cluster.
An additional suggestion: instead of running hundreds of “short” jobs, obtain a CPU using the “day” or “week” queue and hold on to that CPU for as long as you can while running multiple tasks. When the cluster is busy, the challenge is getting the CPU in the first place, so it benefits you to get as much out of the CPU as possible once you have it.
As a simple example, suppose you have 16800 jobs, each of which runs for less than 3 minutes. In choosing the queue for your job submission, you have four options:
1. Submit 16800 3-minute jobs to the hour queue.
2. Batch the 16800 jobs into batches of size 20, which will result in 840 batches (each containing 20 jobs) and then submit each of the 840 batch jobs to the hour queue.
3. Batch the 16800 jobs into batches of size 480, which will result in 35 batches (each containing 480 jobs) and then submit each of the 35 batch jobs to the day queue.
4. Batch the 16800 jobs into batches of size 3360, which will result in 5 batches (each containing 3360 jobs) and then submit each of the 5 batch jobs to the week queue.
Which of options 2 through 4 you choose will depend on factors such as how quickly you need your results and how busy the cluster is. We strongly recommend any of options 2, 3, or 4 over option 1. We are glad to help users with their job submissions; please send questions to firstname.lastname@example.org.
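The batch sizes in options 2 to 4 follow from the queue limits: with each task under 3 minutes, a batch of 20 needs at most 20 × 3 = 60 minutes (hour queue), a batch of 480 at most 1440 minutes = 24 hours (day queue), and a batch of 3360 at most 10080 minutes = 7 days (week queue). The bash sketch below generates one LSF job file per batch, each running its tasks one after another in a single job slot. It assumes, hypothetically, that each task is launched as “./mytask ID” with IDs from 1 to the number of tasks; adapt both to your own workload.

```shell
#!/bin/bash
# Hypothetical sketch: make_batches NTASKS BATCH_SIZE QUEUE OUTDIR
# Writes OUTDIR/batch_N.bsub files and prints the number of batches.
# Assumption: each task runs as "./mytask ID", IDs 1..NTASKS.
make_batches() {
    local ntasks=$1 batch_size=$2 queue=$3 outdir=$4
    local nbatches=$(( (ntasks + batch_size - 1) / batch_size ))   # round up
    local b first last
    mkdir -p "$outdir" || return 1
    for (( b = 0; b < nbatches; b++ )); do
        first=$(( b * batch_size + 1 ))
        last=$(( (b + 1) * batch_size ))
        if (( last > ntasks )); then last=$ntasks; fi
        # Unescaped $queue/$first/$last are filled in now; \$i stays literal.
        cat > "$outdir/batch_$b.bsub" <<EOF
#!/bin/sh
#BSUB -q $queue
#BSUB -n 1
#BSUB -o out.%J
#BSUB -e err.%J
i=$first
while [ "\$i" -le $last ]; do
    ./mytask "\$i"
    i=\$((i + 1))
done
EOF
    done
    echo "$nbatches"
}

# Option 3: 35 batches of 480 tasks for the day queue, then submit each file.
#   make_batches 16800 480 day batches
#   for f in batches/batch_*.bsub; do bsub < "$f"; done
```

Running tasks sequentially inside each job means LSF schedules 35 jobs instead of 16800, which is the point of the recommendation above.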
Monitoring and Controlling Jobs:
You can check the status of your submitted LSF jobs with the command “bjobs”. The output of that command will include a Job ID, the status of your job (typically “PEND” or “RUN”), the queue to which you submitted the job, the job name, and other information. Additional details can be obtained with:
bjobs -l [JobID]
If you need to kill/end a running job, use the “bkill” command:
bkill JobID
where JobID is the LSF job ID displayed by the “bjobs” command.
Finally, if you don’t provide an output file to LSF (“-o filename” in the bsub command) LSF will send email to your email address when the job finishes, whether it completes successfully or not (unless you are running your job interactively of course). For any jobs that produce large amounts of output you should use the “-o filename” bsub option.
Jobs running outside the LSF queues will be killed. The logon privileges of users who repeatedly run jobs outside of the LSF queues will be suspended.
You will likely need to transfer files between KillDevil and your campus computer systems that have AFS. Because KillDevil does not have any /afs/isis file systems available, use the “sftp” or “scp” command to move your files. The “sftp” command works similarly to the popular “ftp” command but is more secure. From a terminal window on your UNIX/Linux or Mac computer, type:
sftp onyen@killdevil.unc.edu
Enter your Onyen password and you will be presented with the sftp prompt. Use the “put” and “get” commands to transfer files, as you would do with standard ftp.
To use the “scp” command, follow this example to get a file named “temp.txt” from KillDevil and store it in your local computer’s “/tmp” directory with the same file name. You will be prompted for your password.
scp onyen@killdevil.unc.edu:/netscr/onyen/temp.txt /tmp/
You can also copy a whole directory. The following command recursively copies the directory “/tmp/temp_dir/” from your local computer to KillDevil and places it in your “/netscr/onyen” directory with the name “temp_dir”:
scp -r /tmp/temp_dir/ onyen@killdevil.unc.edu:/netscr/onyen
Determine the size of the data to be archived:
du -hs mydirectory/
Pipe tar through split to create multiple tar files with a maximum size of 100 GB each (ideally in the 5 GB to 50 GB range, depending on the total data size):
tar -cvf - mydirectory/ | split --bytes=100G - myarchive.tar.
To verify the split archive, concatenate the pieces and list the contents:
cat myarchive.tar.* | tar tf -
If you need to restore the files to their original state, concatenate the pieces and pass them through tar to extract the files:
cat myarchive.tar.* | tar xvf -
If a large tar file already exists, it can be split with the split command alone:
split --bytes=10G myarchive.tar myarchive.tar.
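Before archiving real data, the whole round trip can be rehearsed on a tiny scale. The bash sketch below creates three small random files in a temporary directory, archives them into 4096-byte pieces with tar and split, lists the archive from the concatenated pieces, and restores it into a separate directory for comparison. The piece size is illustrative only and is given as a plain byte count for portability.

```shell
#!/bin/bash
# Rehearsal of the split-archive workflow in a throwaway directory.
work=$(mktemp -d) || exit 1
cd "$work" || exit 1

# Sample data standing in for "mydirectory".
mkdir mydirectory
for n in 1 2 3; do
    head -c 6000 /dev/urandom > "mydirectory/file$n.bin"
done

du -hs mydirectory/                                     # size to be archived

# Create: stream the archive into split (pieces myarchive.tar.aa, .ab, ...).
tar -cf - mydirectory/ | split -b 4096 - myarchive.tar.
ls myarchive.tar.*

# Verify: the pieces, concatenated in glob order, form one valid archive.
cat myarchive.tar.* | tar tf -

# Restore into a separate directory and compare with the original.
mkdir restore
cat myarchive.tar.* | (cd restore && tar xf -)
cmp mydirectory/file1.bin restore/mydirectory/file1.bin && echo "restore OK"
```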
Please read the KillDevil FAQ for answers to questions common among users new to this service.
Be sure to check the Research Computing home page for information about other resources available to you.
We encourage you to attend a training session on Getting Started on KillDevil and other related topics. Please refer to the Research Computing Training site for further information.
If you have any questions, please feel free either to call 962-HELP, email email@example.com, or submit an Online Web Ticket.