The Dogwood cluster is a Linux-based computing system available to researchers across the campus. It is intended to run large-way, distributed-memory, multi-node parallel jobs, and has a fast switching fabric (InfiniBand EDR interconnect) for this purpose. By “large-way” jobs we mean jobs that span more than one node (here, more than 44 cores).
You can visit the Onyen Services page, then click on the Subscribe to Services button and select Dogwood Cluster.
Linux users can use ssh from within their Terminal application to connect to Dogwood.
If you wish to enable X11 forwarding, use the “-X” ssh option. Be sure to use your UNC ONYEN and password for the login:
ssh -X <onyen>@dogwood.unc.edu
Windows users should download MobaXterm (Home Edition). Then use the Session icon to create a Dogwood SSH session using dogwood.unc.edu for “Remote host” and your ONYEN for the “username” (Port should be left at 22).
Mac users can use ssh from within their Terminal application to connect to Dogwood. Be sure to use your UNC ONYEN and password for the login:
ssh -X <onyen>@dogwood.unc.edu
To enable X11 forwarding, Mac users will need to download, install, and run XQuartz on their local machine in addition to using the “-X” ssh option. Furthermore, in many instances, for X11 forwarding to work properly Mac users need to use the Terminal application that comes with XQuartz instead of the default Mac Terminal application.
A successful login takes you to “login node” resources that have been set aside for user access. The login node is where you will edit your code, execute basic UNIX commands, and submit your jobs to the SLURM job scheduler.
DO NOT RUN YOUR CODE OR RESEARCH APPLICATIONS DIRECTLY ON THE LOGIN NODE. THESE MUST BE SUBMITTED TO SLURM!
NAS home space
Your home directory will be in /nas/longleaf/home/. Your home directory has a 50 GB soft limit and a 75 GB hard limit. Note that the Dogwood and Longleaf clusters share the same home file space. Thus, if you use both clusters, we strongly recommend creating a longleaf and/or dogwood subdirectory under your home directory to keep the files separated as needed.
Your scratch directory will be in
(the “o/n/” in the path are the first two letters of your onyen). This is the scratch space for working with large files. The following apply to the scratch space:
- Scratch space uses standard UNIX permissions to control access to files and directories. By default, other users in your group (graduate students, faculty, employees) have read access to your scratch directory. You can easily remove this read permission with the “chmod” command.
- A cleanup policy is enforced for scratch space: any file not used or modified in the last 21 days will be deleted.
- Scratch space is a shared, temporary work space. Please note that scratch space is not backed up and is, therefore, not intended for permanent data storage. See the “Mass Storage” section below about how to store permanent data.
- Note it is a violation of research computing policy to use artificial means, such as the “touch” command, to maintain unused files in the scratch directory beyond their natural lifetime. Violators will be warned and repeat violators are subject to loss of privileges and access. This is a shared resource, please be courteous to other users.
- Try to avoid using “ls -l” and use “ls” with no options instead.
- Never have a large number of files (>1000) in a single directory.
- Avoid submitting jobs in a way that will access the same file(s) at the same point(s) in time.
- Limit the number of processes performing parallel I/O work or other highly intensive I/O jobs.
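As noted above, group read access to your scratch directory can be removed with “chmod”. The sketch below illustrates this on a hypothetical local directory standing in for your scratch directory (the path is an example only, not your actual scratch path):

```shell
# Hypothetical illustration: remove group read access from a directory,
# as you would for your scratch directory on Dogwood.
demo_dir=/tmp/scratch_perm_demo   # stands in for your scratch directory
mkdir -p "$demo_dir"
chmod g-r "$demo_dir"             # remove group read permission
ls -ld "$demo_dir"                # the group field should no longer show "r"
```

On your real scratch directory you would run the same `chmod g-r` command against that path; `chmod o-r` additionally removes read access for users outside your group.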
The Mass Storage system (also known as StorNext and mounted as /ms) is intended for archiving files and storing very large files. Files located in mass storage are not accessible to jobs running in Slurm. Mass storage is not to be used as a work directory or as a backup location for local disk drives, operating systems, or software. In general, files that change often or directories with more than a thousand files in them will cause performance problems and consume tape resources. The PC backup software provided by UNC might be an alternative solution rather than having to copy your PC files to mass storage.
Mass Storage is similar to an ordinary disk file system in that it keeps an inode (for recording data location, etc.) and data blocks for each file. Files can be moved in and out of mass storage by using simple UNIX commands such as “cp” and “mv” or by using sftp/scp. As the Mass Storage system is optimized for archiving data, your programs should not directly read or write from the Mass Storage system. Instead copy your data from “~/ms” to scratch space such as “/netscr/”.
If you are routinely storing large numbers of small files (more than several hundred files at a time) in Mass Storage, you should “tar” or “zip” those smaller files into one tarball or zip file outside of mass storage and then move that tarball or zip file to mass storage. You are not required to compress the tarball or zip file since the mass storage tape drive hardware will compress your data. Reducing the number of individual small files will help the overall performance of the StorNext Mass Storage system. See the more detailed list of things to avoid.
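As a sketch of the tarball workflow described above (all paths and file names here are hypothetical examples), you would bundle the small files outside of mass storage first, then move only the single tarball:

```shell
# Hypothetical illustration: bundle many small files into one tarball
# before moving it to Mass Storage. Paths are examples only.
work=/tmp/ms_demo
mkdir -p "$work/results"
touch "$work/results/run1.dat" "$work/results/run2.dat"

# Create one tarball from the whole directory (compression is optional,
# since the tape hardware compresses data anyway).
tar -czf "$work/results.tar.gz" -C "$work" results

# List the tarball's contents to verify it before moving it, e.g.:
tar -tzf "$work/results.tar.gz"

# Then move just the single tarball to Mass Storage, e.g.:
#   mv "$work/results.tar.gz" ~/ms/
```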
To access Mass Storage from Dogwood, type:
Any files in the scratch space that you wish to save can be moved to Mass Storage, preferably in tar or zip format.
The environment on Dogwood is managed with modules. The basic module commands are:
module [ add | avail | help | list | load | unload | show ]
When you first log in, you should run
And the response should be
To add a module for this session only, use “module add [application]” where “[application]” is the name given on the output of the “module avail” command.
To add a module for every time you login, use “module save”. This does not change your current session, only later logins.
Please refer to the Help document on modules for further information.
Once you have decided what software you need to use, added those packages to your environment using modules, and you have successfully compiled your serial or parallel code, you can then submit your jobs to run on Dogwood. We use the Slurm workload manager software to schedule and manage jobs that are submitted to run on Dogwood.
To submit a job to run, you will need to use the SLURM “sbatch” command. SLURM submits jobs to the particular job partition you specify.
A short description of the partitions available to users in the Dogwood
cluster can be found here.
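The sketch below shows what a typical sbatch job script might look like. The partition name, node and task counts, and time limit are placeholders, not values from this document; check the partition documentation and your code’s requirements before using them, and note that “my_parallel_program” is a hypothetical executable name:

```shell
#!/bin/bash
# Hypothetical SLURM batch script for Dogwood. All values below are
# placeholders -- consult the partition documentation for real names
# and limits.
#SBATCH --job-name=my_mpi_job
#SBATCH --partition=<partition>    # a multi-node partition of your choice
#SBATCH --nodes=2                  # Dogwood targets jobs spanning nodes
#SBATCH --ntasks-per-node=44       # 44 cores per node, per the overview above
#SBATCH --time=02:00:00            # wall-clock limit (hh:mm:ss)
#SBATCH -o slurm-%j.out            # output file; %j becomes the job ID

mpirun ./my_parallel_program       # your compiled MPI executable
```

You would save this as, say, myjob.sbatch and submit it with “sbatch myjob.sbatch”; the `#SBATCH` lines are directives read by SLURM, not shell commands.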
Monitoring and Controlling Jobs:
You can check the status of your submitted SLURM jobs with the command “squeue -u <onyen>” (note that squeue alone shows jobs from all users; provide your onyen to show just your jobs). The output of that command will include a Job ID, the state of your job (e.g. pending or running), the partition to which you submitted the job, the job name, and other information. See “man squeue” for more information on using this command.
If you need to kill/end a running job, use the “scancel <JobID>” command, where JobID is the SLURM job ID displayed with the “squeue” command.
Finally, you can provide an output file to SLURM (“-o filename” in the sbatch command). For regular jobs, if you don’t provide a name the default file name is “slurm-%j.out”, where the “%j” is replaced by the SLURM job ID.
Jobs running outside the SLURM partitions will be killed. The logon privileges of users who repeatedly run jobs outside of the SLURM partitions will be suspended.
Be sure to check the Research Computing home page for information about other resources available to you.
We encourage you to attend a training session on “Using Dogwood” and other related topics. Please refer to the Research Computing Training site for further information.
If you have any questions, please feel free either to call 962-HELP, email firstname.lastname@example.org, or submit an Online Web Ticket.