Getting Started on Longleaf

Table of Contents

Introduction

System Information

Getting an Account

Logging In

Main Directory Spaces

Applications Environment

Job Submission

Longleaf Partitions

More SLURM Commands

Additional Help

Introduction

The Longleaf cluster is a Linux-based computing system available to researchers across the campus. With over 10,000 computing cores and a large scratch disk space, it provides an environment optimized for memory- and I/O-intensive, loosely coupled workloads, with an emphasis on aggregate job throughput over individual job performance. In particular, workloads consisting of a large number of jobs that each require a single compute host are best suited to Longleaf.

System Information

  • Login nodes:
    • 2 login nodes: longleaf.unc.edu
  • General compute nodes:
    • 147 compute nodes, each with 24 physical cores, 2.50 GHz Intel processors, 30M cache (Model E5-2680 v3), 256-GB RAM, and 2x10Gbps NIC.
    • 30 compute nodes, each with 36 physical cores, 2.30 GHz Intel processors, 24.75M cache, 754-GB RAM, and 2x10Gbps NIC.
  • Large memory compute nodes:
    • 5 compute nodes, each with 64 physical cores, 2.50 GHz Intel processors, 45M cache (Model E7-8867 v3), 3-TB RAM, and 2x10Gbps NIC.
  • Big data compute nodes:
    • 30 compute nodes, each with 12 physical cores, 3.40 GHz Intel processors, 20M cache (Model E5-2643 v3), 256-GB RAM, and 2x25Gbps NIC.
  • GPU compute nodes:
    • 5 compute nodes, each with 8 GeForce GTX1080 GPUs, 8 physical cores, 2.60 GHz Intel processors, 10M cache (Model E5-2623 v4), 64-GB RAM, and 2x10Gbps NIC. 40 GeForce GTX1080 GPUs total (Pascal architecture), each with 2560 NVIDIA CUDA cores and 8 GB GDDR5X memory.
    • 16 compute nodes, each with 4 Tesla V100-SXM2 GPUs, 40 physical cores, 2.40 GHz Intel processors, 27.5M cache, 250-GB RAM, and 2x10Gbps NIC. 64 V100-SXM2 GPUs total, each with 5120 NVIDIA CUDA cores and 16 GB HBM2 memory.
  • Operating System:
    • Red Hat Enterprise Linux Server release 7.4 (Maipo)
  • Shared Filesystems:
    • 2.2 PB “/pine/scr” Pine File System for scratch storage
    • 465 TB “/nas” NetApp NFS for home directories, departmental space, and applications
    • 3.7 PB “/proj” storage intended for use by PIs, labs, etc.
    • 219 TB “/ms” Mass Storage intended for long-term (permanent) storage of data, files, etc.
  • Resource management:
    • Handled by SLURM, through which all jobs are submitted for processing

Getting an Account

To get an account, visit the Onyen Services page, click the Subscribe to Services button, select Longleaf Cluster, and fill out the web form. You will receive an email notification once your account has been created.

Logging In

Linux:

Linux users can use ssh from within their Terminal application to connect to Longleaf.

If you wish to enable X11 forwarding, use the “-X” ssh option. Be sure to use your UNC ONYEN and password for the login:

ssh -X <onyen>@longleaf.unc.edu
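
If you connect frequently, you can optionally add an entry to your local ~/.ssh/config file (a standard OpenSSH feature, not something specific to Longleaf) so that a plain “ssh longleaf” logs you in with X11 forwarding enabled; the alias name is arbitrary:

Host longleaf
    HostName longleaf.unc.edu
    User <onyen>
    ForwardX11 yes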

Windows:

Windows users should download MobaXterm (Home Edition). Then use the Session icon to create a Longleaf SSH session using longleaf.unc.edu for “Remote host” and your ONYEN for the “username” (Port should be left at 22).

Mac:

Mac users can use ssh from within their Terminal application to connect to Longleaf. Be sure to use your UNC ONYEN and password for the login:

ssh -X <onyen>@longleaf.unc.edu

To enable X11 forwarding, Mac users will need to download, install, and run XQuartz on their local machine in addition to using the “-X” ssh option. Furthermore, in many cases, for X11 forwarding to work properly, Mac users need to use the Terminal application that comes with XQuartz instead of the default Mac Terminal application.

A successful login takes you to “login node” resources that have been set aside for user access. The login node is where you will edit your code, execute basic UNIX commands, and submit your jobs to the SLURM job scheduler.

DO NOT RUN YOUR CODE OR RESEARCH APPLICATIONS DIRECTLY ON THE LOGIN NODE. THESE MUST BE SUBMITTED TO SLURM!

In order to connect to Longleaf from an off-campus location, a connection to the campus network through a VPN client is required.

Main Directory Spaces

NAS home space

Your home directory will be in /nas/longleaf/home/<onyen> and is backed up via snapshots.

Your home directory has a quota which you will want to monitor occasionally: a 50-GB soft limit and a 75-GB hard limit.
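
To get a rough sense of how much of this quota you are using, a standard Linux disk-usage command works (this is generic Linux, not a Longleaf-specific tool):

du -sh ~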

Work/Scratch Space

  • Pine scratch space:

Your /pine directory will be in /pine/scr/<o>/<n>/<onyen> (where <o> and <n> are the first and second letters of your ONYEN) with a quota of 30 TB.
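
For example, assuming the USER environment variable in your shell holds your ONYEN (as it normally does once you are logged in), a command along these lines changes into your scratch directory; this is only a convenience sketch, not an official helper:

cd /pine/scr/${USER:0:1}/${USER:1:1}/$USER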

Pine, our purpose-built filesystem for high-throughput and data-intensive computing as well as information-processing tasks, includes:

  • Connected to Longleaf compute nodes by zero-hop 40-Gbps connections
  • 14 controllers (for throughput and fault tolerance)
  • High-performance parallel filesystem (GPFS)
  • Tiered storage: approx. 210 TB of SSD disk and approx. 2 PB of SAS disk

The Pine-based scratch space, /pine, should be used as your primary scratch working directory.

Be aware of the following in regard to using the /pine file system:

  • A file deletion policy is enforced: any file not used or modified in the last 36 days will be deleted (see the example after this list).
  • Scratch space is a shared, temporary work space. Please note that scratch space is not backed up and is, therefore, not intended for permanent data storage. Use Mass Storage to store permanent data.
  • Note that it is a violation of research computing policy to use artificial means, such as the “touch” command, to maintain unused files in the scratch directory beyond their natural lifetime. This is a shared resource; please be courteous to other users.
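
As an illustration of the 36-day policy above, a command along these lines lists files in your scratch directory that have not been modified in the last 36 days; the path is the placeholder form shown earlier, so substitute your own:

find /pine/scr/<o>/<n>/<onyen> -type f -mtime +36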

Proj Space

“/proj” space is available to PIs (only) upon request. The amount of /proj space initially given to a PI varies according to their needs. Unlike /pine scratch space, there is no file deletion policy for /proj space, but users should take care in managing their /proj space to stay under assigned quotas. Note that by default /proj space is not backed up.

For further information and to make a request for /proj space please email research@unc.edu.

Mass Storage Space

By default all users have access to personal Mass Storage space and can easily use 1 TB of space here. Your mass storage directory will be /ms/home/<o>/<n>/<onyen>.

Mass storage is intended for long-term storage and archiving of files; it is a very slow file system, and you should only use it to copy files in and out.

Jobs running on the cluster cannot access the mass storage filesystem. If you keep data or files in your mass storage space, you will first need to copy them to an appropriate directory (i.e., your home directory, /pine space, or /proj space) before you can use them when running jobs.
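
For example, to copy a data set out of mass storage into your scratch space before submitting a job (mydata is a placeholder name, and the paths use the placeholder forms shown earlier):

cp -r /ms/home/<o>/<n>/<onyen>/mydata /pine/scr/<o>/<n>/<onyen>/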

If you are part of a department, group, or lab that needs shared mass storage space you can send an email to research@unc.edu to make that request.

Applications Environment

The application environment on Longleaf is presented as modules using lmod. Please refer to the Help document on modules for information and examples of using module commands on Longleaf.

Modules are essentially software installations for general use on the cluster. Therefore, you will primarily use module commands to add and remove applications from your Longleaf environment as needed for running jobs. It’s recommended that you keep your module environment as sparse as possible.

Applications used by many groups across campus have already been installed and made available on Longleaf. To see the full list of applications currently available, run

module avail

Users are able to create their own modules.
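
Typical lmod commands look like the following; the module name r is only illustrative, so use whatever names appear in the output of module avail:

module avail        # list all available applications
module add r        # add an application to your environment
module list         # show the currently loaded modules
module rm r         # remove an application from your environment
module save         # save the current set of modules as your default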

Job Submission

Job submission is handled using SLURM. In general, there are two ways to submit a job: you can either construct a job submission script or use a command-line approach.

Method 1: The Submission Script
Create a job submission script using your favorite UNIX editor: emacs, vi, nano, etc.
If you don’t have a favorite editor, you can use nano (for now).

nano example.sh

The following script contains job submission options (the #SBATCH lines) followed by the actual application command.

In this example, you would enter the following into your script (note that each SBATCH switch below has two ‘-’ characters, not one):

#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --time=1:00
#SBATCH --mem=100

hostname

In this example, the command you are running is the “hostname” command, and the job submission options request 1 core, 100 MB of memory, and a one-minute time limit.

To learn more about the many different job submission options, read the man page for the sbatch command:

man sbatch


Save your file and exit nano. Submit your job using the sbatch command:

sbatch example.sh

The equivalent command-line method would be

sbatch --ntasks=1 --time=1:00 --mem=100 --wrap="hostname"

For application-specific examples of each method, see this help doc.

For your job submissions, you will need to specify the SBATCH options appropriate for your particular job. The most important SBATCH options are the time limit (--time), the memory limit (--mem), the number of cpus (--ntasks), and the partition (-p). There are default SBATCH options in place (a fuller example script appears after this list):

  • The default partition is general.
  • The default time limit is one hour.
  • The default memory limit is 4 GB.
  • The default number of cpus is one.
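
As a sketch, a submission script that overrides these defaults might look like the following; the resource values are arbitrary and my_program is a placeholder for your own application command:

#!/bin/bash
#SBATCH --partition=general
#SBATCH --ntasks=1
#SBATCH --time=02:00:00
#SBATCH --mem=8g
#SBATCH --output=myjob.%j.out

./my_program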

Longleaf Partitions

Longleaf is divided into different SLURM partitions, which you can see by running the command

sinfo

If you’ve read the section on System Information, then you are familiar with the different node types available on Longleaf. The different node types are available through different SLURM partitions:

  • the general compute nodes are suitable for most user jobs and are available in the general partition.
  • the large memory compute nodes are for jobs that need a very large amount of memory and are accessible through the bigmem partition.
  • the GeForce GTX1080 GPUs are accessible via the gpu partition; the Tesla V100-SXM2 GPUs are accessible via the volta-gpu partition.
  • the interact partition is for jobs that require an interactive session for running a GUI, debugging code, etc.
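
For example, a common way to request an interactive shell through SLURM is srun with a pseudo-terminal; the time and memory values below are only illustrative:

srun -p interact --time=1:00:00 --mem=4g --pty /bin/bash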

More SLURM Commands

To see the status of all current jobs on Longleaf:

squeue

To see the status of only your submitted jobs on Longleaf:

squeue -u <onyen>

where you’ll need to replace <onyen> with your actual ONYEN.

To cancel a submitted job:

scancel <jobid>

where you’ll need to replace <jobid> with the job’s ID number which you can get from the squeue command.

To check the details of a completed job:

sacct -j <jobid> --format=User,JobID,MaxRSS,Start,End,Elapsed

where you’ll need to replace <jobid> with the job’s ID number. The items listed in --format are specified by the user. In this example, we get the user name, job ID, maximum memory used, start time, end time, and elapsed time associated with the job.

For more information, see the man page for the sacct command:

man sacct

Additional Help

Please read the Longleaf FAQ for answers to specific questions.

See the Using Longleaf presentation.

If you have any questions, please feel free to call 962-HELP, email research@unc.edu, or submit an Online Web Ticket.