Longleaf Frequently Asked Questions (FAQs)

  1. How do I get an account on Longleaf (or Dogwood)?
  2. Why are you using SLURM rather than LSF (like we are accustomed to)?
  3. What are the details of the new filesystem location(s)?
  4. Why aren’t `/netscr` or `/lustre` present on Longleaf?
  5. What is the queue structure on Longleaf?
  6. How do I transfer data between Research Computing clusters?
  7. How do I transfer data onto a Research Computing cluster?

 

1. HOW DO I GET AN ACCOUNT ON LONGLEAF (OR DOGWOOD)?

Visit the Onyen Services page at https://improv.itsapps.unc.edu/#ServiceSubscriptionPlace:, click the Subscribe to Services button, and select Longleaf.

For more information on Longleaf and Pine, see: https://help.unc.edu/help/longleaf-and-pine/.

  

2. WHY ARE YOU USING SLURM RATHER THAN LSF LIKE WE ARE USED TO?

There are three reasons. 

  1. SLURM, and the configuration choices we have made, are optimized for the high-throughput job streams typical of the data-intensive workloads for which Longleaf is designed.
  2. In recent cluster RFP responses from vendors, both to us and to other institutions, SLURM has been the scheduling software that is most frequently recommended.
  3. SLURM is widely used throughout the high-performance and high-throughput computing community: we wanted to be closer to what is more common.

Our early feedback on SLURM was that most job submission scripts took somewhere between a handful of minutes and half a day to convert from LSF to SLURM.  We also had feedback suggesting that converting to SLURM’s submission method improved the coherence of the job-stream’s workflow.

A crosswalk/cheatsheet showing common LSF commands and their corresponding SLURM commands is available at scheduler_commands_cheatsheet.  For a walkthrough of a SLURM job submission, see the simple examples at https://help.unc.edu/help/getting-started-example-slurm-on-longleaf/.
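
For orientation, a minimal SLURM batch script might look like the following sketch (the resource values, the job name, and the `./my_program input.dat` line are placeholders for your own job, not recommendations):

#!/bin/bash
#SBATCH --job-name=myjob         # name shown by squeue
#SBATCH --partition=general      # queue/partition (see question 5)
#SBATCH --ntasks=1               # a single task
#SBATCH --cpus-per-task=1        # one core
#SBATCH --mem=4g                 # memory request
#SBATCH --time=02:00:00          # wall-clock limit (hh:mm:ss)
#SBATCH --output=myjob.%j.out    # %j expands to the job ID

./my_program input.dat

Save it as, say, `myjob.sl`, submit it with `sbatch myjob.sl`, and check on it with `squeue -u <onyen>`.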

  

3. WHAT ARE THE DETAILS OF THE NEW FILESYSTEM LOCATION(S)?

The following filesystems currently present on Killdevil are also available on Longleaf (recall that Killdevil is being decommissioned)

  • /proj/…
  • /ms/home/…
  • /ms/depts/…

with the further directory paths with which you are familiar.  The new filesystems are

  • /pine/scr/…
  • /pine/appscr/…
  • /nas/longleaf/home/…

with the `/pine` filesystem being of particular note, since it is purpose-built for your work.  It is a scratch space, with the same 21-day file-deletion policy as our other scratch spaces.

Each user’s quota on `/pine` is 25 TB.  The directory structure follows the same convention as `/lustre` on legacy Killdevil.  So, for an Onyen of “beethers”, the path would be

  • /pine/scr/b/e/beethers

For an Onyen of “shosters”, the path would be

  • /pine/scr/s/h/shosters

Shared working directories for Gaussian, SAS, and Stata live in `/pine/scr/appscr`, each with a 5 TB quota.
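
If you work in bash, your own scratch path can be derived from your Onyen; a minimal sketch, assuming your Onyen is the value of `$USER`:

MYSCR=/pine/scr/${USER:0:1}/${USER:1:1}/${USER}   # e.g. /pine/scr/b/e/beethers
cd "$MYSCR"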

  

4. WHY AREN’T NET-SCRATCH OR LUSTRE PRESENT ON LONGLEAF?

The `/lustre` filesystem is available only via the InfiniBand fabric on Killdevil.  Since Longleaf nodes do not connect to that fabric, `/lustre` is not present on Longleaf.

With respect to net-scratch (`/netscr`), it is not present on Longleaf for performance reasons.  First, computing against `/netscr` from the Longleaf nodes would add a workload that `/netscr` cannot sustain, severely degrading performance for everyone.  Second, the `/pine` filesystem is purpose-built for I/O and balanced for the Longleaf cluster: although it may take some effort to move your files to a filesystem present on Longleaf, the results will be vastly better than computing against one that is not.  Third, the quotas on `/pine` are higher, so you have more space to work with.

 

5. WHAT IS THE QUEUE STRUCTURE ON LONGLEAF?

There are four queues:

  • general : to schedule jobs to the General Purpose nodes
  • bigdata : to schedule jobs to the Big Data nodes
  • bigmem : to schedule jobs to the Big Memory nodes
  • gpu : to schedule jobs to the GPU nodes

The default access for a user or group is to the “general” queue.  If you have jobs that require any of the other queues, please contact us at research@unc.edu or open a help ticket at https://help.unc.edu.
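
In SLURM, these queues correspond to partitions, which you select with the `-p` (`--partition`) option.  As a sketch, using the default queue and a placeholder script name:

sbatch -p general myjob.sl

or, equivalently, inside the batch script itself:

#SBATCH --partition=general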

  

6. HOW DO I TRANSFER DATA BETWEEN RESEARCH COMPUTING CLUSTERS?

To transfer data between the clusters, it is often easiest to simply copy files between mounted filesystems.  On Killdevil, the data mover nodes `rc-dm1.its.unc.edu`, `rc-dm2.its.unc.edu`, `rc-dm3.its.unc.edu`, and `rc-dm4.its.unc.edu` mount the Isilon project space (`/proj`), the Killdevil network-attached scratch space (`/netscr`), and the home directories and departmental space.

Once logged in to `rc-dm.its.unc.edu`, a command such as

cp /proj/MyLabSpace/Test.txt /netscr/onyen/

is sufficient to copy a file from the Isilon project space to the `/netscr` scratch space.

For small files it is, of course, reasonable to do this on the cluster login nodes.  For large data transfers, however, you will notice a significant performance improvement by using the data mover nodes.
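
For example, to move a large directory from project space to scratch you might log in to a data mover node and use `rsync` (the Onyen and directory names below are placeholders):

ssh onyen@rc-dm.its.unc.edu
rsync -a /proj/MyLabSpace/big_dataset/ /netscr/onyen/big_dataset/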

We are preparing data mover nodes with direct access to all relevant filesystems to facilitate inter-cluster transfers.  We will update this FAQ and provide an additional help document focused specifically on that workflow.

Transferring a large amount of data off Killdevil’s Lustre scratch filesystem is possible but generally inadvisable.  If you have a large amount of data in the scratch filesystem that needs to be saved permanently, contact ITS Research Computing about setting up Isilon project space.

If you absolutely need to transfer large quantities of data from Killdevil’s Lustre to Longleaf, you will have to push it from Killdevil’s login nodes to Longleaf’s login nodes, since `/lustre` is mounted only on Killdevil.  Because these are shared resources, please be respectful of other users and schedule the transfer for a time when the login nodes are not busy.  To do this, log on to Killdevil, then use either `rsync` or `scp`:

rsync /lustre/scr/o/n/onyen/file.txt onyen@longleaf.unc.edu:/pine/scr/o/n/onyen

or

scp /lustre/scr/o/n/onyen/file.txt onyen@longleaf.unc.edu:/pine/scr/o/n/onyen

This transfer runs over the general network infrastructure rather than a locally mounted filesystem.

 

7. HOW DO I TRANSFER DATA ONTO A RESEARCH COMPUTING CLUSTER?

For transfers to one of the Research Computing clusters from your desktop or home computer, or from another computer external to Research Computing, there are several methods.

Globus Online: https://help.unc.edu/help/globus-connect-file-transfer/.  To get started with Globus Online, see the Getting Started page at https://help.unc.edu/help/getting-started-with-globus-connect/.

In addition to Globus Online there are a number of SFTP (Secure File Transfer Protocol) tools available on both Mac and Windows platforms.

For the SFTP tools, although it is possible to connect directly to the cluster login nodes, those nodes are a shared resource, so it is preferable to use the specialized data mover nodes.

There are four data mover nodes: `rc-dm1.its.unc.edu`, `rc-dm2.its.unc.edu`, `rc-dm3.its.unc.edu`, and `rc-dm4.its.unc.edu`.  Using the host address `rc-dm.its.unc.edu` will connect you to the least busy of the four, which generally gives the best performance.
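
For example, from a Mac or Linux terminal you can use the command-line `sftp` client (replace `onyen`, the filename, and the destination directory with your own):

sftp onyen@rc-dm.its.unc.edu
sftp> put mydata.tar.gz /proj/MyLabSpace/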