- How do I get an account on Longleaf (or Killdevil)?
- Why are you using SLURM rather than LSF (like we are accustomed to)?
- What are the details of the new filesystem location(s)?
- Why aren’t `/netscr` and `/lustre` present on Longleaf?
- What is the queue structure on Longleaf?
- How do I transfer data between Research Computing clusters?
- How do I transfer data onto a Research Computing cluster?
- How do I get help?
- What if the decommission date (April 24th) is enormously disruptive for me?
We will update this list continuously.
1. HOW DO I GET AN ACCOUNT ON LONGLEAF (OR KILLDEVIL)?
Visit the Onyen Services page at http://onyen.unc.edu, click the Subscribe to Services button, and select Longleaf. You should then follow the link to http://itsapps.unc.edu/improv/#ServiceSubscriptionPlace.
For more information on Longleaf and Pine, see: https://help.unc.edu/help/longleaf-and-pine/.
2. WHY ARE YOU USING SLURM RATHER THAN LSF (LIKE WE ARE ACCUSTOMED TO)?
There are three reasons:
- SLURM, and the configuration choices we have made, are optimized for the high-throughput job streams typical of the data-intensive workloads for which Longleaf is designed.
- In recent vendor responses to cluster RFPs, both ours and those of other institutions, SLURM has been the most frequently recommended scheduling software.
- SLURM is widely used throughout the high-performance and high-throughput computing community, and we wanted to align with what is most common.
Our early feedback on SLURM was that most job submission scripts took anywhere from a few minutes to half a day to convert from LSF to SLURM. Some users also reported that converting to SLURM's submission method made their job-stream workflows clearer.
A crosswalk (cheatsheet) of common LSF commands and their corresponding SLURM commands is available at scheduler_commands_cheatsheet. For a walkthrough of a SLURM job submission, also see the simple examples at https://help.unc.edu/help/getting-started-example-slurm-on-longleaf/.
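As a quick orientation before opening the cheatsheet, the small Bash function below prints the SLURM command that usually takes the place of a familiar LSF one. It is a sketch: `lsf_to_slurm` is a hypothetical name, not a tool installed on Longleaf, and it covers only everyday commands. The options each command accepts differ between the two schedulers, so check the cheatsheet for flags.

```shell
#!/usr/bin/env bash
# lsf_to_slurm is a hypothetical helper, not a command installed on Longleaf.
# It maps common LSF commands to their usual SLURM counterparts; option
# syntax differs between the two schedulers, so check the cheatsheet for flags.
lsf_to_slurm() {
  case "$1" in
    bsub)    echo "sbatch" ;;   # submit a batch job script
    bjobs)   echo "squeue" ;;   # list your pending and running jobs
    bkill)   echo "scancel" ;;  # cancel a job by job ID
    bqueues) echo "sinfo" ;;    # show queues (called partitions in SLURM)
    bhist)   echo "sacct" ;;    # show accounting data for past jobs
    *)       echo "no mapping for: $1" >&2; return 1 ;;
  esac
}
```

For example, `lsf_to_slurm bjobs` prints `squeue`. One structural difference to keep in mind: LSF job scripts are usually submitted on standard input (`bsub < job.sh`), while SLURM takes the script as an argument (`sbatch job.sh`), and the `#BSUB` directives inside the script become `#SBATCH` directives.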
3. WHAT ARE THE DETAILS OF THE NEW FILESYSTEM LOCATION(S)?
The following filesystems currently present on Killdevil are also available on Longleaf
with the same directory paths beneath them that you are already familiar with. The new filesystems are
with the `/pine` filesystem being of particular note, since it is purpose-built for you. It is a scratch space with a 21-day policy, as on our other scratch spaces.
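To see which of your scratch files have not been modified recently, something like the Bash function below can help. It is a sketch under an explicit assumption: this FAQ does not say which timestamp the 21-day policy checks, so modification time is used purely for illustration, and `scratch_older_than` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list regular files under DIR whose modification time
# is more than DAYS days old (default 21). Assumes the scratch policy keys on
# mtime, which this FAQ does not confirm.
scratch_older_than() {
  local dir="${1:?usage: scratch_older_than DIR [DAYS]}"
  local days="${2:-21}"
  # -mtime +N matches files whose age, counted in whole 24-hour periods,
  # is greater than N.
  find "$dir" -type f -mtime +"$days"
}
```

For example, `scratch_older_than /pine/scr/o/n/onyen` (the path layout used in the transfer example on this page, with your own Onyen substituted) lists candidate files; pass a second argument to change the threshold.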
Each user’s quota on `/pine` is 25 TB. The directory structure matches that of `/lustre` on Killdevil, so for an Onyen of “beethers”, the path would be
For an Onyen of “shosters”, the path would be
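If you script against these directories, a helper like the one below can build the path from an Onyen. It is a sketch that assumes the first-letter/second-letter layout shown in the transfer example on this page (`/pine/scr/o/n/onyen`); `pine_scratch_dir` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a per-user /pine scratch path from an Onyen,
# assuming the /pine/scr/<first letter>/<second letter>/<onyen> layout.
pine_scratch_dir() {
  local onyen="${1:?usage: pine_scratch_dir ONYEN}"
  printf '/pine/scr/%s/%s/%s\n' "${onyen:0:1}" "${onyen:1:1}" "$onyen"
}
```

For example, `cd "$(pine_scratch_dir "$USER")"` changes to your scratch directory, assuming your cluster login name is your Onyen.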
Shared working directories for Gaussian, SAS, and Stata live in `/pine/scr/appscr`, each with a 5 TB quota.
4. WHY AREN’T `/netscr` AND `/lustre` PRESENT ON LONGLEAF?
The `/lustre` filesystem is available only via the InfiniBand fabric on Killdevil. Since Longleaf nodes have no access to that fabric, `/lustre` is not present on Longleaf.
Net-scratch (`/netscr`) is not present on Longleaf for three reasons. First, running Longleaf jobs against `/netscr` would add a heavy workload that `/netscr` cannot sustain, severely degrading performance for everyone. Second, the `/pine` filesystem is purpose-built for I/O and designed and balanced for the Longleaf cluster: although moving your files to a filesystem present on Longleaf may take some effort, your jobs will perform far better there. Third, the quotas on `/pine` are higher, so you have more space to work with.
5. WHAT IS THE QUEUE STRUCTURE ON LONGLEAF?
There are four queues:
- general : to schedule jobs to the General Purpose nodes
- bigdata : to schedule jobs to the Big Data nodes
- bigmem : to schedule jobs to the Big Memory nodes
- gpu : to schedule jobs to the GPU nodes
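A minimal batch script that targets one of these queues might look like the following. It is a sketch, assuming the queue names above are used as SLURM partition names with `--partition`; the job name, time limit, and output file name are hypothetical placeholders.

```shell
#!/bin/bash
# Job name shown by squeue (hypothetical).
#SBATCH --job-name=hello
# Queue to run in: general, bigdata, bigmem, or gpu.
#SBATCH --partition=general
# One task with a hypothetical one-hour wall-clock limit.
#SBATCH --ntasks=1
#SBATCH --time=01:00:00
# Write output to hello.<jobid>.out (%j expands to the job ID).
#SBATCH --output=hello.%j.out

# SLURM sets SLURM_JOB_PARTITION inside a batch job; outside the scheduler
# (e.g. when testing with plain bash) it is unset, so fall back to a marker.
partition="${SLURM_JOB_PARTITION:-not-under-slurm}"
echo "Running in partition: ${partition}"
```

Submit it with `sbatch hello.sh`. Options given on the `sbatch` command line override the `#SBATCH` lines in the script, so `sbatch --partition=bigmem hello.sh` would send the same script to the Big Memory nodes.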
6. HOW DO I TRANSFER DATA BETWEEN RESEARCH COMPUTING CLUSTERS?
To transfer data within a cluster, it is often easiest to copy the files directly between mounted filesystems. On Killdevil, the data mover nodes (`rc-dm1.its.unc.edu`, `rc-dm2.its.unc.edu`, `rc-dm3.its.unc.edu`, and `rc-dm4.its.unc.edu`) mount the Isilon project space (`/proj`), the Killdevil network-attached scratch space (`/netscr`), and the home directories and departmental space.
For example, after logging in to `rc-dm.its.unc.edu`, the following command is sufficient to copy a file from the Isilon project space to the `/netscr` scratch space:
```shell
cp /proj/MyLabSpace/Test.txt /netscr/onyen/
```
For small files, of course, it is reasonable to do this on the cluster login nodes. However, for large data transfers you will notice a significant performance improvement using the data mover nodes.
We are preparing data mover nodes with direct access to all relevant filesystems to facilitate inter-cluster transfers. We will update this FAQ and provide an additional help document dedicated to that topic.
Transferring a large amount of data out of Killdevil’s Lustre scratch filesystem is possible, but generally inadvisable. If you have a large amount of data in the scratch filesystem that needs to be kept permanently, contact ITS Research Computing about setting up Isilon project space.
If you absolutely need to transfer large quantities of data from Killdevil’s Lustre to Longleaf, you will have to transfer it between Killdevil’s and Longleaf’s login nodes. Since these are shared resources, please be respectful of other users and schedule the transfer for a time when the login nodes are not busy. To do this, log on to Killdevil (where `/lustre` is mounted), then use either `rsync` or `scp`:
```shell
# with rsync
rsync /lustre/scr/o/n/onyen/file.txt email@example.com:/pine/scr/o/n/onyen
# or with scp
scp /lustre/scr/o/n/onyen/file.txt firstname.lastname@example.org:/pine/scr/o/n/onyen
```
This transfer goes over the network between the two clusters rather than through a shared filesystem.
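To move a whole scratch directory rather than a single file, the same login-node approach extends to a recursive `rsync`. The Bash function below is a sketch: `lustre_to_pine` is a hypothetical name, the `LONGLEAF_HOST` argument stands in for the Longleaf login address, it assumes the `/x/y/onyen` layout used in the commands above, and by default it only prints the command so you can check it before running it with `RUN=1`.

```shell
#!/usr/bin/env bash
# Hypothetical helper, meant to be run on a Killdevil login node where /lustre
# is mounted. Builds an rsync command that copies your whole Lustre scratch
# directory to the matching /pine directory on Longleaf.
lustre_to_pine() {
  local onyen="${1:?usage: lustre_to_pine ONYEN LONGLEAF_HOST}"
  local host="${2:?usage: lustre_to_pine ONYEN LONGLEAF_HOST}"
  local sub="${onyen:0:1}/${onyen:1:1}/${onyen}"
  # -a recurses and preserves permissions and timestamps; --partial keeps
  # partly transferred files so a rerun can reuse them. The trailing slash
  # on the source copies the directory's contents, not the directory itself.
  local cmd=(rsync -av --partial "/lustre/scr/${sub}/" "${onyen}@${host}:/pine/scr/${sub}/")
  if [[ "${RUN:-0}" == 1 ]]; then
    "${cmd[@]}"
  else
    printf '%s\n' "${cmd[*]}"   # dry run: show the command only
  fi
}
```

Run `lustre_to_pine onyen LONGLEAF_HOST` to preview the command, then `RUN=1 lustre_to_pine onyen LONGLEAF_HOST` to perform the transfer.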
7. HOW DO I TRANSFER DATA ONTO A RESEARCH COMPUTING CLUSTER?
There are several methods for transferring data to one of the Research Computing clusters from your desktop or home computer, or from another computer outside Research Computing.
Globus Online: http://help.unc.edu/help/globus-connect-file-transfer/. To get started with Globus Online, see the Getting Started page at http://help.unc.edu/help/getting-started-with-globus-connect/.
In addition to Globus Online there are a number of SFTP (Secure File Transfer Protocol) tools available on both Mac and Windows platforms.
- SSH/SFTP Secure Shell is available for Windows platforms at UNC Software Acquisition Shareware: http://software.sites.unc.edu/shareware/#s.
- CyberDuck (https://cyberduck.io/) is available for both Mac and Windows platforms.
- CoreFTP (http://coreftp.com/) is another possibility for Windows platforms.
- FileZilla (https://filezilla-project.org/) is also an option for Mac, Windows, and Linux platforms.
With the SFTP tools, it is possible to connect directly to the cluster login nodes, but because the login nodes are a shared resource, we prefer that you use the dedicated data mover nodes.
There are four data mover nodes: `rc-dm1.its.unc.edu`, `rc-dm2.its.unc.edu`, `rc-dm3.its.unc.edu`, and `rc-dm4.its.unc.edu`. Using the host address `rc-dm.its.unc.edu`, however, connects you to the least busy of the four, which will generally give the best performance.
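From a Mac or Linux desktop, the same `rc-dm.its.unc.edu` address also works with command-line `rsync` over SSH. The function below is a hypothetical sketch: `rc_push` is not an installed tool, the Onyen and the `/proj/MyLabSpace/` destination are placeholders (the latter borrowed from the copy example earlier on this page), and it prints the command unless `RUN=1` is set.

```shell
#!/usr/bin/env bash
# Hypothetical helper: upload a local file or directory to Research Computing
# through the load-balanced data mover address rc-dm.its.unc.edu.
rc_push() {
  local src="${1:?usage: rc_push SRC ONYEN DEST}"
  local onyen="${2:?usage: rc_push SRC ONYEN DEST}"
  local dest="${3:?usage: rc_push SRC ONYEN DEST}"
  # -a recurses and preserves timestamps; --partial keeps partly
  # transferred files so an interrupted upload can be rerun efficiently.
  local cmd=(rsync -av --partial "$src" "${onyen}@rc-dm.its.unc.edu:${dest}")
  if [[ "${RUN:-0}" == 1 ]]; then
    "${cmd[@]}"
  else
    printf '%s\n' "${cmd[*]}"   # dry run: show the command only
  fi
}
```

For example, `rc_push ./results/ onyen /proj/MyLabSpace/` shows what would run, and prefixing the call with `RUN=1` performs the upload.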
8. HOW DO I GET HELP?
We are conducting Getting Started on Longleaf Workshops every week through April:
- March 8, Wednesday 10:00am : Room 328, Health Sciences Library
- March 14, Tuesday 11:00am : Room 328, Health Sciences Library
- March 24, Friday 2:00pm : Room 328, Health Sciences Library
- March 28, Tuesday 10:00am : Room 328, Health Sciences Library
- April 3, Monday 3:00pm : Room 328, Health Sciences Library
- April 13, Thursday 4:00pm : Room 328, Health Sciences Library
- April 17, Monday 11:00am : Room 328, Health Sciences Library
- April 27, Thursday 2:00pm : Room 328, Health Sciences Library
The workshops are not in lecture format; one might call them “flipped”. Give Longleaf a try, and if you have questions or run into challenges, bring them to the workshop and we will work through them with you then and there.
You are always welcome to visit with us during the Research Computing Office Hours.
9. WHAT IF THE APRIL 24TH DECOMMISSION DATE IS ENORMOUSLY DISRUPTIVE FOR ME?
Please let us know. Our aim is to ensure that critical and time-sensitive projects and tasks have the opportunity to complete, while new projects and tasks are not started on Kure. We will therefore do our best to make this change minimally disruptive for you.
If Kure’s decommission date of April 24th poses serious risk or hardship for you, please email Michael Barker, Ph.D., Assistant Vice Chancellor for Research Computing and Learning Technologies, at email@example.com. You may also contact us at firstname.lastname@example.org or via a help ticket at http://help.unc.edu.