Mass Storage

Overview

FAQs

Appropriate Use

Things to Avoid

Creating and Accessing Space

How Do I Put Files There?

Availability of Accidentally Deleted Files

Long Term Storage of Files

How does mass storage work?

Additional help

Overview

The Mass Storage system (also known as StorNext or /ms) is intended for archiving research data files and storing very large files, that is, files too large to fit within an individual researcher’s or a department’s storage space. Storage resources are limited; if you need to store more than 10 Terabytes of data, contact the Research Computing group (research@unc.edu) to arrange to purchase tapes for your data. Questions about the current price of tapes, or about any other part of this process, should also be emailed to Research Computing. Mass storage is currently not an appropriate archival or general storage location for sensitive data.

When storing files in mass storage, you should store a tar or zip archive instead of directories with many files. Please read the “Appropriate Use” and “Things to Avoid” sections below for what can and can’t be stored in the mass storage system.

Specifically, do not use mass storage to back up your desktop hard disk. Mass storage is not to be used as a backup location for local disk drives, operating systems, or software, nor is it to be used for copies of sensitive data files. In general, files that change often or directories with more than a few thousand files in them will cause performance problems and consume tape resources. The PC backup software provided by UNC might be an alternative solution for you.

FAQs

Please read the Mass Storage Common FAQs, which answer the following questions and are useful for users new to this service.

  • What is Mass Storage?
  • How do I get access to Mass Storage service?
  • Is there a limit for Mass Storage?
  • Why can’t I open my files put on Mass Storage?

Appropriate Use

The Research Computing Mass Storage system (also known as StorNext or /ms) is intended for archiving files and storing very large files, tar or zip archives of many smaller files, and files that are too large to fit in the AFS quota space. It is not intended to be used as a backup location for disk drives, operating systems, or software. Mass storage should NOT be considered as a solution for archiving or storing sensitive data. In general, files that are changed often or directories with too many files in them will cause performance problems and consume too many tapes. The PC backup software provided by UNC might be an alternative solution for you rather than copying your PC files to mass storage.

We vigorously enforce the appropriate use of the mass storage system. This means that should we find inappropriate use we may deny access to the directory in question, delete the files or directories in question, and, if necessary, disable the userid (Onyen) in question. We will always try to contact the owner of the data before taking any actions. Again, please remember that mass storage is not to be used for storing or archiving sensitive data.

If you routinely store large numbers of small files (more than several thousand files at a time) in mass storage, you should bundle those files into a single tar or zip archive outside of mass storage and then move that archive file to mass storage. You are not required to compress the tar or zip file, since the mass storage tape drive hardware will compress your data. Reducing the number of individual small files helps the overall performance of the StorNext Mass Storage system. See the “Things to Avoid” section below for a more detailed list.
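
As a minimal sketch of the bundling workflow described above, the shell function below builds an uncompressed tar archive outside mass storage and then moves the single archive file into it. The function name archive_to_ms and the example paths are illustrative assumptions, not a tool provided by Research Computing.

```shell
#!/bin/sh
# Sketch only: archive_to_ms is a hypothetical helper name.
# Usage: archive_to_ms <source dir> <work dir outside mass storage> <mass storage dir>
archive_to_ms() {
    src_dir=$1    # directory of small files, located outside mass storage
    work_dir=$2   # where the archive is built, e.g. scratch space
    ms_dir=$3     # destination directory in mass storage, e.g. ~/ms

    name=$(basename "$src_dir")
    archive="$work_dir/$name.tar"

    # Build an uncompressed archive; the tape hardware compresses on write.
    tar -cf "$archive" -C "$(dirname "$src_dir")" "$name" || return 1

    # Move the one finished file into mass storage.
    mv "$archive" "$ms_dir/"
}

# Example call (placeholder paths):
# archive_to_ms /netscr/myonyen/output /netscr/myonyen "$HOME/ms"
```

The archive is created in the work directory first so that tar never runs against files that live in mass storage.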

Things to Avoid

There are things to avoid in order to make good use of the Mass Storage system.

1. Do not store or archive sensitive data to the Mass Storage system.

2. Do not use Mass Storage to back up your desktop hard disk.

Mass storage is not intended to be used as a backup location for local disk drives, operating systems, or software. In general, files that change often or directories with more than a thousand files will cause performance problems and consume tape resources. The PC backup software provided by UNC might be an alternative solution to copying your PC files to mass storage.

3. Do not create more than 1,000 files in a single directory. Mass Storage scans its directory space several times every hour. The time to scan a directory grows exponentially with the number of files and directories within it. Since scanning is single threaded, one large directory can slow down the entire system. Similarly, creating one large output file is preferable to creating several smaller ones.

4. Do store a tar or zip archive in Mass Storage space, but please create the archive outside of Mass Storage. It greatly helps our administration of Mass Storage when people store tar or zip archives there instead of directories with many small files.

5. Do not run the tar command on directories or files in Mass Storage.  Again, create the tar archive outside Mass Storage in your home directory or scratch space.

6. Do not compress files which already exist in Mass Storage space.  If a file is not in Mass Storage, you can compress it first (although this is not necessary) outside of Mass Storage space and then move the compressed file into Mass Storage.  Note that you gain nothing by putting compressed files in Mass Storage space, because all files in Mass Storage are compressed when written to tape.

7. Do not frequently modify any file over consecutive days. Mass Storage only copies to tape those files which have not been modified for a few hours.

8. As a corollary to the last item, do not put in Mass Storage files that will be frequently modified.

9. Do not write directly into Mass Storage space, such as your ms directories.  Instead, export any files from your application to your home space and then perform a mv to your ms directories.

10. Do not execute long-running programs when your current working directory is a Mass Storage directory.

11. Do not execute a program if the executable file is in Mass Storage space.

12. Do not execute long-running programs that open files in Mass Storage space.

13. Do not use Mass Storage as a scratch space. If you are going to create a number of files which you do not want to keep permanently, use scratch space.  Some programs, such as Gaussian, create huge temporary files which do not always get deleted. Such files are a large waste of tape resources.

14. Do not write to Mass Storage directly when creating a dataset in SAS. Instead, write your dataset to scratch space, then copy it to mass storage once you have finished modifying it.

15. It is helpful not to copy many symbolic links and empty files (e.g., batch job *.err files) into mass storage, as these do not get archived to tape.
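
Several of the items above can be checked before anything is copied. The sketch below assumes the files are still in a staging directory outside mass storage. It reports directories holding more than a given number of entries (item 3) and lists symbolic links and empty files (item 15). The function names are hypothetical, and the -mindepth, -maxdepth, and -empty options assume GNU or BSD find.

```shell
#!/bin/sh
# Sketch only: both function names are hypothetical helpers.

# Print "<count><TAB><dir>" for each directory under $1 that directly
# contains more than $2 entries (default limit: 1000).
list_crowded_dirs() {
    root=$1
    limit=${2:-1000}
    find "$root" -type d | while IFS= read -r dir; do
        # Count direct children only; tr strips padding some wc versions add.
        count=$(find "$dir" -mindepth 1 -maxdepth 1 | wc -l | tr -d ' ')
        if [ "$count" -gt "$limit" ]; then
            printf '%s\t%s\n' "$count" "$dir"
        fi
    done
}

# List symbolic links and empty regular files under $1.
list_links_and_empty() {
    find "$1" \( -type l -o \( -type f -empty \) \) -print
}

# Example calls (placeholder path):
# list_crowded_dirs /netscr/myonyen/output
# list_links_and_empty /netscr/myonyen/output
```

Run these against the staging copy, not against directories that are already in mass storage.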

Creating and Accessing Space

Only faculty, staff, and graduate students may subscribe to the Mass Storage tape archival service. Be sure to read and understand the intended use of the Mass Storage system, the Appropriate Use Policy, and Things to Avoid before subscribing to this service. You can subscribe to Mass Storage services from the Onyen Services web page. Select Subscribe to Services and then click on one of the Research Computing services (e.g., Longleaf or KillDevil). After a few hours, a link to a mass storage directory, ms/, will be created in your home directory. You can then access that directory from many of the Research Computing servers.

Note that storage resources are finite; we cannot store an unlimited amount of data. If you will need to store more than 1 Terabyte of data, you can contact the Research Computing group (research@unc.edu) to arrange to purchase tapes. In this case, please contact Research Computing staff at least two months in advance to discuss tape purchases, performance issues, and anticipated usage patterns.

In some circumstances, it may be useful for departments to have mass storage space where collaborators and project teams can store data shared by all members of the group. To request departmental mass storage space, open a Help Request ticket. Include the following in the Help Request ticket:

  • Directory name
  • Primary and secondary contact names
  • Linux group name
  • Onyen accounts to be added to the Linux group
  • Amount of data you need to store

If you are doing any large moves or copies of data to or from mass storage while on the Research Clusters, please use the LSF command:

bsub -q ms cp /netscr/myonyen/output/* /ms/home/m/y/myonyen/saved_output

This bsub command, issued on KillDevil with the “-q ms” option, will submit your copy or move job to a host with very good connectivity to the mass storage system. We expect these hosts to handle multiple data moves well, removing this burden from the KillDevil login nodes.
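
The destination path in the command above, like the /pine/scr/o/n/<onyen> scratch path mentioned elsewhere on this page, appears to nest the Onyen under its first two letters. As a sketch under that assumption (inferred only from these example paths, not confirmed as a documented rule), a small helper can build the path. The name ms_home_path is hypothetical.

```shell
#!/bin/sh
# Sketch only: ms_home_path is a hypothetical helper. The layout
# /ms/home/<1st letter>/<2nd letter>/<onyen> is an assumption inferred
# from the example path /ms/home/m/y/myonyen.
ms_home_path() {
    onyen=$1
    first=$(printf '%s' "$onyen" | cut -c1)
    second=$(printf '%s' "$onyen" | cut -c2)
    printf '/ms/home/%s/%s/%s\n' "$first" "$second" "$onyen"
}

# Example use with the bsub command shown above (not executed here):
# bsub -q ms cp /netscr/myonyen/output/* "$(ms_home_path myonyen)/saved_output"
```

If your ms directory is reached through the link in your home directory instead, that link is the simpler destination to use.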

How Do I Put Files There?

Files can be moved in and out of mass storage by using simple Linux commands such as “cp” and “mv”. Because the Mass Storage system is optimized for archiving data, your programs should not read from or write to it directly. Instead, copy your data from ~/ms to scratch space such as /netscr/<onyen> or /pine/scr/o/n/<onyen>, and run your programs against that copy.
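
The staging step described above can be sketched as a small shell function that copies a file out of mass storage into scratch space and then runs a command on the scratch copy, so the program never opens a file in mass storage. The function name run_staged and the example paths are illustrative assumptions.

```shell
#!/bin/sh
# Sketch only: run_staged is a hypothetical helper name.
# Usage: run_staged <file in mass storage> <scratch dir> <command> [args...]
run_staged() {
    ms_file=$1
    scratch_dir=$2
    shift 2

    staged="$scratch_dir/$(basename "$ms_file")"

    # Copy the archived file to scratch space first.
    cp "$ms_file" "$staged" || return 1

    # Run the given command with the scratch copy as its last argument.
    "$@" "$staged"
}

# Example call (placeholder paths):
# run_staged "$HOME/ms/results.tar" /netscr/myonyen tar -tf
```

Any output the program produces should likewise be written to scratch or home space and moved into mass storage only once it is final.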

Mass storage should not be used to store or archive sensitive data.

Outside of UNC’s campus, the easiest way to store files in Mass Storage is to use “sftp”, a secure file transfer program. Simply “sftp” to a Research Computing host and log in with your Onyen and Onyen password. If you are using a command-line implementation of “sftp”, you will then change directories to your ms subdirectory:

cd ms

Then use the “put” command to copy the files you want to store from your computer to that directory in mass storage.

If you are using a web file transfer tool such as Globus Connect or a GUI “sftp” program such as SSH Secure Shell, connect to a Research Computing host and log in with your Onyen and Onyen password. In the Remote Site window, navigate to your “ms” subdirectory. You can then drag files from your local system to the remote site window.

Remember that mass storage is intended to serve as an archive for important, non-sensitive data files and work that you need to keep long-term. It may not be used to back up your desktop systems. The PC backup software provided by UNC is an alternative solution for that purpose.

Availability of Accidentally Deleted Files

On an ordinary Linux file system, a file’s inodes and data are removed upon deletion. In Mass Storage, the inodes are removed; however, the data remains on tape until a “recycle” process is run. This means that any file which has been accidentally deleted, but which has had its inodes backed up and its data written to tape, is potentially retrievable. Conversely, if you create a file and then delete it before the backups run, the file is lost. If the machine hosting Mass Storage crashes before the up-to-date data is copied to tape, you may lose that day’s work.

To obtain assistance with recovery of a deleted Mass Storage file, open a Help Request ticket.

Recycling is done as needed. The need is determined by high-water marks in the Mass Storage software, based on the available tape space and the number of free slots in the IBM tape library. A recycle can run automatically, without notification.

Long Term Storage of Files

We have chosen to implement StorNext as our mass storage software. We believe that any non-deleted file that is on Mass Storage tape will be retrievable for as long as the tape remains readable. Due to the unknown longevity of tape media, we cannot guarantee how long tapes will remain readable. However, we have taken steps to ensure, as much as possible, that files that have not been deleted will be available for as long as we own the tapes. The tapes we currently own have a shelf life of 10 years, but we attempt to migrate onto new tape technology for better compression of data per tape. We keep two copies of every file, we keep the tapes in a climate-controlled room, and we periodically move the second copies to a separate storage facility. It is also possible that files will be corrupted when written to tape (both copies), and such files may not be recoverable. Because data are stored in a non-proprietary format, and encryption is not used when writing tapes, you should NOT use mass storage to archive or store any sensitive data.

How does mass storage work?

The mass storage system uses the StorNext software product to manage storage resources. StorNext is similar to an ordinary disk file system in that it keeps an inode (for recording data location, etc.) and data blocks for each file. For the user of mass storage, this file system appears to be a subdirectory of the user’s home directory. Files can be moved in and out of mass storage by using simple Linux commands such as “cp” and “mv” or by using “sftp” or “scp”. Note that your mass storage directory “ms” cannot be accessed directly by using a Windows share mount or AFS client; instead, you must be logged in to a Research Computing server which has mass storage mounted.

StorNext is different from ordinary disk systems in that it keeps data blocks on tapes, while the inode information remains on disk. When a file is created, an inode is immediately created and the data goes to the StorNext disk cache. If the file stays unmodified for a few hours, it will be copied from disk cache to tape. The tape drive hardware compresses the data as it is written to tape. StorNext copies the data to two different tapes to ensure that we have a backup copy of every file. One tape is always on-site and one tape is stored off-site in a secure location. If problems are encountered when reading data from the on-site tape, we can still retrieve your data, but it may take several work days to recall the second tape copy from off-site storage. Please remember that mass storage should NOT be used to store sensitive data.

When the StorNext disk cache is 90% full, StorNext automatically does a release: it releases the data blocks of files that have already been written to tape until the disk cache is only 70% full. When a released file is accessed, it will take at least one minute for the data to be staged, that is, brought back from tape to the disk cache.

Every 24 hours, the inodes are backed up to a location other than StorNext space. Therefore, if we experience a system problem related to StorNext, we can restore the inode table from backup. This will restore every file that was written to tape (archived) and had its inode backed up. This means the file must have been unmodified for a minimum of 24 hours to a maximum of 48 hours.
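
The 24-to-48-hour window above suggests a rough check you can make yourself: whether a file has gone unmodified for long enough. The sketch below tests only the modification time. It cannot confirm that a file has actually been written to tape or that its inode has been backed up. The function name unmodified_for_hours is hypothetical, and the -mmin option assumes GNU or BSD find.

```shell
#!/bin/sh
# Sketch only: unmodified_for_hours is a hypothetical helper name.
# Succeeds (exit status 0) if file $1 was last modified more than $2 hours ago.
unmodified_for_hours() {
    file=$1
    hours=$2
    # find prints the path only when the modification time is older than
    # the given number of minutes; an empty result means "too recent".
    [ -n "$(find "$file" -maxdepth 0 -mmin +$((hours * 60)) -print)" ]
}

# Example call (placeholder path):
# if unmodified_for_hours "$HOME/ms/results.tar" 48; then
#     echo "unmodified for more than 48 hours"
# fi
```

Using 48 hours rather than 24 covers the upper end of the window quoted above.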

We monitor the use of mass storage and will inform you if you are using it inappropriately or if you need to purchase tapes to accommodate the volume of data that you need to store.

Additional help

Research Computing home page