Globus Connect file transfer

Globus and GridFTP

Globus improves on secure copy (scp/sftp) by automating reliable large data transfers, resuming failed transfers, encrypting transfers, and simplifying the implementation of high-performance transfers between computing centers.

The UNC Globus Connect Collection is called “UNC, Research Computing, Datamover” (also known as the uncch#unc-rc-dm Collection).

Accessible filesystems include, but are not limited to: /pine/scr and /21dayscratch; home directories (both the new /nas/longleaf/home and the old /nas02/home); /proj; mass storage (~/ms, /ms/home, /ms/depts); /nas/depts; and other /nas* filesystems.

Globus Connect and Globus.org

Globus Connect is a Software as a Service (SaaS) deployment of the Globus Toolkit.  It gives end users a web browser interface for initiating fast, reliable data transfers between Collections registered with the Globus Alliance.  Registered users can transfer files from one Collection endpoint to another.  “Collections” are terminals for data; they can be laptops, high-performance research clusters, or anything in between.  The servers at Globus.org act as intermediaries, negotiating, monitoring, and optimizing transfers.  Because it is easy to use, Globus Connect is recommended for individuals without extensive IT experience.

To learn more:  https://www.globus.org/

We highly recommend reading through Getting Started with Globus Connect initial setup.

 

Logging into Globus Connect

To use Globus Connect file transfer, go to the login web page: https://www.globus.org/SignIn#

You will sign in with your UNC Onyen user-id and Onyen password.  On the SignIn page, search for and select “University of North Carolina at Chapel Hill” as the CILogon method.

For details on setup and on using your UNC Onyen account, we recommend following Getting Started with Globus Connect initial setup.

 

Transferring data to/from a local workstation

You can use Globus Connect to transfer data between your local workstation (e.g., your laptop or desktop) and Research Computing storage.  In this workflow, you configure your local workstation as a Globus Collection using Globus Connect.

  1. Log in to Globus.org
  2. Use the “Endpoints” interface tab to add your personal endpoint by selecting the “+” sign in the upper-right corner.  For more information about adding a Mac, Windows, or Linux personal endpoint, see Globus.org support.
  3. In the File Manager interface tab, select your workstation's personal endpoint for one side of the transfer and the “UNC, Research Computing, Datamover” Collection for the other side.  You will be required to authenticate to the Research Computing Collection endpoint using your Onyen account.

By default, file transfers are encrypted over the wire.  Also by default, “Transfer” makes a copy of the files from one Collection to the other, and existing files are overwritten.  The “Transfer & Sync Options” pull-down menu at the bottom of the web page offers, among others, these options:

  • Only transfer new or changed files where the checksum is different
  • Delete files on destination that do not exist on source
  • Preserve source file modification times
  • Verify file integrity after transfer
  • Encrypt transfer (enabled at UNC’s Collection endpoint by default, even if the box is not checked)
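
As a rough local analogy for the first option above, the following shell sketch decides whether a file would need to be copied when only files with a different checksum are transferred.  It is not how Globus is implemented: Globus computes checksums on the Collections themselves, and needs_transfer is a hypothetical helper name used only for illustration.  It assumes a POSIX shell and the standard cksum utility.

```shell
# Illustrative only: needs_transfer is a hypothetical helper, not a Globus command.
# It succeeds (exit status 0) when the destination file is missing or differs.
needs_transfer() {
    src=$1
    dst=$2
    # A file that does not exist on the destination always needs a transfer.
    [ -e "$dst" ] || return 0
    # Reading from stdin makes cksum print only "<checksum> <size>",
    # so the two results can be compared without the file names.
    src_sum=$(cksum < "$src")
    dst_sum=$(cksum < "$dst")
    [ "$src_sum" != "$dst_sum" ]
}

# Example use with hypothetical paths:
#   if needs_transfer results.csv /backup/results.csv; then
#       cp results.csv /backup/results.csv
#   fi
```

A file that is missing on the destination, or whose contents differ, is reported as needing a transfer; identical files are skipped rather than copied again.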

To set up your own personal Collection endpoint, use Globus Connect Personal.  Instructions are also found in the “Globus Connect Personal” section of the help doc Getting Started with Globus Connect initial setup.

In this example, the left side is the “UNC, Research Computing, Datamover” Collection.  The right side is the local desktop Collection endpoint that you can set up using the steps above.

 

Transferring data between two research computing storage Collections

Accessing the “UNC, Research Computing, Datamover” Collection as both source and destination allows you to transfer data between Research Computing storage systems, for example between /pine and Mass Storage (~/ms/).  The result is the same as using the “ms” queue or a basic mv/cp task on the command line on the research clusters.  The benefits are fast, reliable file transfer and the resumption of failed transfers.

  1. Log in to Globus.org
  2. In the File Manager interface tab, select the “UNC, Research Computing, Datamover” Collection for one side of the transfer, and select the same Collection a second time for the other side.  You will be required to authenticate to the Research Computing Collection using your Onyen account.  The other remote Collection endpoint may require its own authentication as well.

File transfers are encrypted over the wire by default.  Also by default, “Transfer” makes a copy of the files from one Collection to the other, and existing files are overwritten.  The “Transfer & Sync Options” pull-down menu at the bottom of the web page offers, among others, these options:

  • Only transfer new or changed files where the checksum is different
  • Delete files on destination that do not exist on source
  • Preserve source file modification times
  • Verify file integrity after transfer
  • Encrypt transfer (enabled at UNC’s Collection endpoint by default, even if the box is not checked)
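
Because “Delete files on destination that do not exist on source” removes data, it can help to preview what it would affect.  The sketch below is a local illustration under stated assumptions, not a Globus command: list_extra_on_dest is a hypothetical helper that prints the top-level entries present in a destination directory but absent from the source.  It only works when both directories are visible from one machine, and it does not descend into subdirectories.

```shell
# Illustrative only: list_extra_on_dest is a hypothetical helper.
# Prints each top-level name that exists in the destination directory
# but not in the source directory.
list_extra_on_dest() {
    src_dir=$1
    dst_dir=$2
    for entry in "$dst_dir"/*; do
        # Skip the unexpanded pattern when the destination is empty.
        [ -e "$entry" ] || continue
        name=$(basename "$entry")
        # Report names that have no counterpart in the source.
        [ -e "$src_dir/$name" ] || printf '%s\n' "$name"
    done
}

# Example use with hypothetical paths:
#   list_extra_on_dest /pine/scr/project ~/ms/project
```

Anything the helper prints would be a candidate for deletion under that option, so an unexpectedly long list is a signal to double-check which side is the source.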

Transferring data between two remote Collections

Globus.org can also be used to transfer data between two remote Globus Collection endpoints (e.g., between another research computing center’s Globus Collection endpoint and the Research Computing Collection endpoint).

  1. Log in to Globus.org
  2. In the File Manager interface tab, select the “UNC, Research Computing, Datamover” Collection for one side of the transfer and another Collection of your choice for the other side.  You will be required to authenticate to the Research Computing Collection endpoint using your Onyen account.  The other remote Collection endpoint may require its own authentication as well.

Globus Connect command-line interface

Globus.org provides a command-line interface (CLI) as an alternative to its web interface.

To use the CLI, you must have a Globus account with ssh access enabled.  To enable ssh access, add your ssh public key to your Globus account by visiting the Manage Identities page and clicking “add linked identity”, followed by “Add SSH Public Key”.  If you do not have an ssh key, follow the directions here to create one.

  1. Use the Manage Identities interface at Globus.org to upload your ssh public key.
  2. Connect to Globus.org using an ssh client.
    $ ssh -l ${globus_username} cli.globusonline.org
  3. The Globus.org command-line interface can start and manage transfers, manage files on an endpoint, and configure endpoints associated with your account. Use the “help” command for more information on the commands available, or visit the Globus.org support system.

GridFTP

The Research Computing Globus datamover Collection is served by four physical servers.  Globus automatically optimizes transfers across these four servers.

GridFTP provides much of the underlying framework for Globus Connect and is part of the Globus Toolkit, but it can be used independently of Globus.org and Globus Connect.

Benchmarks indicate that a fully-tuned manual GridFTP transfer (using globus-url-copy) is only marginally faster than the same transfer using Globus.org.  Both are significantly faster than SSH-based file transfer (e.g., scp, sftp).

  1. Bootstrap trust in the Research Computing MyProxy CA server (rc-dm1.its.unc.edu).
    $ export MYPROXY_SERVER_DN="/C=US/O=Globus Consortium/OU=Globus Connect Service/CN=fe94a350-5c03-11e5-b060-22000b92c6ec"
    $ myproxy-logon --trustroots --bootstrap --pshost rc-dm1.its.unc.edu --username ${Onyen_username}

    This exports a non-standard certificate Distinguished Name to your environment and downloads certificates to bootstrap trust in our CA server.

  2. Initiate a transfer using globus-url-copy, specifying a Research Computing GridFTP server DN with the -source-subject (-ss) argument.
    $ globus-url-copy -ss "/C=US/O=Globus Consortium/OU=Globus Connect Service/CN=fe94a350-5c03-11e5-b060-22000b92c6ec" gsiftp://rc-dm1.its.unc.edu:2811/${remote_path} file:///${local_path}

The above is a minimal example that opens a connection to only a single Research Computing GridFTP server, rc-dm1.  To learn more about using GridFTP, read the GridFTP User’s Guide.
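
To make the pieces of the globus-url-copy example easier to adapt, the sketch below assembles the same command from its parts and prints it without running it.  Several things here are assumptions rather than part of the example above: build_download_cmd is a hypothetical helper, the -vb (show transfer performance) and -p (number of parallel streams) options are additions, and the /proj/mylab/data.tar and /tmp/data.tar paths are made up for illustration.

```shell
# Host, port, and server DN are taken from the example above.
RC_HOST="rc-dm1.its.unc.edu"
RC_PORT=2811
RC_DN="/C=US/O=Globus Consortium/OU=Globus Connect Service/CN=fe94a350-5c03-11e5-b060-22000b92c6ec"

# Hypothetical helper: prints (does not run) a globus-url-copy download command.
#   $1 = absolute path on the Research Computing side
#   $2 = absolute path on the local machine
#   $3 = number of parallel streams (optional, defaults to 4; an assumption)
build_download_cmd() {
    remote_path=${1#/}   # drop one leading slash; the URL format supplies it
    local_path=$2
    streams=${3:-4}
    printf 'globus-url-copy -vb -p %s -ss "%s" gsiftp://%s:%s/%s file://%s\n' \
        "$streams" "$RC_DN" "$RC_HOST" "$RC_PORT" "$remote_path" "$local_path"
}

# Print the command for a made-up file so it can be reviewed first.
build_download_cmd /proj/mylab/data.tar /tmp/data.tar
```

Printing the command first lets you check the server DN, host, and paths before starting a transfer.  Paths containing spaces would need additional quoting before the printed command is pasted into a shell.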