Globus Connect file transfer

Globus and GridFTP

Globus improves on secure copy (scp/sftp) by automating reliable large data transfers, resuming failed transfers, encrypting transfers, and simplifying the implementation of high-performance transfers between computing centers.

The UNC Globus Connect endpoint is called the “UNC, Research Computing, Datamover” endpoint (also referred to as the uncch#unc-rc-dm endpoint).

Accessible filesystems include, but are not limited to: Home (/nas02/home), /proj, /netscr, Mass storage (~/ms, /ms/home, /ms/depts), /nas/depts, /nas[01|02|11], /datastore/nextgenout2. Currently /lustre/scr, /pine/scr, and /21dayscratch are not accessible.

Globus Connect and Globus.org

Globus Connect is a Software as a Service (SaaS) deployment of the Globus Toolkit that gives end users a web browser interface for initiating fast, reliable data transfers between endpoints registered with the Globus Alliance. Registered users can transfer files from one endpoint to another. Endpoints are terminals for data; they can be laptops, high-performance research clusters, or anything in between. The servers at Globus.org act as intermediaries, negotiating, monitoring, and optimizing transfers. Because of its ease of use, Globus Connect is recommended for individuals without extensive IT experience.

To learn more:  https://www.globus.org/

We highly recommend reading through Getting Started with Globus Connect initial setup.

 

Logging into Globus Connect

To use Globus Connect file transfer, go to the login web page: https://www.globus.org/SignIn#

Log in with your UNC Onyen user ID and Onyen password. On the SignIn page, enter and select “University of North Carolina at Chapel Hill” as the CILogon method.

For details on setting up and using your UNC Onyen account, we recommend following Getting Started with Globus Connect initial setup.

 

Transferring data to/from a local workstation

You can use Globus Connect to transfer data between your local workstation (e.g., your laptop or desktop) and Research Computing storage.  In this workflow, you configure your local workstation as a Globus endpoint using Globus Connect.

  1. Log in to Globus.org
  2. Use the Endpoints interface tab to add your personal endpoint by selecting “add Globus Connect Personal”. For more information about adding a Mac, Windows, or Linux personal endpoint, see Globus.org support.
  3. Use the Transfer Files interface tab, selecting your workstation's personal endpoint for one side of the transfer and the “UNC, Research Computing, Datamover” endpoint (uncch#unc-rc-dm) for the other side. You will be required to authenticate to the Research Computing endpoint using your Onyen account.

By default, file transfers are encrypted over the wire. Also by default, “Transfer” means making a copy of the files from one endpoint to the other; existing files are overwritten. Additional options are available under the “more options” pulldown menu to the bottom-left of the “Label This Transfer” text box.

Some of the available options:

  • Only transfer new or changed files where the checksum is different
  • Delete files on destination that do not exist on source
  • Preserve source file modification times
  • Verify file integrity after transfer
  • Encrypt transfer (enabled at UNC’s unc-rc-dm endpoint by default even if box is not checked)
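
As a rough local illustration of the first and third options above (not how Globus implements them), the sketch below copies a file only when the destination is missing or its checksum differs, and keeps the source modification time. The function name sync_if_changed is hypothetical, and cksum and cp stand in here for whatever Globus does between endpoints.

```shell
#!/bin/sh
# Hypothetical helper for local files only (an illustration, not Globus itself):
# copy src to dst when dst is missing or its checksum differs.
# Prints "copied" or "skipped".
sync_if_changed() {
    src="$1"
    dst="$2"
    # cksum reading from stdin prints "<checksum> <size>" without a file name,
    # so the two outputs can be compared directly.
    if [ -f "$dst" ] && [ "$(cksum < "$src")" = "$(cksum < "$dst")" ]; then
        echo "skipped"
    else
        cp -p "$src" "$dst"   # -p preserves the source modification time
        echo "copied"
    fi
}

# Example call (paths are placeholders):
#   sync_if_changed ./results.dat /tmp/results.dat
```

Running the helper twice on an unchanged file reports "copied" and then "skipped", which is the behavior the checksum option aims for on a much larger scale.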

To set up your own personal endpoint, use Globus Connect Personal. This is also covered in the “Globus Connect Personal” section of the help document Getting Started with Globus Connect initial setup.

In this example, the left side is the UNC Research Computing datamover endpoint (uncch#unc-rc-dm). The right side is the local desktop endpoint you set up in the steps above.

[Screenshot: uncch-lookout]

Transferring data between two research computing storage endpoints

Using the “UNC, Research Computing, Datamover” endpoint (uncch#unc-rc-dm) as both source and destination allows you to transfer data between Research Computing storage areas such as /netscr and mass storage (~/ms/). The result is the same as using the “ms” queue or a basic mv/cp on the command line on the research clusters. The benefits are fast, reliable file transfer and the resuming of failed transfers.

  1. Log in to Globus.org
  2. Use the Transfer Files interface tab, selecting the “UNC, Research Computing, Datamover” endpoint (uncch#unc-rc-dm) for one side of the transfer and the same endpoint (uncch#unc-rc-dm) for the other side. You will be required to authenticate to the Research Computing endpoint using your Onyen account. File transfers are encrypted over the wire by default. Also by default, “Transfer” means making a copy of the files from one endpoint to the other; existing files are overwritten. Additional options are available under the “more options” pulldown menu to the bottom-left of the “Label This Transfer” text box. Some of the available options:
    • Only transfer new or changed files where the checksum is different
    • Delete files on destination that do not exist on source
    • Preserve source file modification times
    • Verify file integrity after transfer
    • Encrypt transfer (enabled at unc-rc-dm endpoint by default even if box is not checked)
  [Screenshot: uncch-uncch]

Transferring data between two remote endpoints

Globus.org can also be used to transfer data between two remote Globus endpoints (e.g., between another research computing center’s Globus endpoint and the Research Computing endpoint).

  1. Log in to Globus.org
  2. Use the Transfer Files interface tab, selecting the “UNC, Research Computing, Datamover” endpoint (uncch#unc-rc-dm) for one side of the transfer and another endpoint of your choice for the other side. You will be required to authenticate to the Research Computing endpoint using your Onyen account. The other remote endpoint may require its own authentication as well.

Globus Connect command-line interface

Globus.org provides a command-line interface (CLI) as an alternative to its web interface.

To use the CLI you must have a Globus account with ssh access enabled. To enable your account for ssh access you must add your ssh public key to your Globus account by visiting the Manage Identities page and clicking “add linked identity”, followed by “Add SSH Public Key”. If you do not have an ssh key, follow the directions here to create one.

  1. Use the Manage Identities interface at Globus.org to upload your ssh public key.
  2. Connect to Globus.org using an ssh client.
    $ ssh -l ${globus_username} cli.globusonline.org
  3. The Globus.org command-line interface can start and manage transfers, manage files on an endpoint, and configure endpoints associated with your account. Use the “help” command for more information on the commands available, or visit the Globus.org support system.
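
A small pre-flight check for the steps above might look like the following sketch. It assumes your public key lives at ~/.ssh/id_rsa.pub (a common default, but an assumption here) and that the CLI accepts a command appended to the ssh command line. The helper name globus_cli_preflight is hypothetical. It prints the ssh command from step 2 with the “help” command appended instead of running it.

```shell
#!/bin/sh
# Hypothetical helper: check that a public key file exists, then print
# (not run) the ssh command for the Globus CLI "help" command.
# $1 = Globus username
# $2 = public key path (assumed default: ~/.ssh/id_rsa.pub)
globus_cli_preflight() {
    user="$1"
    pubkey="${2:-$HOME/.ssh/id_rsa.pub}"
    if [ ! -f "$pubkey" ]; then
        echo "no public key at $pubkey; create one and upload it to your Globus account" >&2
        return 1
    fi
    printf 'ssh -l %s cli.globusonline.org help\n' "$user"
}

# Example call (username is a placeholder):
#   globus_cli_preflight my_globus_username
```

If the key file is missing, the helper returns a non-zero status and prints a reminder, so it can be used in front of a real ssh call in a script.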

GridFTP

The Research Computing Globus datamover endpoint (uncch#unc-rc-dm) is served by four physical servers. Globus automatically optimizes transfers using these four servers.

GridFTP provides much of the underlying framework for Globus Connect and is part of the Globus Toolkit, but it can be used independently of Globus.org and Globus Connect.

Benchmarks indicate that a fully-tuned manual GridFTP transfer (using globus-url-copy) is only marginally faster than the same transfer using Globus.org. Both are significantly faster than SSH-based file transfer (e.g., scp, sftp).

  1. Bootstrap trust in the Research Computing MyProxy CA server (rc-dm1.its.unc.edu).
    $ export MYPROXY_SERVER_DN="/C=US/O=Globus Consortium/OU=Globus Connect Service/CN=fe94a350-5c03-11e5-b060-22000b92c6ec"
    $ myproxy-logon --trustroots --bootstrap --pshost rc-dm1.its.unc.edu --username ${Onyen_username}

    This exports a non-standard certificate Distinguished Name to your environment and downloads certificates to bootstrap trust in our CA server.

  2. Initiate a transfer using globus-url-copy, specifying a Research Computing GridFTP server DN with the -source-subject (-ss) argument.
    $ globus-url-copy -ss "/C=US/O=Globus Consortium/OU=Globus Connect Service/CN=fe94a350-5c03-11e5-b060-22000b92c6ec" gsiftp://rc-dm1.its.unc.edu:2811/${remote_path} file:///${local_path}

The above is only a minimal example; it opens a connection to a single Research Computing GridFTP server (rc-dm1). To learn more about using GridFTP, read the GridFTP User’s Guide.
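
To avoid retyping the long server DN, the transfer command can be assembled by a small helper, sketched below. The helper name rc_pull_cmd is hypothetical. The -vb (performance output) and -p (parallel streams) options are standard globus-url-copy options added here for illustration; the default of 4 streams is an arbitrary placeholder, not a tuned value. The helper only prints the command, so nothing is transferred until you run the printed line yourself.

```shell
#!/bin/sh
# Server DN copied from the steps above.
RC_DN="/C=US/O=Globus Consortium/OU=Globus Connect Service/CN=fe94a350-5c03-11e5-b060-22000b92c6ec"

# Hypothetical helper: print (not run) a globus-url-copy command line
# that pulls one file from rc-dm1.
# $1 = remote path, $2 = local path, $3 = parallel streams (assumed default: 4)
rc_pull_cmd() {
    remote_path="$1"
    local_path="$2"
    streams="${3:-4}"
    printf 'globus-url-copy -vb -p %s -ss "%s" gsiftp://rc-dm1.its.unc.edu:2811/%s file:///%s\n' \
        "$streams" "$RC_DN" "$remote_path" "$local_path"
}

# Example call (paths are placeholders):
#   rc_pull_cmd proj/mygroup/data.tar tmp/data.tar 8
```

Printing the command first makes it easy to review the DN, host, and paths before starting a long transfer.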