The National Institute for Computational Sciences

Data Transfer


Introduction


The ACF provides several ways for transferring files to/from the NFS home directories, NFS project directories, Lustre project directories, and Lustre scratch directories. DTNs (Data Transfer Nodes) furnish this capability. At the time of this writing, there are four DTNs available to ACF users. Table 1.1 shows these nodes, in addition to pertinent information about them.

Table 1.1 - ACF Data Transfer Nodes
Data Transfer Node        IP Address
datamover1.acf.utk.edu    192.249.6.163
datamover2.acf.utk.edu    192.249.6.164
datamover3.acf.utk.edu    192.249.6.165
datamover4.acf.utk.edu    192.249.6.166

These DTNs are set up for NetID authentication, Duo TFA, and authentication through an InCommon Credential so that users can log in to these nodes and perform data transfer functions.

To connect to these DTNs, use ssh in a terminal. More information on ssh usage can be found in the Access and Login document. Replace the hostname of the login node with the hostname of the DTN to which you wish to connect, then authenticate with your UT NetID, password, and Duo TFA.

The ACF supports several data transfer protocols, including SCP, SFTP, GSISCP, and Globus. SCP and SFTP are both ssh utilities for transferring files, but they are slower than Globus. At the time of this writing, Globus offers the fastest data transfers on the ACF. Still, SCP and SFTP are useful for small transfers. GSISCP is used for unattended data transfers.

Preparing Data


Before you initiate any data transfers from the ACF to another storage resource, consider preparing the data you wish to transfer by archiving and compressing it. When you archive data, several files and directories are combined into a single file. When you compress data, you reduce its total size. Compression reduces the amount of data that must be sent across the network, and archiving makes it easier to organize the data you wish to transfer. At the time of this writing, the tar and zip utilities are the best methods for data archiving and compression for ACF users across Linux, MacOS, and Windows.

Using the tar Utility

The tar (tape archiver) utility uses simple command syntax and allows large amounts of data to be aggregated into the same archive. Linux, MacOS, and updated Windows 10 systems can use tar. Older Windows systems will be limited to the zip utility.

To create a tar archive, execute tar czvf <archive-name> <dir-to-archive>. Replace the <archive-name> argument with the name of the new archive. Be sure to follow the name with the .tar.gz extension, as in my_archive.tar.gz. Replace the <dir-to-archive> argument with the directory you wish to place within the archive. If the directory you intend to archive is not within your working directory, specify the relative or absolute path to it. By default, tar will recursively place the directory and its contents into the new archive. Figure 2.1 shows the successful creation of a tar archive.

[user@acf-login5 ~]$ tar czvf new_archive.tar.gz Documents
Documents/
Documents/IntroUnix.pdf
Documents/JobSubData.zip
Documents/MATLAB/
Documents/Scripts.zip
Documents/PyLists.py
Figure 2.1 - Creating a tar Archive

After the archive is created, execute ls -l to verify that the archive exists. You can view its contents with the tar tvf <archive-name> command. You may then transfer the archive using one of the data transfer methods described in this document. In general, Globus is the best method. Please refer to the Globus section to learn how to configure it for your system. On the remote system, execute tar xvf <archive-name> to extract the contents of the archive. The files will be extracted into your working directory.
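
The create, list, and extract steps above can be tried end to end on any Linux or MacOS system. The following is a minimal sketch, not an ACF-specific procedure: the sample file names and the restored directory are hypothetical, and a temporary directory stands in for both the ACF and the remote system.

```shell
# Work in a throwaway directory so none of your real files are touched.
workdir="$(mktemp -d)"
cd "$workdir"

# Build a small sample directory (hypothetical file names).
mkdir -p Documents/MATLAB
echo "print('hello')" > Documents/PyLists.py
echo "sample notes" > Documents/notes.txt

# c = create, z = compress with gzip, f = name of the archive file.
tar czf new_archive.tar.gz Documents

# t = list the archive's contents without extracting them.
listing="$(tar tzf new_archive.tar.gz)"
echo "$listing"

# Extract into a separate directory to mimic the remote system.
mkdir restored
tar xzf new_archive.tar.gz -C restored

# diff -r exits with status 0 only when both directory trees match.
if diff -r Documents restored/Documents; then
    roundtrip="identical"
else
    roundtrip="different"
fi
echo "Round trip: $roundtrip"
```

Comparing the extracted tree against the original is an optional check, but it is a cheap way to confirm that an archive is complete before you delete the source data.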

Using the zip Utility

On older Windows systems, the zip utility should be used to archive and compress your data on the ACF.

To create a zip archive on the ACF, execute zip -r <archive-name>.zip <dir-to-archive>. Be sure that the directory you wish to archive is in your working directory. Otherwise, specify the relative or absolute path to the directory you wish to archive. Replace the <archive-name> argument with the name of the new zip archive. You may omit the .zip file extension from the archive’s name; if you do, the zip utility will add it automatically. Replace the <dir-to-archive> argument with the directory you wish to place in the zip archive. The -r option ensures that the directory and its contents are archived and compressed. Figure 2.2 shows the successful creation of a zip archive.

[user@acf-login5 ~]$ zip -r Documents Documents
  adding: Documents/ (stored 0%)
  adding: Documents/IntroUnix.pdf (deflated 4%)
  adding: Documents/MATLAB/ (stored 0%)
  adding: Documents/PyLists.py (deflated 61%)
Figure 2.2 - Creating a zip Archive

After the zip archive has been created, execute ls -l in the directory from which you created it to ensure the archive exists. It will appear with the name you gave to the archive followed by the .zip extension.
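
As a rough sketch of the same workflow, the commands below create a zip archive from a sample directory and list its contents with unzip -l. The sample file name is hypothetical, the work happens in a temporary directory, and the script only proceeds if both zip and unzip are installed on the system where you try it.

```shell
# Work in a throwaway directory so none of your real files are touched.
workdir="$(mktemp -d)"
cd "$workdir"

# Build a small sample directory (hypothetical file name).
mkdir -p Documents/MATLAB
echo "print('hello')" > Documents/PyLists.py

if command -v zip >/dev/null 2>&1 && command -v unzip >/dev/null 2>&1; then
    # -r recurses into the directory; -q hides the per-file "adding:" lines.
    # No extension is given, so zip names the archive Documents.zip.
    zip -r -q Documents Documents

    # -l lists the archive's contents without extracting them.
    zip_listing="$(unzip -l Documents.zip)"
    echo "$zip_listing"
    zip_status="created"
else
    zip_status="zip or unzip not installed"
    echo "$zip_status"
fi
```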

With the zip archive created and verified, transfer it to your system using one of the data transfer methods described in this document. In most cases, Globus is the most convenient method. Please refer to the Globus section to learn how to configure it for use on your system. Once you transfer the zip archive to your system, open the File Explorer and navigate to the directory in which you placed the archive. Right-click on the archive and select the “Extract All…” option in the submenu. Figure 2.3 shows where to locate this option. Specify the directory in which the contents should be extracted, then select “Extract.” You may then open the archive and peruse its contents.

Figure 2.3 - Extracting the Contents of a zip Archive in Windows

SCP and SFTP


SCP and SFTP are both ssh utilities available for transferring files on the ACF. However, they are slower than Globus, which at the time of this writing offers the fastest data transfers on the ACF. Still, SCP and SFTP are useful for quick, small transfers. For larger file transfers, please use Globus.

SCP and SFTP are available to Linux and MacOS systems by default. Windows 10 users with the most recent updates can use these utilities within Command Prompt or PowerShell. Windows 7 and 8 users must use a third-party utility to use SCP and SFTP. For more information on ssh in Windows, see the Access and Login document. For Windows 7 and 8 users, the third-party utilities FileZilla and WinSCP are reviewed later in this document.

The general syntax of SCP is given below. In general, SCP is useful when transferring a file on your system to the ACF. The <source> argument is the pathname of the file on your system that you wish to copy. The destination is given as your NetID followed by the hostname of the datamover you wish to use (in this case, datamover1). The <directory> argument specifies the absolute pathname on that datamover in which to place the file.

scp <source> <NetID>@datamover1.acf.utk.edu:<directory>

For example, to copy a file from your home directory on your system to the Documents directory in your ACF home directory, use scp ~/<filename> <NetID>@acf-login.acf.utk.edu:~/Documents.
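
The sketch below shows how the SCP syntax might look for a few common cases. It is illustrative only: the NetID jdoe1, the file and directory names, and the run helper are hypothetical. The helper prints each command instead of executing it unless you set DRY_RUN=0. The download form and the -r option for directories are standard scp usage rather than anything specific to the ACF.

```shell
# Hypothetical values for illustration; substitute your own.
NETID="jdoe1"
DTN="datamover1.acf.utk.edu"

# Print the command by default; set DRY_RUN=0 to actually run it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Upload a single file from your system to a directory on the ACF.
# The remote path stays in quotes so the remote side expands the ~.
run scp "$HOME/JobScript.sh" "${NETID}@${DTN}:~/Documents"

# Download a file from the ACF into the current local directory.
run scp "${NETID}@${DTN}:~/ResearchResults.txt" .

# Upload a whole directory; -r copies it recursively.
run scp -r "$HOME/my_project" "${NETID}@${DTN}:~/Documents"
```

Reviewing the printed commands before running them for real helps catch a wrong hostname or path before you authenticate with Duo TFA.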

For SFTP, you specify the hostname of the system to which you intend to connect. For example, to securely transfer files between your local system and the ACF, use the syntax below in a terminal on your local system. Ensure that you enter SFTP from the directory that contains the file(s) you wish to copy to the ACF. You can use the pwd command to determine your current directory before entering SFTP.

sftp <NetID>@datamover1.acf.utk.edu

Once you authenticate with your UT NetID, password, and Duo TFA, you will enter SFTP’s interactive mode. Use the put <file> command to upload a file to the ACF. For example, to upload a file named JobScript.sh to the ACF from your local machine, use put JobScript.sh. This syntax assumes that the JobScript.sh file is in the directory from which you entered SFTP.

To retrieve files from the ACF, use the get <file> command. To download a file named ResearchResults.txt from the ACF to your local machine, use get ResearchResults.txt. SFTP will place the file in the directory from which you entered the utility. To change directories on the ACF, use the cd <directory> command. Use the lcd <directory> command to change the directory on your local system. Once you are done with SFTP, use the bye or exit commands to exit it. Other commands are available with the SFTP utility. Type help within SFTP to read more about them.

Globus Web-based Transfers


The Globus web interface allows you to conveniently perform data transfers to and from ACF resources. At the time of this writing, Globus is the fastest and most efficient data transfer method available on the ACF. Before you can use Globus, you must create an InCommon Credential and associate it with your account. To perform this task, please consult the User Portal documentation. After you associate your InCommon Credential with your account, continue through this document to use Globus. You should also review the official Globus documentation for more information on how to use the tool.

Credential information is updated on the datamovers every hour. If you are unable to use Globus, please wait approximately an hour for the datamovers to obtain your information. Generally, if you see your InCommon credential appear in the User Portal, you should be able to use Globus.

Using the Globus Web Interface

To access the Globus interface in your browser, navigate to the Globus website. Log in using the existing organizational login option. Verify that the University of Tennessee is selected, then select “Continue.” Authenticate with your UT NetID, password, and Duo TFA. You will then see the interface depicted in Figure 3.1. If you experience issues logging in, verify that your InCommon credential was configured per the steps given in the Configuring your InCommon Credential section.

Figure 3.1 - Initial Globus Interface

Before you can initiate file transfers between your local machine and the ACF, you must configure endpoints. One endpoint will reference your local system while the other will reference one of the ACF DTNs. Further instructions on these endpoints will be provided below.

To configure the endpoints in the Globus interface, select the “Endpoints” tab on the left side of the page. You will then see a page similar to Figure 3.2. At the top right of the page, select “Create new endpoint.” On the endpoint type selection page, choose “Globus Connect Personal.”

Figure 3.2 - Globus Endpoint Menu

On the next page, name the endpoint. The name you choose is unimportant; however, it should be something memorable. After you name the endpoint, generate a setup key for the Globus Connect Personal client software. The option to generate the key is listed under Step 2 in Figure 3.3. Copy this key. Finally, download and install the Globus Connect Personal client software. When prompted, enter the setup key you copied to configure your local machine as an endpoint. Refer to Figure 3.3 for a screenshot of the endpoint creation page.

Figure 3.3 - Globus Endpoint Creation Menu

Once you configure your local machine as a Globus endpoint, return to the “File Manager” tab on the left side of the page. Make sure you select the double panels option in the top right of the page (Figure 3.9 highlights this option). This will display your local machine’s filesystem in addition to the datamover’s. Once both panels are displayed, click on “Collection” in the left panel. Type the name of your endpoint in the search bar or find it under “My Collections.”

After your endpoint has been selected, you will return to the File Manager. In the right panel, click on “Collection.” Search for one of the four ACF datamovers. The Globus endpoint names of these DTNs are given below.

  • nics#datamover1
  • nics#datamover2
  • nics#datamover3
  • nics#datamover4

Once both endpoints are configured, you can transfer data between the two. You can select individual files and directories for these transfers. After you select the data you wish to transfer, press the “Start” button below the endpoint from which you will transfer data. Additionally, you can navigate throughout the filesystem hierarchy in either endpoint using the Globus interface. Other options are available for your transfers, but most transfers do not need them. Figure 3.4 shows what the Globus interface should look like when both endpoints are selected.

Figure 3.4 - Globus File Transfer Interface

Unattended Transfers with gsissh


Unattended data transfers are possible on the ACF with gsissh and its associated tools. You may find these transfers useful in situations where massive amounts of data must be transferred overnight or a job requires input or output data without your intervention. To use gsissh, follow the steps in the User Portal document to create and associate an InCommon credential with your account, then navigate to the CILogon website and select the University of Tennessee as your identity provider. After you authenticate to the UT CAS, follow these steps to supply a password for your credential and upload it to the ACF.

  1. Expand the “Create Password-Protected Certificate” menu.

  2. Enter a password for your new InCommon credential. It is critical that you remember this password. Please record and store this password in a secure location for future reference.

    Figure 4.1 - Setting a Password for the Credential

  3. Download your credential. Click download, then save it to your local system. This will act as a backup copy of the credential. Figure 4.2 highlights the download link.

    Figure 4.2 - Downloading the Credential

  4. Right-click on the link to “Download Your Certificate” and copy the URL to your clipboard.

  5. On the ACF, make a new directory named .globus in your home directory. Use mkdir ~/.globus to create the directory, then execute cd ~/.globus to enter it.

  6. On the ACF, type wget <copied-url> where <copied-url> is the download link to your InCommon Credential. You should be able to paste the link with the key combination Ctrl + V on Windows and Linux systems or Command + V on MacOS systems. Verify that you are in the .globus directory with pwd before you execute wget. Figure 4.3 shows the output of a successful wget download.

    [user-x@acf-login8 .globus]$ wget https://polo1.cilogon.org/pkcs12/36E8DB8887F08ACC5/usercred.p12
    --2020-01-16 16:08:58--  https://polo1.cilogon.org/pkcs12/36E8DB8887F08ACC5/usercred.p12
    Resolving polo1.cilogon.org (polo1.cilogon.org)... 141.142.149.19
    Connecting to polo1.cilogon.org (polo1.cilogon.org)|141.142.149.19|:443... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 2837 (2.8K) [application/x-pkcs12]
    Saving to: ‘usercred.p12’
    
    100%[==========================================>] 2,837       --.-K/s   in 0s      
    
    2020-01-16 16:08:58 (156 MB/s) - ‘usercred.p12’ saved [2837/2837]
    Figure 4.3 - Using wget to Retrieve your Credential

  7. Execute chmod 600 usercred.p12 to change the permissions of the InCommon Credential file. Execute ls -l on the .globus directory to verify that the usercred.p12 file has read and write permissions for the user owner. Figure 4.4 shows the permissions that should apply to the credential file.

    [user-x@acf-login8 .globus]$ ls -l
    total 4
    -rw-------. 1 user-x testgrp 2837 Jan 16 16:08 usercred.p12
    Figure 4.4 - Permissions to Set on the usercred.p12 File
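
Steps 5 through 7 can be sketched as a short script. This is only an illustration under stated assumptions: a temporary directory stands in for your home directory so that a real ~/.globus is not touched, and an empty placeholder file stands in for the wget download because the credential URL is unique to each user.

```shell
# Stand-in for your home directory (assumption for this sketch only).
demo_home="$(mktemp -d)"

# Step 5: create the .globus directory and enter it.
mkdir -p "$demo_home/.globus"
cd "$demo_home/.globus"

# Step 6: placeholder for "wget <copied-url>"; creates an empty file here.
: > usercred.p12

# Step 7: restrict the credential to read and write for the owner only.
chmod 600 usercred.p12

# The first ten characters of ls -l hold the file type and permission bits.
perms="$(ls -l usercred.p12 | cut -c1-10)"
if [ "$perms" = "-rw-------" ]; then
    echo "usercred.p12 permissions are correct"
else
    echo "unexpected permissions: $perms" >&2
fi
```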

After you have successfully downloaded and modified the usercred.p12 file, execute module load globus to load the Globus modulefile into your environment. Use module list to verify that Globus was successfully loaded. Next, execute grid-proxy-init to use the InCommon Credential for authentication. Enter the password you set in step 2 of the credential configuration process. Figure 4.5 shows what should appear if the certificate initialization was successful.

[user-x@acf-login8 ~]$ grid-proxy-init
Enter GRID pass phrase for this identity:
Your identity: /DC=org/DC=cilogon/C=US/O=University of Tennessee/CN=User X A12345678
Creating proxy ...................... Done
Your proxy is valid until: Fri Jan 17 04:30:39 2020
Figure 4.5 - Using grid-proxy-init to Initialize a Credential

In some cases, you may require the credential to be valid for an extended period. Execute grid-proxy-init -valid hh:mm to specify the amount of time the credential should remain valid. For example, to make the credential last for three days, execute grid-proxy-init -valid 72:00. If successful, the output of the certificate initialization will show the date when the certificate will expire. In Figure 4.5, the certificate expires twelve hours from the time it was initialized. This is the default expiration timer.
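
Because the -valid option takes hours and minutes, a lifetime expressed in days has to be converted first. The snippet below is a small sketch of that arithmetic. The days variable is a hypothetical input, and the script only prints the resulting command rather than running grid-proxy-init.

```shell
# Desired credential lifetime in whole days (hypothetical input).
days=3

# Convert days to hours, then format as hh:mm with zero minutes.
hours=$(( days * 24 ))
valid_arg="$(printf '%d:00' "$hours")"

# Show the command you would run on the ACF after loading the Globus module.
echo "grid-proxy-init -valid $valid_arg"
```

For days=3 this yields the same 72:00 value used in the example above.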

Once your credential is initialized, gsiscp will work without requiring you to enter a password or complete Duo TFA. A password prompt may still appear, but gsissh will automatically bypass it and allow the transfer. The syntax for gsiscp is gsiscp <source> <destination:directory>. For instance, to transfer a gzipped tar archive named MedJobResults.tar.gz in your ACF home directory to a remote system, execute gsiscp ~/MedJobResults.tar.gz remotestorage.local:~/Documents. To transfer a zip archive named fluid_dynamics_results.zip from your home directory on the ACF to a remote datamover, execute gsiscp ~/fluid_dynamics_results.zip remote_datamover1:~/.

Using FileZilla to Transfer Files


FileZilla can perform file transfers to and from the ACF. Please only use the DTNs listed in Table 1.1 at the beginning of this document.

To use the FileZilla client with your NetID, password, and Duo TFA, follow these steps.

  1. Open the FileZilla client.

  2. Select File, then Site Manager.

    Figure 5.1 - FileZilla's Site Manager Option

  3. Select “New Site,” then provide the necessary information. For the host, select one of the datamovers listed in Table 1.1. For protocol, select SFTP - SSH File Transfer Protocol. For Logon Type, select Interactive. For User, type your UT NetID. Finally, rename the entry under sites from "New Site" to something more memorable, such as the name of the datamover you chose to use. Refer to Figures 5.2 and 5.3 to identify where to find these options.

    Figure 5.2 - New Site in FileZilla

    Figure 5.3 - FileZilla Site Options

  4. Select Transfer Settings, then check the box for Limit the number of simultaneous connections. Make sure the value beneath this checkbox is 1.

  5. Select “Connect” in the Site Manager window.

  6. When prompted, enter your password.

    Figure 5.4 - FileZilla Password Prompt

  7. When prompted, type a “1” to send a Duo Push to your mobile device, then authenticate with Duo TFA. Upon successful authentication, you will be logged in to the datamover through FileZilla.

    Figure 5.5 - FileZilla Duo Prompt

Using WinSCP to Transfer Files


WinSCP can perform file transfers to and from the ACF. Please use the DTNs listed in Table 1.1 at the beginning of this document.

To use the WinSCP client with your NetID, password, and Duo TFA, follow these steps.

  1. Open WinSCP, then click on “New Site.”

  2. Provide the hostname of the datamover for “Host name,” your UT NetID for “User name,” and your password. Leave the port number as 22.

    Figure 6.1 - WinSCP New Site Creation

  3. When warned about an unknown server, select “Yes.”

    Figure 6.2 - Initial WinSCP Key Warning

  4. The authentication banner will appear. Select “Continue.”

    Figure 6.3 - WinSCP Authentication Banner

  5. When prompted, type “1” to receive a Duo Push on your mobile device. Authenticate with Duo. You will then be logged in.

    Figure 6.4 - Duo Prompt in WinSCP

  6. Once you authenticate, you will get the WinSCP application screen. On the left side of the screen, you see your local machine. On the right side of the screen, you see the remote system into which you are logged.




    Last Updated: 02 / 12 / 2020