Pittsburgh Supercomputing Center 

Advancing the state-of-the-art in high-performance computing,
communications and data analytics.

The Data Supercell

The Data Supercell (DSC) is a PSC-designed and built system for managing and archiving petabyte-scale data for researchers and industrial users. The DSC provides low-latency, high-capacity, high-reliability, high-bandwidth and low-cost data storage and retrieval.  The Data Supercell has an initial capacity of 4 Petabytes, but its architecture will enable it to scale well beyond this initial deployment. The Data Supercell storage system is managed by the SLASH2 file system software, developed at the PSC. This document explains how to store and access your data on the Data Supercell.

Because the Data Supercell is a completely disk-based system, it can support data resource applications well beyond those of traditional tape-based data storage resources. Traditional archivers were designed for data retrieval in support of batch processing. Being disk-based, the Data Supercell offers very high transfer rates for your data. It is also a very flexible system: it can accommodate datasets of different sizes, including extremely large datasets, and its interface and access methods can be customized to meet your needs. Its disk-based design has also allowed the Data Supercell to be constructed to be highly reliable and highly secure.

The Data Supercell is well-suited for data storage needs well beyond those met by traditional archivers. For more information about the advantages of the Data Supercell over conventional archival systems, please see the Data Supercell features page. If you want to discuss whether the Data Supercell could meet your data storage needs or to discuss what special arrangements we could make for your data storage application send email to remarks@psc.edu.

The Data Supercell is strictly a file storage system. It is not used for computing, nor will you directly open files that reside on it from a running program. Moreover, you will not log in to the Data Supercell to perform file transfers. You will access your archived files indirectly using remote file transfer software and services on other systems.

Scheduled Maintenance

When necessary, downtime for system maintenance will be taken Wednesdays between 8:30 am and 5:00 pm Eastern time.

Users will be notified the day before a scheduled outage. If you find that a scheduled downtime will be disruptive to your work, please contact remarks@psc.edu with as much lead time as possible and we will try to accommodate your needs.

Applying For A Storage Allocation and Other Policies

If you are an XSEDE user there are storage allocation policies that apply to you. These policies are described immediately below. If you are a non-XSEDE, non-commercial user, similar policies apply to you. For precise information on the policies that apply to you send email to remarks@psc.edu. If you are a commercial user send email to corp-relations@psc.edu for information on the storage allocation policies that apply to you.

If you are an XSEDE user, when you apply for an XSEDE computing allocation, you must also apply for a Data Supercell allocation. You cannot get a Data Supercell allocation without also applying for a computing allocation on one of PSC's production resources. To apply for an allocation of space on the Data Supercell, use the XSEDE User Portal, which is the same mechanism you use to apply for computing allocations.

Your storage allocation is intimately tied to your associated grant. Each file you create while using your grant is assigned to that grant. Every one of your files must be assigned to one of your active grants.

Three months after your grant expires all files on the Data Supercell that have been assigned to that grant will be deleted. This deletion process will occur even if you have other active grants. The deletion process is based on the grant, not on the individual user. Thus, when you are creating files you must ensure that they have been assigned to the proper grant to guarantee that files you want to retain are not deleted during this three-month purging process. How to ensure that your files are assigned to the correct grant is discussed below.

When you apply for a storage allocation, as part of your XSEDE grant application process, you request an amount of Data Supercell space. This amount must cover not only new files you might create while using your grant, but also existing files on your grant, if you are applying for a Renewal, or existing files you intend to move from another grant. Thus, the time when you apply for a new storage allocation is a good time to remove unneeded Data Supercell files.

You must request at least 100 Gbytes of space, while the maximum amount of space you can request per grant is 40 Tbytes. If you need more space than 40 Tbytes, you must make special arrangements by contacting remarks@psc.edu. These limits apply whether you are applying for a Research or Renewal grant.

When you apply for a Startup grant you do not apply for a Data Supercell allocation. Each Startup grant is given a Data Supercell quota of 100 Gbytes. If you are a Startup user and anticipate that you will need more than 100 Gbytes of storage you should send email to remarks@psc.edu with your space request and a justification for your increased amount. Startup users will be held to the 40 Tbyte maximum that applies to Research and Renewal users.

This allocation amount is the quota for the files you can store under this grant on the Data Supercell. You cannot exceed this quota. This quota is a grant quota. Even though files are stored in directories by individual, your Data Supercell quota applies across all the individuals on your grant. There are no individual Data Supercell quotas. In your grant proposal, just as you must justify your compute allocation, you must also justify your storage allocation, if you ask for more than 100 Gbytes of storage. However, there are no maximum file sizes or maximum number of files you can store.

Using grace.psc.edu to Manage Your Data Supercell Storage

If your normal access to the Data Supercell has been blocked, either because your account has expired or because you are over your Data Supercell quota, you can use the grace.psc.edu system to access your data. Grace.psc.edu is a special machine the PSC has created for this purpose.

You log in to grace.psc.edu using your PSC userid and password and are placed in your Data Supercell home directory. You can examine your filenames or file contents using the standard Unix commands to help you decide which files can be deleted or transferred. However, you cannot write new files to the Data Supercell while on grace.psc.edu.

Your access to grace.psc.edu lasts for 3 months after your account expires. If you are using grace.psc.edu because you are over your quota you should send email to remarks@psc.edu once you are under your quota. If you are using grace.psc.edu because your account has expired you have 3 months to determine the disposition of your files. You can delete them, transfer them to another machine or move them to an active PSC grant.

More information about the use of grace.psc.edu is available in the immediately following section "Managing Your Storage Allocation."

Managing Your Storage Allocation

All Data Supercell users must manage their storage allocations, both by staying under their Data Supercell quota and by handling their files when their Data Supercell allocation ends. The procedures that non-XSEDE users must follow are described in the next paragraph. The procedures that XSEDE users must follow are described in the paragraphs after that.

If you are a non-XSEDE user of the Data Supercell you can monitor your Data Supercell usage by logging in to the machine grace.psc.edu and issuing the command

    du -sk

to see what your total Data Supercell usage is. If you are over your quota send email to corp-relations@psc.edu or remarks@psc.edu to discuss increasing your Data Supercell quota. When logged in to grace.psc.edu you can use the usual Unix commands to examine and delete your Data Supercell files. To log in to grace.psc.edu you use your PSC userid and PSC password. You are given your PSC userid when you obtain your Data Supercell allocation. You can set a PSC password with the Web form at http://apr.psc.edu/. When your allocation is nearing its end send email to corp-relations@psc.edu or remarks@psc.edu to discuss either extending your allocation or how to handle the transition of your Data Supercell files.

If you are an XSEDE user of the Data Supercell, you must keep your Data Supercell storage usage under your quota for each grant. Otherwise you are in jeopardy of having your write access to the Data Supercell turned off. If you want to increase your storage quota you can apply to increase your quota by asking for a Data Supercell Supplement through the POPS system accessible through the XSEDE Portal. To request a Data Supercell Supplement go to the Submission Home page in POPS and follow the link for a Supplemental action. You can contact the PSC at remarks@psc.edu if you want to make other arrangements to get extra space. You can, of course, also delete unneeded files or move them to another location to stay under your quota.

There are several tools you can use to monitor your Data Supercell storage usage. The xbanner command, which is available on PSC production systems, shows your Data Supercell quota and also the total Data Supercell usage across all users on your grant. The Grant Management System also shows your Data Supercell quota and total Data Supercell usage across all users on your grant. However, even though quotas are grant quotas and not individual quotas, if you are the PI, co-PI or a user who has been designated by the grant's PI to see account information, you can use the Grant Management System to see Data Supercell usage for each individual user on a grant, as well as the total usage by all users on a grant. Other users can only see their individual usage and the total usage across the grant. Both of these tools will show quotas and usage for all your active grants.

The Portal login dashboard displays your accounting data for all of your XSEDE resources, including the Data Supercell. It shows individual usage as well as total usage for a grant.

Our recommended method for copying files into the Data Supercell is to use Globus Online. If you only have one active grant, files you copy in with Globus Online will be assigned to that grant. If you have more than one active grant your files will be assigned to your default group. The first group listed by the groups command on any PSC production platform is your default group on the Data Supercell. Other file transfer methods will work similarly with respect to group assignment.

The PSC uses the Unix group as a charge-id. Moreover, there is a one-to-one correspondence between charge-id and grant number. To see which grant number corresponds to a charge-id you can use xbanner or the Grant Management System. To determine to which charge-id a file has been assigned you can cd to your Data Supercell home directory on a PSC production machine and use the ls -l command. Your Data Supercell home directory is /arc/users/joeuser, where you substitute your PSC userid for 'joeuser'. The output from these commands can be used to determine to which grant a Data Supercell file has been assigned.
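
One way to spot files assigned to the wrong grant is to let find list everything whose group differs from the charge-id you expect. The following is a minimal sketch rather than a PSC-provided tool: the function name files_outside_grant is hypothetical, and it relies only on the standard -group test of find.

```shell
# Hypothetical helper, not a PSC tool: print every file or directory
# under DIR whose group (the charge-id at PSC) is not CHARGE_ID.
files_outside_grant() {
    charge_id=$1
    dir=$2
    find "$dir" ! -group "$charge_id" -print
}

# Example, to be run on a PSC production system (joeuser and mc3ts7p are
# the placeholder userid and charge-id used in this document):
# files_outside_grant mc3ts7p /arc/users/joeuser
```

An empty listing means that everything under the directory is already assigned to the charge-id you named.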

If your files have been assigned to an incorrect grant you should take steps to assign them to the correct grant. If you do not do this you could inadvertently exceed a grant quota or, more ominously, expose your files to being deleted if they are assigned to a grant that expires. To change the charge-id for a file use the chgrp command. There is a man page for chgrp. As was mentioned, at PSC group is synonymous with charge-id. The chgrp command can be used to change the charge-id for a single file, for a group of files using wildcards, or for all the files in a directory.

For example, suppose you want to change the charge-id for all your files in the directory mynewdata, which is a directory you created off your Data Supercell home directory. First, you would cd to your Data Supercell home directory. Then you would issue the command

   chgrp -R mc3ts7p mynewdata

This command will assign all the files in mynewdata to charge-id mc3ts7p.

If you exceed your Data Supercell quota your normal methods of access to the Data Supercell will be blocked. You can either use the Portal to increase your quota and then send email to remarks@psc.edu to have your normal methods of access restored, or you can log in to grace.psc.edu with your PSC userid and password. On this machine you will be able to use the standard Unix commands to examine your Data Supercell files and delete or transfer any files that you no longer need. However, you will not be able to write any new Data Supercell files. The only methods of file transfer available for grace.psc.edu are scp, sftp and rsync, all of which are described below. The command

    du -sk

will show you your Data Supercell usage. You can also use the xbanner command or the Grant Management System to see your Data Supercell usage. Once you are under your quota send email to remarks@psc.edu and your normal access to the Data Supercell will be restored.
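
As a rough sketch of this quota check, the following shell function compares a kilobyte count, such as the first field printed by du -sk, with a quota given in Gbytes. The function name dsc_quota_status is hypothetical and is not a PSC tool, and the conversion of 1 Gbyte to 1024 x 1024 kilobytes is an assumption. Because du only sees the directory you run it in, while quotas apply across the whole grant, xbanner and the Grant Management System remain the tools to use for grant-wide totals.

```shell
# Hypothetical helper, not a PSC tool.
# Usage: dsc_quota_status USED_KB QUOTA_GB
# Assumption: 1 Gbyte = 1024 * 1024 kilobytes.
dsc_quota_status() {
    used_kb=$1
    quota_kb=$(( $2 * 1024 * 1024 ))
    if [ "$used_kb" -gt "$quota_kb" ]; then
        echo "over by $(( used_kb - quota_kb )) KB"
    else
        echo "under by $(( quota_kb - used_kb )) KB"
    fi
}

# Example: feed it the total from du -sk, run in your Data Supercell
# home directory, and a quota of 100 Gbytes.
# dsc_quota_status "$(du -sk . | awk '{print $1}')" 100
```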

The second issue you confront when managing your Data Supercell storage allocation is what to do to keep wanted files from being deleted when your grant expires. As was mentioned, when a grant expires, the files stored on that grant are deleted three months after the grant expires. There are two cases. The first case is if there is at most a three-month time overlap between your expired grant and your new grant. If your new grant is a Renewal and there is an overlap, you do not need to take any steps to ensure that your files are assigned to the correct grant when your Renewal becomes active. They will not be in jeopardy of being deleted. If your new grant is not a Renewal and there is an overlap, then any files you want to maintain on the Data Supercell after your old grant expires must be assigned to your new grant.

This applies even if you are moving from a Startup to a Research grant. A Research grant is a completely new grant compared to your Startup grant. Any Startup files you want to maintain must be explicitly moved to your Research grant. Any Startup files you do not need on your Research grant you should delete. If you are moving from an expiring Research grant to another Research grant, you may not need to transfer many files between the grants. Any files that you do not want to transfer between grants you should move to another location or delete.

To move files between grants you must use the method discussed above to make sure your files are assigned to the proper grant. The first step in this process is to find the correspondence between your grant number and your PSC charge-id. At the PSC, your Unix group and your charge-id are the same. To see which grant number corresponds to a charge-id you can use the Grant Management System or the xbanner command on a PSC production system. After using either of these tools you should know the charge-id of your expiring grant and the charge-id of the grant to which you need to assign your files that are currently assigned to your expiring grant.

Next you need to determine to which charge-id your files have been assigned. To do this, log in to a PSC production system, such as blacklight, and cd to your Data Supercell home directory. Your Data Supercell home directory is /arc/users/joeuser, where you substitute your PSC userid for 'joeuser'. Once you are at your home directory the command ls -l will show you to which charge-id each of your files has been assigned.

Now you need to change the charge-id of the files that are assigned to your expiring grant, using the chgrp command. The chgrp command can be used to change the charge-id for a single file, for a group of files using wildcards, or for all the files in a directory.

For example, suppose you want to change the charge-id for all your files in the directory mynewdata, which is a directory you created off your Data Supercell home directory. First, you would cd to your Data Supercell home directory on a PSC production platform. Then you would issue the command

    chgrp -R mc3ts7p mynewdata

This command will assign all the files in mynewdata to charge-id mc3ts7p. There is a man page for chgrp if you want more information.
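
If only some of the files in a directory belong to the expiring grant, a narrower alternative to chgrp -R is to select files by their current group. The sketch below is an illustration rather than a PSC-provided procedure: the function name plan_regroup and the old charge-id oldid01 in the usage comment are hypothetical. It only prints the chgrp commands, so that you can review them before changing anything.

```shell
# Hypothetical helper, not a PSC tool: print (do not run) one chgrp
# command for each item under DIR that is still assigned to OLD_ID.
plan_regroup() {
    old_id=$1
    new_id=$2
    dir=$3
    find "$dir" -group "$old_id" -exec echo chgrp "$new_id" {} \;
}

# Example (oldid01 is a made-up charge-id for the expiring grant):
# plan_regroup oldid01 mc3ts7p mynewdata
#
# To apply the change once the preview looks right:
# find mynewdata -group oldid01 -exec chgrp mc3ts7p {} +
```

Files that are already assigned to other grants are left out of the listing, which is the difference from a blanket chgrp -R.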

The second case of an expiring grant is if there is not a three-month time overlap between your old and your new grant. This is also the situation that applies if you are not applying for a new PSC grant. In this scenario your Data Supercell files will be deleted when the three-month time limit is reached. You should therefore move any Data Supercell files you want to maintain to another location. If you have been actively monitoring your usage during the course of your grant this process will be less daunting. Any files you do not want to save you should delete. As has been mentioned, if your XSEDE grant expires, your normal methods of access to the Data Supercell will be blocked, but you can use the machine grace.psc.edu, as was described above, to delete or transfer your Data Supercell files.

Transferring Files

A variety of transfer methods are available to copy files to and from the Data Supercell. These methods are discussed below.

Our recommended method of file transfer is Globus Online, if you can obtain credentials to use Globus Online, either through XSEDE or InCommon. If you cannot use Globus Online, but do have access to Globus software, our next recommended method is globus-url-copy. Otherwise, you should use either the tar method discussed below or scp. You should use the tar method if you are transferring a lot of small files.

PSC maintains a Web page that lists average data transfer rates between all XSEDE resources, including the Data Supercell. For example, you can find the average data transfer rate between blacklight and the Data Supercell in this table. If your transfer rates are significantly below these average rates or you believe that your file transfer performance is subpar, send email to remarks@psc.edu and we will examine methods of improving your file transfer performance.

Your home directory on the Data Supercell is /arc/users/joeuser, where 'joeuser' is your PSC userid. You will usually need to know this path when transferring files to and from the Data Supercell.

If you are going to store a file that is 2 Tbytes or larger or you intend to store more than 500 Gbytes in a single day, send email to remarks@psc.edu so that special arrangements can be made to handle your large file transfers.

Globus Online

Globus Online is our recommended method of transferring data to and from the Data Supercell. To use Globus Online you must first create a Globus Online userid and password at the Globus Online Web site. Once you have logged in to Globus Online you can initiate your file transfer. For each transfer you must select two Globus Online endpoints, to which you must then authenticate. The endpoint to use for the Data Supercell is xsede#pscdata. If you are an XSEDE user you can use your XSEDE User Portal userid and password to authenticate to the Globus Online endpoint xsede#pscdata. When connecting to the xsede#pscdata endpoint on Globus Online you may be redirected to the XSEDE OAuth page to enter your XSEDE User Portal username and password for authentication. After authentication, you will automatically be returned to the Globus Online site to initiate your transfers.

If you are affiliated with an InCommon institution you can use your userid and password for that institution to authenticate to endpoint psc#dsc-cilogon if you have previously registered with PSC as an InCommon user. To register with PSC as an InCommon user you must follow the steps below. You only need to follow these steps once.

  1. Point your browser to https://cilogon.org/, select your institution from the 'Select an Identity Provider' list, and click on the 'Log On' button. This should take you automatically to a login page for your institution.
  2. Enter your username and password for your institution on the login page, and click on the 'Login' button. You should then get redirected back to the CILogon Service webpage.
  3. In a box near the top of the CILogon Service webpage there should appear a field called "Certificate Subject" with a string like
    /DC=org/DC=cilogon/C=US/O=My Institution/CN=My Name A1234
    Copy this string to a clipboard or text file. You'll need to paste it into a field on another webpage shortly.
  4. Log Off from the CILogon Service webpage.
  5. Point your browser to
    https://dirs.psc.edu/cgi-bin/teragrid/userpage/list.pl
    and log in with your PSC username and password. This site lists the certificate subjects (DNs) that we have in our PSC database for your PSC account. You will be adding the CILogon certificate subject (DN) to this list.
  6. Click on the 'Add DN' link at the top left. This should get you to the "Adding DN" page, where you paste into the DN: field the certificate subject that you copied in step 3 above. Make sure there are no extra spaces before or after the pasted string. Click on 'Create' to add your new CILogon DN (certificate subject) to the PSC database.
  7. Click on the 'List DNs' link at the top left to confirm that your new DN was added.

Within an hour you should be able to use Globus Online by authenticating to the psc#dsc-cilogon endpoint using your userid and password at your institution. You should only need to perform this authentication once per Globus Online session or for as long as the retrieved credentials are valid. Send email to remarks@psc.edu if you have any questions about using InCommon.

If you are unable to use either the XSEDE or InCommon methods of authentication to Globus Online send email to remarks@psc.edu to see if you can use other methods of authentication to Globus Online.

If you do not enter a path for the endpoints xsede#pscdata or psc#dsc-cilogon your destination will be your Data Supercell home directory. You can enter a path if you want a different destination on the Data Supercell.

You can also use Globus Online to transfer files between your blacklight brashear space and the Data Supercell. You would use xsede#pscdata for both endpoints, since xsede#pscdata can also be used to point to blacklight. To make your second endpoint point to your brashear files you must enter as your Globus Online path for this endpoint the complete path to your brashear directory. Otherwise, Globus Online will use your Data Supercell home directory as your path in the transfer. For example, suppose you want to transfer brashear file largematrix.dat to your Data Supercell home directory. You would enter xsede#pscdata for both endpoints. For the path for the endpoint you are using to point to blacklight you must enter your path as /brashear/joeuser/largematrix.dat. For the other path, the path pointing to the Data Supercell, you can enter just largematrix.dat, since you are storing the file in your home directory on the Data Supercell. Then you can initiate the transfer by clicking the appropriate arrow button.

Finally, you can use Globus Online to transfer your files from the Data Supercell through the grace.psc.edu machine. The endpoint to use for grace.psc.edu is psc#grace. The default directory if you use this endpoint is your Data Supercell home directory. Since you will only use this endpoint when your Data Supercell account is closed, you will only be able to use this endpoint to transfer files out of the Data Supercell, not into it. Your other endpoint, the endpoint to which you are transferring files, can be any Globus Online endpoint that points to a machine on which you have a valid account.

Sftp and scp

You can transfer files between your local systems and the Data Supercell using the SSH file transfer clients sftp and scp. With sftp and scp you do not connect directly to the Data Supercell. Instead, you transfer files to and from the Data Supercell via a PSC high-speed data conduit named data.psc.xsede.org. If you are not connecting to the data conduit from an XSEDE host you must use the name data.psc.edu for the data conduit. If you have a graphical sftp or scp client application on your local system, you can use it to connect and authenticate to data.psc.xsede.org and transfer files accordingly. Use your PSC userid and password for authentication.

If you need to (re)set your PSC password, you can do so via the kpasswd command on any PSC production system, or using the http://apr.psc.edu/ Web form.

You can use the command-line sftp client to transfer files to and from the Data Supercell interactively. When using sftp from the command line, you first connect and authenticate to data.psc.xsede.org, and then issue commands at the sftp> prompt to transfer and manage files:

$ sftp joeuser@data.psc.xsede.org

where joeuser is your PSC userid. The first time you connect to data.psc.xsede.org using sftp or scp, you may be prompted to accept the server's host key. Enter yes to accept the host key:

The authenticity of host 'data.psc.xsede.org (128.182.70.103)' can't be established.
RSA key fingerprint is d5:77:f2:d9:07:f6:32:b6:c3:eb:0d:d1:29:ed:9b:80.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'data.psc.xsede.org' (RSA) to the list of known hosts.

You will then be prompted to enter your PSC password:

joeuser@data.psc.xsede.org's password:
Connected to data.psc.xsede.org.
sftp>

At the sftp> prompt, you can then enter sftp commands (e.g., pwd, ls, put, get) to manage and transfer your files to/from the Data Supercell. Enter a question mark for a list of available sftp commands.

Examples (where joeuser is the user's PSC userid, and entered commands follow the sftp> prompt):

  • Where am I?
    sftp> pwd
    Remote working directory: /arc/users/joeuser
  • Where am I on my local system?
    sftp> lpwd
    Local working directory: /Users/JoeUser/Documents
  • Change directories on my local system to /usr/local/projects:
    sftp> lcd /usr/local/projects
  • Make a new directory called "newdata" under my current directory on the Data Supercell:
    sftp> mkdir newdata
  • Copy a file (file1.dat) from my current local directory to my newdata subdirectory on the Data Supercell:
    sftp> put file1.dat newdata/file1.dat
    Uploading file1.dat to /arc/users/joeuser/newdata/file1.dat
    file1.dat          100% 1016KB   1.0MB/s   00:00
  • Copy a file on the Data Supercell (/arc/users/joeuser/file1) to /usr/local/projects/newfile1 on my local system:
    sftp> get /arc/users/joeuser/file1 /usr/local/projects/newfile1
    Fetching /arc/users/joeuser/file1 to /usr/local/projects/newfile1
    /arc/users/joeuser/file1          100%   31     0.0KB/s   00:00
  • Exit from this sftp session:
    sftp> exit

    or

    sftp> bye
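
The interactive commands above can also be collected in a batch file and run with the -b option of sftp. The following is a minimal sketch: the function name write_sftp_batch and the file name upload.batch are hypothetical. It also assumes that a non-interactive authentication method (for example, key-based authentication) is available to you, because sftp batch mode cannot prompt for a password; this document does not say whether data.psc.xsede.org offers one.

```shell
# Hypothetical helper, not a PSC tool: write an sftp batch file that
# repeats the upload example above (create newdata, then put one local
# file into it).
write_sftp_batch() {
    local_file=$1
    batch_file=$2
    # The leading "-" tells sftp to carry on if mkdir fails, for example
    # because newdata already exists.
    printf '%s\n' \
        "-mkdir newdata" \
        "put $local_file newdata/$(basename "$local_file")" \
        "bye" > "$batch_file"
}

# Example (not run here; joeuser is the placeholder userid):
# write_sftp_batch file1.dat upload.batch
# sftp -b upload.batch joeuser@data.psc.xsede.org
```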

For scripted transfers, or transfers that you want to execute directly from your command-line shell, you can use the SSH scp client:

Examples (where joeuser is the user's PSC userid, entered commands follow the $ prompt, and the user enters their PSC password when prompted):

  • Copy my local file (/usr/local/projects/file1.dat) to my home directory on the Data Supercell:
    $ scp /usr/local/projects/file1.dat joeuser@data.psc.xsede.org:.
    joeuser@data.psc.xsede.org's password: 
    file1.dat          100% 1016KB   1.0MB/s   00:00  
    
  • Copy the contents of my newdata directory on the Data Supercell to directory /usr/local/projects on my local system (creating /usr/local/projects/newdata and copying all the files from newdata):
    $ scp -r joeuser@data.psc.xsede.org:newdata /usr/local/projects
    joeuser@data.psc.xsede.org's password: 
    file2.dat          100% 1016KB   1.0MB/s   00:00
    file3.dat          100% 1016KB   1.0MB/s   00:01
    file1.dat          100% 1016KB   1.0MB/s   00:00
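
In a script that may run either on an XSEDE host or elsewhere, the choice between the two data conduit names can be made in one place. This is a small sketch under that assumption; the function name dsc_remote and its "xsede" keyword are hypothetical, and only the two host names come from this document.

```shell
# Hypothetical helper, not a PSC tool: build the remote argument for scp.
# Pass "xsede" as ORIGIN when running on an XSEDE host; any other value
# selects data.psc.edu.
dsc_remote() {
    userid=$1
    origin=$2
    path=$3
    if [ "$origin" = "xsede" ]; then
        host=data.psc.xsede.org
    else
        host=data.psc.edu
    fi
    echo "${userid}@${host}:${path}"
}

# Example (not run here): copy a local file to the Data Supercell home
# directory from a non-XSEDE machine.
# scp /usr/local/projects/file1.dat "$(dsc_remote joeuser other .)"
```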
    

Rsync

You can use the rsync command to keep the contents of a local directory synchronized with a directory on the Data Supercell. A sample rsync command is

rsync -rltpDvP -e 'ssh -l joeuser' source_directory data.psc.xsede.org:target_directory

where 'joeuser' is your PSC userid and 'source_directory' is the name of your local directory and 'target_directory' is the name of the directory on the Data Supercell. If you are not issuing the command on an XSEDE host you must use data.psc.edu as the address in your rsync command. We recommend the rsync command options -rltpDvP. See the rsync man page for information on these options and on other rsync options you might want to use.
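
For reference, -rltpDvP is the rsync archive option -a (which stands for -rlptgoD) without -g and -o, plus -v and -P, so the group and owner of the source files are not carried over. That fits the way group determines grant assignment on the Data Supercell, although this reading of the intent is an assumption. The sketch below spells out each flag and optionally adds -n (--dry-run), which makes rsync report what it would transfer without copying anything. The function name dsc_rsync_cmd is hypothetical; it only prints the command.

```shell
# Hypothetical helper, not a PSC tool: print the recommended rsync command.
#   -r recurse into directories      -l copy symlinks as symlinks
#   -t preserve modification times   -p preserve permissions
#   -D preserve device and special files
#   -v verbose output                -P keep partial files, show progress
# Pass "dry" as the fourth argument to add -n (--dry-run).
dsc_rsync_cmd() {
    userid=$1
    src=$2
    dest=$3
    flags=-rltpDvP
    if [ "${4:-}" = "dry" ]; then
        flags="${flags}n"
    fi
    echo "rsync $flags -e 'ssh -l $userid' $src data.psc.xsede.org:$dest"
}

# Example: preview a synchronization before running it for real.
# dsc_rsync_cmd joeuser source_directory target_directory dry
```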

We have several other recommendations for the use of rsync. First, we recommend that you install the HPN-SSH patches to improve the performance of rsync. These patches are available online.

If you install the HPN-SSH patches you can use the ssh options

-oNoneSwitch=yes
-oNoneEnabled=yes

in the -e option of your rsync command for faster data transfers. With these options your authentication is encrypted but your data transfer is not. If you want encrypted data transfers you should not use these options.

Finally, whether or not you install the HPN-SSH patches we recommend the option

-oMACS=umac-64@openssh.com

If you use this option your transfer will use a faster data validation algorithm.

A convenient way to use these options is to define a variable whose value is the options you want to use. An example command to do this, in csh or tcsh syntax, is

setenv SSH_OPTS '-oMACS=umac-64@openssh.com -oNoneSwitch=yes -oNoneEnabled=yes'

You can then issue your rsync command as

rsync -rltpDvP -e "ssh -l joeuser $SSH_OPTS" source_directory data.psc.xsede.org:target_directory
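
The setenv command is csh and tcsh syntax. For users of sh or bash, an equivalent might look like the sketch below. The function name dsc_rsh is hypothetical. Note that the value given to -e must be in double quotes (or otherwise expanded by the shell), because inside single quotes the text $SSH_OPTS is passed through literally.

```shell
# Bourne-shell (sh, bash) equivalent of the setenv line above.
# The None options only work with an HPN-SSH-patched ssh; leave them
# out otherwise.
SSH_OPTS='-oMACS=umac-64@openssh.com -oNoneSwitch=yes -oNoneEnabled=yes'
export SSH_OPTS

# Hypothetical helper, not a PSC tool: build the string to pass to the
# -e option of rsync for a given PSC userid.
dsc_rsh() {
    echo "ssh -l $1 $SSH_OPTS"
}

# Example (not run here):
# rsync -rltpDvP -e "$(dsc_rsh joeuser)" source_directory data.psc.xsede.org:target_directory
```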

The above recommendations, excluding the recommendation for the rsync options, are also appropriate for sftp and scp. You should apply the HPN-SSH patches and use the above ssh options if suitable. If you define an SSH_OPTS variable which has the value of your ssh options you can issue your sftp command as

sftp $SSH_OPTS joeuser@data.psc.xsede.org

and your scp command as

scp $SSH_OPTS local_file joeuser@data.psc.xsede.org:remote_file

Globus Online can also be used to keep a local directory synchronized with a directory on the Data Supercell.

Globus-url-copy

XSEDE users may use GridFTP clients to transfer files to and from the Data Supercell.

To use the command-line client globus-url-copy on an XSEDE-system login host (e.g. blacklight), first ensure that you have a current user proxy certificate for authentication with enough time on it to complete your transfer, e.g.:

joeuser@tg-login1:~> grid-proxy-info
subject  : /C=US/O=National Center for Supercomputing Applications/CN=Joe User
issuer   : /C=US/O=National Center for Supercomputing Applications/OU=Certificate Authorities/CN=MyProxy
identity : /C=US/O=National Center for Supercomputing Applications/CN=Joe User
type     : end entity credential
strength : 2048 bits
path     : /tmp/x509up_u99999
timeleft : 11:58:33

If the timeleft is not sufficient, or you get an "ERROR: Couldn't find a valid proxy" message, then use myproxy-logon (or if you have your own long term user certificate, grid-proxy-init) to obtain a new user proxy certificate, e.g.:

joeuser@tg-login1:~> myproxy-logon -l joexsedeuser -t 24
Enter MyProxy pass phrase:
A credential has been received for user joexsedeuser in /tmp/x509up_u99999.

where joexsedeuser is your XSEDE User Portal login name, -t 24 requests a 24-hour certificate, and the MyProxy pass phrase entered is your XSEDE User Portal password.
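
To decide in a script whether the proxy has enough time left, the timeleft line of the grid-proxy-info output can be converted to seconds. This is a minimal sketch that assumes the output format shown in the example above (a "timeleft : HH:MM:SS" line); the function name proxy_seconds_left is hypothetical.

```shell
# Hypothetical helper, not a PSC or Globus tool: read grid-proxy-info
# output on standard input and print the remaining lifetime in seconds
# (0 if no timeleft line is found).
proxy_seconds_left() {
    awk -F' : ' '$1 ~ /^timeleft/ {
        split($2, t, ":")
        secs = t[1] * 3600 + t[2] * 60 + t[3]
    }
    END { print secs + 0 }'
}

# Example (not run here): renew the proxy if less than one hour remains.
# if [ "$(grid-proxy-info | proxy_seconds_left)" -lt 3600 ]; then
#     myproxy-logon -l joexsedeuser -t 24
# fi
```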

You can then use globus-url-copy to transfer files to/from the Data Supercell using the GridFTP server address gsiftp://gridftp.psc.xsede.org. This transfer will go through the PSC high-speed data conduit data.psc.xsede.org.

The gsiftp:// URLs are absolute paths to files. This means that when referring to a file or directory in your Data Supercell home directory, you must either use gsiftp://gridftp.psc.xsede.org/arc/users/joeuser/ or gsiftp://gridftp.psc.xsede.org/~/ to refer to your home directory (where joeuser is your PSC userid).
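Because every URL must be absolute, scripts that issue many transfers may find it convenient to build the URLs in one place. The helper below is a hypothetical sketch, not part of any Globus tool. It prefixes a path relative to your Data Supercell home directory with the ~ form of the server address given above.

```shell
# Hypothetical helper: turn a path relative to the Data Supercell home
# directory into an absolute gsiftp:// URL using the ~ shorthand.
dsc_home_url() {
    rel=${1#/}    # drop one leading slash, if present
    printf 'gsiftp://gridftp.psc.xsede.org/~/%s\n' "$rel"
}

dsc_home_url newdata/
dsc_home_url file1.dat
```

The output of the helper can be passed to globus-url-copy as a source or destination argument.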

Examples:

  • List the files in my home directory on the Data Supercell:
    joeuser@tg-login1:~> globus-url-copy -list gsiftp://gridftp.psc.xsede.org/~/
    gsiftp://gridftp.psc.xsede.org/~/
    file1.dat
    file2.dat
    file3.dat
    newdata/
    olddata/
  • Transfer a file (testfile) from my scratch space on TACC Lonestar to my newdata directory on the Data Supercell:
    joeuser@tg-login1:~> globus-url-copy -stripe -tcp-bs 32M \
    gsiftp://gridftp.lonestar.tacc.xsede.org/scratch/99999/tg987654/testfile gsiftp://gridftp.psc.xsede.org/~/newdata/

    where -stripe and -tcp-bs 32M are used to improve transfer performance, and /scratch/99999/tg987654 is the scratch directory on Lonestar at TACC.

  • Transfer a file (testfile), while logged into blacklight, from brashear on blacklight to my newdata directory on the Data Supercell:
    joeuser@tg-login1:~> globus-url-copy -stripe -tcp-bs 32M \
    /brashear/joeuser/testfile gsiftp://gridftp.psc.xsede.org/~/newdata/

Gsisftp and gsiscp

If you have a current user proxy certificate you can also use gsisftp or gsiscp to transfer files to and from the Data Supercell. The method of obtaining a valid user proxy certificate is described above in the discussion of the globus-url-copy command. The default directory for both gsisftp and gsiscp is your Data Supercell home directory.

A sample gsisftp transfer session is

gsisftp data.psc.xsede.org
sftp>pwd
Remote working directory: /arc/users/joeuser
sftp>get file1.dat localfile1.dat
Fetching /arc/users/joeuser/file1.dat to /usr/local/projects/localfile1.dat
/arc/users/joeuser/file1.dat   100%   31    0.0KB/s    00:00
sftp>bye

A sample gsiscp command is

gsiscp data.psc.xsede.org:file1.dat localfile1.dat
file1.dat    100%   1016KB    101.0 MB/s      00:00

Far

To transfer files between the Data Supercell and a PSC production system, such as blacklight, you can also use PSC's far program. Far is available on all PSC production platforms. In addition to file transfers, the far program can be used for file and directory management, such as getting a list of your files on the Data Supercell. See the far documentation for more information. Note, however, that Globus Online is our recommended method for transferring files to and from the Data Supercell. The far utility is being maintained for backward compatibility with older scripts, and at some time in the future it will no longer be supported.

We recommend that you execute far commands outside of your batch compute job scripts so that your jobs do not tie up compute processors and you do not expend your computing allocation transferring files.

Tar

The tar command, in conjunction with the ssh command, can also be used to transfer files to the Data Supercell. This approach is especially useful if you are transferring a lot of small files.

There are two methods of using tar, and in both you use tar as part of your transfer command. The first method creates a tarball on the Data Supercell without first creating one on your source system. A single tarball can be retrieved from the Data Supercell faster than a large number of individual files. However, if you create the tarball on your source machine before you transfer it, you double your storage usage on that machine. Streaming the output of tar directly to the Data Supercell avoids this doubling of space usage. A sample command is

    tar cf - sourcedir/ | ssh data.psc.xsede.org "cat > dscdir.tar"

For 'sourcedir' you substitute the name of your source directory. The result of this command will be the tarball dscdir.tar on the Data Supercell without the need to first create it on your source system.
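The streaming pattern can be tried out locally before it is pointed at the Data Supercell. The following is a minimal sketch, assuming a POSIX shell with tar and mktemp available. The ssh stage is replaced by a local sh -c stand-in, and the temporary directory and sample files are hypothetical.

```shell
# Local dry run of the streaming-tar pattern; no network access is needed.
stream_dir=$(mktemp -d)
mkdir -p "$stream_dir/sourcedir"
printf 'alpha\n' > "$stream_dir/sourcedir/a.txt"
printf 'beta\n'  > "$stream_dir/sourcedir/b.txt"

# Stand-in for: tar cf - sourcedir/ | ssh data.psc.xsede.org "cat > dscdir.tar"
# tar writes the archive to stdout and the receiving side saves the stream.
( cd "$stream_dir" && tar cf - sourcedir/ | sh -c 'cat > dscdir.tar' )

# List the archive to confirm that both files arrived in the tarball.
tar tf "$stream_dir/dscdir.tar"
```

At no point does a second copy of the data exist on the sending side. Only the receiving command writes the tarball.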

In the second method the result is not a tarball, but the same set of individual files you have on your source system. A sample command is

tar cf - sourcedir/ | ssh data.psc.xsede.org tar xf -

For 'sourcedir' you substitute the name of your source directory. The result will be the same as the equivalent scp command, but the transfer will be faster. You should use the first method of using tar if you have a lot of individual files.
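The second method can also be rehearsed locally. This sketch assumes a POSIX shell with tar, mktemp and diff. The ssh stage is replaced by a subshell that unpacks into a second, hypothetical temporary directory.

```shell
# Local dry run of the pack-and-unpack pipeline; no network access is needed.
tar_src=$(mktemp -d)
tar_dest=$(mktemp -d)
mkdir -p "$tar_src/sourcedir/sub"
printf 'one\n' > "$tar_src/sourcedir/sub/file1"
printf 'two\n' > "$tar_src/sourcedir/file2"

# Stand-in for: tar cf - sourcedir/ | ssh data.psc.xsede.org tar xf -
# The left side packs to stdout, the right side unpacks from stdin.
( cd "$tar_src" && tar cf - sourcedir/ ) | ( cd "$tar_dest" && tar xf - )

# Compare the two trees; diff -r is silent when they are identical.
diff -r "$tar_src/sourcedir" "$tar_dest/sourcedir" && echo "trees match"
```

The directory layout is recreated on the receiving side, which is why the result matches that of a recursive copy.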

If you are not concerned about your local disk storage or the performance of scp or want to use a simpler method, you can use tar on your local system and then use scp to transfer your file to the Data Supercell and from there to your blacklight scratch area.

A sample set of commands on your source system would be

    tar cf sourcedir.tar sourcedir

    scp sourcedir.tar joeuser@data.psc.xsede.org:

For 'joeuser' you substitute your PSC userid. You can compress your tarball before your transfer to speed up your transfer times. Then you would login to blacklight and issue the commands

    cd $SCRATCH

    tar xf /arc/users/joeuser/sourcedir.tar

Again for 'joeuser' you substitute your userid. This will unroll your tar file in your scratch directory. The scp command was discussed above.
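To see what compressing the tarball can save, the plain and gzip-compressed archives can be compared locally. This is a minimal sketch, assuming a tar that supports the z option (GNU tar and BSD tar both do). The sample data is hypothetical and deliberately repetitive, so real datasets will compress less well.

```shell
# Build a small sample directory in a temporary location.
pack_dir=$(mktemp -d)
mkdir -p "$pack_dir/sourcedir"
awk 'BEGIN { for (i = 0; i < 5000; i++) print "sample line of text" }' \
    > "$pack_dir/sourcedir/data.txt"

# Create the same archive twice: once plain, once gzip-compressed.
( cd "$pack_dir" && tar cf sourcedir.tar sourcedir \
                 && tar czf sourcedir.tar.gz sourcedir )

# Compare the sizes of the two archives.
ls -l "$pack_dir/sourcedir.tar" "$pack_dir/sourcedir.tar.gz"
```

A compressed archive is unpacked with tar xzf instead of tar xf.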

Checksums

After you transfer your files to the Data Supercell you might want to compute checksums for your transferred files to see if they have been corrupted during the transfer.

If you are logged on to a PSC machine that mounts the Data Supercell, such as blacklight, you can issue the command

cksum /arc/users/userid/file1

to generate a checksum for file1. For 'userid' you substitute your PSC userid. If you are logged in to a remote machine you can issue the command

ssh -l userid data.psc.edu "cksum /arc/users/userid/file1"

to generate a checksum for file file1. For both occurrences of 'userid' in this command you substitute your PSC userid.
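A checksum is only useful when it is compared with the checksum of the original file. The sketch below shows one way to make that comparison, assuming a POSIX shell with cksum and awk. The helper name crc_and_size is hypothetical, and a local copy stands in for the remote file so that the example runs without network access.

```shell
# Hypothetical helper: print only the CRC and byte count reported by cksum,
# dropping the file name so that local and remote paths may differ.
crc_and_size() {
    cksum "$1" | awk '{ print $1, $2 }'
}

ck_dir=$(mktemp -d)
printf 'example payload\n' > "$ck_dir/file1"
cp "$ck_dir/file1" "$ck_dir/file1.copy"

local_sum=$(crc_and_size "$ck_dir/file1")
# For a real transfer the second value would come from the remote command:
#   ssh -l userid data.psc.edu "cksum /arc/users/userid/file1" | awk '{ print $1, $2 }'
copy_sum=$(crc_and_size "$ck_dir/file1.copy")

if [ "$local_sum" = "$copy_sum" ]; then
    echo "checksums match"
else
    echo "checksums differ"
fi
```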

Improving Your File Transfer Performance

File transfer performance between your local systems and data.psc.edu can be significantly improved by ensuring that your local systems' networking parameters are optimized. Guidance is available at PSC's Enabling High Performance Data Transfers webpage.

For improved performance when using SSH (sftp or scp), we recommend using an SSH package that includes PSC's High Performance Networking (HPN) patches, e.g., GSI-OpenSSH. For instructions to build OpenSSH with PSC's HPN patches, consult the PSC High Performance SSH/SCP - HPN-SSH webpage.

Sharing Files

The Data Supercell is a Unix file system. Thus, to share files on the Data Supercell you must use Unix file protections. If you and the users you want to share with are in the same Unix group on the Data Supercell, then you can use Unix group protections to share files. Otherwise, you must use Unix world protections, which give access to your files to any user who can get access to the Data Supercell. To set your Data Supercell file protections, cd to your Data Supercell home directory and use the chmod command. Sftp, as well as some file transfer clients such as WinSCP, will allow you to set file protections on remote systems.
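As an illustration of group protections, the sketch below gives group members read access to a directory and a file while giving world no access. It assumes a POSIX shell, and the directory and file names are hypothetical. It runs in a temporary directory rather than on the Data Supercell.

```shell
# Set up a sample directory and file in a temporary location.
share_dir=$(mktemp -d)
mkdir "$share_dir/shared"
printf 'results\n' > "$share_dir/shared/results.dat"

chmod 750 "$share_dir/shared"              # owner: rwx, group: r-x, world: none
chmod 640 "$share_dir/shared/results.dat"  # owner: rw-, group: r--, world: none

# Show the resulting protections.
ls -ld "$share_dir/shared" "$share_dir/shared/results.dat"
```

Group members also need execute permission on every directory above the shared one in order to reach it. Replacing 750 with 755 and 640 with 644 would extend read access to world.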

Publications

PSC requests that a copy of any publication (preprint or reprint) resulting from research done on blacklight be sent to the PSC Allocations Coordinator. In addition, if your research was funded by the NSF you should log your publications at the XSEDE Portal. We also request that you include an acknowledgement of PSC in your publication.

Help

To get assistance on using the Data Supercell or to report a problem using the Data Supercell you have three options.

  • You can send email to help@xsede.org, mentioning PSC in the subject line.
  • You can send email to remarks@psc.edu.
  • You can contact the PSC User Services Hotline at 412-268-6350 from 9:00 a.m. until 5:00 p.m., Eastern time, Monday through Friday.