🔄 Copy and Transfer Files
The HBS Grid is primarily used for data analysis, machine learning, data wrangling, and data visualization. Usually this means that you need to copy or sync your data to the HBS Grid in order to do your work.
HBS Grid storage overview
Before transferring data to the HBS Grid you have to decide where to put it. There are three options: home directory, project space, or scratch storage.
A home directory was created at /export/home/<group>/<username> when you requested your account. Your home folder has limited storage capacity and is accessible only by you.
Project spaces are directories that are shared and accessible by all HBS Grid users working on that project. You can request a new project space using the new project space request form and you can request modifications to an existing project space using the change request form.
Scratch storage is available at /export/scratch. It is appropriate only for temporary, short-term storage. Files are not backed up and will be deleted after 60 days. Scratch storage is a shared resource accessible to all users on the HBS Grid; make sure you set permissions on your files accordingly.
Refer to the Research Data Storage and Databases documentation for details.
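Because scratch storage is shared, you may want to restrict files you place there to your own account. The following is a minimal Python sketch of one way to do that; the function name and the example path are hypothetical, and your project may call for different permissions (for example, keeping group access for collaborators).

```python
import stat
from pathlib import Path

def restrict_to_owner(root):
    """Remove all group and other permissions from root and everything below it."""
    root = Path(root)
    # Build the full list first so that changing permissions does not affect traversal.
    for path in [root, *root.rglob("*")]:
        if path.is_symlink():
            continue  # skip symlinks; chmod would act on the link target
        mode = stat.S_IMODE(path.stat().st_mode)
        path.chmod(mode & ~(stat.S_IRWXG | stat.S_IRWXO))

# Example with a hypothetical directory name:
# restrict_to_owner("/export/scratch/jharvard_tmp")
```

Owner permissions are left unchanged, so a file with mode 644 becomes 600 and a directory with mode 755 becomes 700.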
Local storage data transfer
SFTP
Transferring data from your local computer to the HBS Grid is usually done using the SFTP protocol. This requires an SFTP client on your local machine. If you don't yet have one, Cyberduck and FileZilla are popular graphical desktop clients. For command-line data transfer from a terminal we recommend rsync.
Once you have an SFTP client installed on your local machine, follow these steps to transfer data to or from the HBS Grid.
Transfer data to the HBS Grid using a desktop SFTP client
1. Connect to the HBS network, either directly if you are on-campus or via VPN otherwise.
2. Open your transfer client and connect to the HBS Grid at hbsgrid.hbs.edu on port 22.
3. Locate the data on your local machine that you wish to transfer.
4. Locate the directory on the HBS Grid that you will copy your data to, creating it if needed.
5. Start the data transfer.
Transfer from the command line
You can alternatively use the rsync command-line program to transfer data from the command line on Mac, Linux, or Windows Subsystem for Linux. rsync documentation is available online.
Note that transferring many small files is much slower than transferring a small number of large files. You may find it faster to compress folders with many small files into .zip or .tar archives, transfer those, and decompress/extract them on the other end. (See additional data transfer tips below.)
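As an illustration, an rsync upload could be assembled as in the sketch below. The function name, the local folder `my_data`, and the destination directory under /export/scratch are hypothetical; the host name and the example username jharvard come from this page. The flags used are standard rsync options: `-a` (archive mode), `-v` (verbose), `-z` (compress in transit), `--partial` (keep partially transferred files), `--progress`, and `--dry-run` (show what would be transferred without copying anything).

```python
import shlex

GRID_HOST = "hbsgrid.hbs.edu"  # HBS Grid host named in this document

def rsync_upload_command(local_dir, username, remote_dir, dry_run=False):
    """Build an rsync command that copies the contents of local_dir to remote_dir on the Grid."""
    cmd = ["rsync", "-avz", "--partial", "--progress"]
    if dry_run:
        cmd.append("--dry-run")
    # A trailing slash on the source copies the directory's contents
    # rather than creating an extra directory level at the destination.
    source = local_dir.rstrip("/") + "/"
    cmd += [source, f"{username}@{GRID_HOST}:{remote_dir}"]
    return cmd

# Hypothetical local folder and destination directory.
cmd = rsync_upload_command("my_data", "jharvard", "/export/scratch/jharvard_data", dry_run=True)
print(shlex.join(cmd))
```

The printed command can be pasted into a terminal, or run from Python with `subprocess.run(cmd, check=True)` once the dry run looks right.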
Click the image below for a demonstration showing how to sync your data from a local drive to the HBS Grid:
Mount Storage Locally
Research storage is also accessible on Windows as a network drive at \\research.hbs.edu, via SMB on OSX/Linux at smb://research.hbs.edu, and via SSH at hbsgrid.hbs.edu. This is useful for viewing and copying small files, but will be slow for large data transfers and may result in unexpected permissions settings on the cluster.
On Windows:
1. Connect to the HBS network, either directly if you are on-campus or via VPN otherwise.
2. Open a Windows Explorer window, right-click on the "Computer" icon, and then select "Map Network Drive".
3. To map a drive to your home directory, specify the folder path \\research\username (for example, \\research\jharvard). To map a drive to a project space, specify the path \\research.hbs.edu\projects\projectname (note that you may have to use projects, projects2, projects3, projects4, or projects5 depending on the path of your project space). Also note that you may not map a drive to a project space containing security level 4 data.
4. Click "Connect using different credentials" if you are not using an HBS-issued machine. If you are prompted for your username and password, enter your HBSGrid username (the part preceding @hbs.edu) and your password. If you are connecting from a non-HBS-issued machine, please add HBS\ before your username (e.g. HBS\jharvard). This specifies the proper Windows domain for authenticating your credentials.
On macOS:
1. Connect to the HBS network, either directly if you are on-campus or via VPN otherwise.
2. From the Finder menu bar, select Go > Connect to Server...
3. In the Server Address field, enter the domain\username, server address, and file path combination that is appropriate for the space you're trying to access. For your home directory, this will be smb://HBS\jharvard@research.hbs.edu/jharvard, and for project spaces, this will be smb://HBS\jharvard@research.hbs.edu/projects/projectname (note that you may have to use projects, projects2, projects3, projects4, or projects5 depending on the path of your project space). In both cases, use your own HBS username in place of "jharvard." Also note that you may not mount a project space containing security level 4 data.
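The project-space addresses described above follow a fixed pattern, which the small sketch below spells out. The function and parameter names are hypothetical, as are the example project `myproject` and the choice of `projects2`; the path patterns themselves are the ones given in the steps above.

```python
def project_share_paths(username, project, share="projects"):
    """Return (Windows UNC path, macOS smb:// address) for a project space.

    share is one of projects, projects2, ..., projects5, depending on
    where the project space lives.
    """
    unc = rf"\\research.hbs.edu\{share}\{project}"
    smb = rf"smb://HBS\{username}@research.hbs.edu/{share}/{project}"
    return unc, smb

# Hypothetical project name and share.
unc, smb = project_share_paths("jharvard", "myproject", share="projects2")
print(unc)
print(smb)
```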
Cloud storage data transfer
If your data is in cloud storage (OneDrive, Dropbox, etc.) you may wish to sync it directly from there. While the HBS Grid does not offer native Dropbox, OneDrive, or other cloud storage clients, you can use rclone to perform on-demand data synchronization with all major cloud storage providers. Transferring data from cloud storage providers to the HBS Grid using this tool is generally reasonably fast and easy.
Sync from the command line
rclone is available as a command-line application that you can use interactively in a terminal or in scripts. Refer to the rclone documentation for details.
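For command-line use, a one-off copy with rclone might look like the sketch below. It assumes you have already run `rclone config` and created a remote; the remote name `dropbox`, the cloud folder, and the destination directory are all hypothetical. `rclone copy`, `--progress`, and `--dry-run` are standard rclone features.

```python
import shlex

def rclone_copy_command(remote, remote_path, dest_dir, dry_run=False):
    """Build an `rclone copy` command from a configured remote to a local directory."""
    cmd = ["rclone", "copy", f"{remote}:{remote_path}", dest_dir, "--progress"]
    if dry_run:
        cmd.append("--dry-run")  # list what would be copied without transferring anything
    return cmd

# Hypothetical remote name, cloud folder, and destination directory.
rclone_cmd = rclone_copy_command("dropbox", "research/raw_data",
                                 "/export/scratch/jharvard_data", dry_run=True)
print(shlex.join(rclone_cmd))
```

`rclone copy` only copies new or changed files and does not delete anything at the destination, which makes it a cautious default for a first transfer.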
Sync your data from a cloud provider to the HBS Grid desktop
1. Log in to the HBS Grid via NoMachine.
2. Identify the directory on the HBS Grid that you will copy your data to, creating it if needed.
3. From the HBS Grid desktop, open the rclone browser application.
4. Click the Config... button and follow the prompts (only needed the first time).
5. Click the cloud storage icon in the Remotes tab and select the directory you wish to sync.
6. Specify the target directory from step 2 in the destination field.
Click the image below for a quick demonstration showing how to copy files from Dropbox to the HBS Grid.
Note that the demonstration video goes through the configuration step, which only needs to be done once. After that you can skip step 4 above, which greatly simplifies the process.
Globus data transfer
Globus is a data transfer service that enables sharing files or data with external persons, eliminates the need for both parties to have HBS or guest user credentials, and is capable of tolerating transfer interruptions. Note: At this time, Globus should not be used to transfer DSL 4 research data.
Globus key concepts
A Globus Collection is a named location containing data you can access with Globus.
HBS maintains a Globus Collection named Harvard Business School DTN.
Storage for this collection is mounted on the HBS Grid at /export/globus. You can create folders there and share them with other Globus users, or transfer data to them from other Globus collections you have access to.
As with scratch storage, /export/globus is accessible by all grid users, so you must ensure that both the ownership and permissions are set appropriately.
Please be aware that you can share not only files or folders that you own but also those that you have explicit permission to access. Please be careful not to share too broadly.
Log in to Globus and transfer data
1. Log in to the Globus web interface, selecting "Harvard University" as your organization and authenticating via HarvardKey if prompted.
2. In the Collection field, search for and select the Harvard Business School DTN (data transfer node). If you have not already created a folder here, create one now by clicking the New Folder button.
3. Click on the "Transfer or Sync to..." button.
4. Use the second Collection search box on the right to find the other collection.
5. Select the folder you wish to transfer and click on the appropriate Start arrow icon. (The direction the Start button arrow points indicates the direction files will transfer.)
Globus can also be used to transfer data from your local machine using Globus Connect Personal. Installers are available for Mac OS X, Windows, and Linux.
After transferring your data to the HBS Grid via Globus you will typically move it from /export/globus to a project folder. This can be done on the HBS Grid NoMachine desktop using Grsync, or from the HBS Grid command line using rsync, mv, or similar.
For details on using Globus refer to the excellent Globus documentation. A FAQ section is also available.
Data transfer tips
The speed and success of data transfer rely on numerous factors, only some of which are in your control. Following these tips will increase the likelihood of success:
- Wired Ethernet connections tend to be more reliable and faster than wireless connections. If using WiFi, try to use a network that has a strong signal and is interference-free.
- Transferring a few large files is often much faster than transferring many small files. When transferring many small files we recommend that you compress/archive the directories into a small number of archives. On the HBSGrid cluster, we recommend using the GUI File Manager in NoMachine/GNOME or command-line programs like tar or zip.
- We do not recommend creating any single archive file larger than about 100 GB: many transfer programs don't support resuming partial transfers, so if your transfer is interrupted it will start again and transfer the whole file.
- You can use LSF batch jobs to quickly compress large files and/or a large number of files. If using Globus, you can create archives directly in your Globus collection folder, eliminating the need to copy the data twice.
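The archiving tips above can be combined in a short script. The following is a minimal Python sketch, not an HBS-provided tool: the function name, the archive naming scheme, and the greedy batching strategy are all assumptions. It packs the files under a directory into numbered .tar.gz archives and starts a new archive whenever the next file would push the running total of uncompressed file sizes past a limit (100 GB by default), so each compressed archive should stay near or below that size. The output directory should be outside the source directory, and empty directories are not archived.

```python
import tarfile
from pathlib import Path

def archive_in_batches(src_dir, out_dir, max_bytes=100 * 10**9):
    """Pack files under src_dir into numbered .tar.gz archives in out_dir.

    A new archive is started when adding the next file would push the
    running total of uncompressed sizes past max_bytes. A single file
    larger than max_bytes still gets an archive of its own.
    """
    src_dir = Path(src_dir).resolve()
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    files = sorted(p for p in src_dir.rglob("*") if p.is_file())

    archives, tar, total = [], None, 0
    for path in files:
        size = path.stat().st_size
        if tar is None or (total > 0 and total + size > max_bytes):
            if tar is not None:
                tar.close()
            name = out_dir / f"{src_dir.name}_{len(archives):03d}.tar.gz"
            tar = tarfile.open(name, "w:gz")
            archives.append(name)
            total = 0
        # Store paths relative to the parent so extraction recreates src_dir.
        tar.add(path, arcname=str(path.relative_to(src_dir.parent)))
        total += size
    if tar is not None:
        tar.close()
    return archives
```

For example, `archive_in_batches("survey_data", "to_transfer")` (hypothetical directory names) would write `survey_data_000.tar.gz`, `survey_data_001.tar.gz`, and so on into `to_transfer`, ready to be transferred and then extracted on the other end.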