Troubleshooting Login Issues
Login issues are usually caused by one of three common problems, and are often easy to resolve.
Network and VPN
Connecting to the HBS Grid requires either a direct on-campus connection to the HBS network or a VPN connection if you are connecting remotely.
Ethernet connections from HBS offices or WiFi connections to HBS Secure will both work without further configuration. Note that connecting from other Harvard networks, such as Harvard Secure or an Ethernet connection from another Harvard School will not work; you must be connected to the HBS network.
If you are connecting from outside the HBS network you must use a VPN connection. If you suspect the VPN is not connected properly try re-installing the VPN client and restarting your computer.
A quota system is used to limit the amount of data you can store in your home directory on the HBS Grid. Reaching this limit can prevent NoMachine sessions from starting, and this is one of the most common reasons for difficulties connecting to the HBS Grid desktop via NoMachine.
You can fix this problem yourself by opening a terminal (in the Windows search toolbar, type "Cmd" or "Windows PowerShell"; in the Mac search toolbar, type "Terminal") and running
<username> with your actual HBS Grid username). Once connected you can use terminal
ls to list files in the directory,
rm (remove) plus the name of the file to remove files you don't need, or
mv (move) plus the name of your file and a path to a new location to move files. Removing and moving files can help get your home directory back under your storage
quota. You can also run
gio trash --empty to empty the trash, which may give you enough
breathing room to permit NoMachine login.
Some users like to configure the startup behavior of their login shell by editing the
~/.bash_profile configuration files. A common problem is that activating
conda, software modules or other environments in these config files can cause problems
with NoMachine connections.
If you suspect this has happend, you can fix this problem yourself by opening a terminal (Cmd prompt or PowerShell on Windows) and running
<username> with your actual HBS Grid username). One connected you can use a
such as nano
to comment out or remove sections of your config files that you suspect have caused the problem.
Alternative you can run
to temporarily move your config files to backup locations.
Troubleshooting LSF Jobs
A variety of problems can arise when running jobs and applications on the HBSGrid. LSF provides command-line tools to monitor and inspect your jobs to help you figure out if something goes wrong.
Job troubleshooting steps
Open a Terminal and the HBS Grid and run the commands below to troubleshoot jobs.
- Get the JOBID number by running
If your job is no longer running use
to list all your recent jobs. The JOBID is the first number in the output`.
- Get detailed information about a specific job by running_
bjobs -l <JOBID>
<JOBID>is the number you looked up in step 1.
- You can also look at any output produced by your job by running
- Older jobs may not appear in
bjobs. In that case you can still get some information by running
bhist -l <JOBID>
bjobs -l <JOBID> command give you information about the state of the job,
as defined below.
Job state definitions
Job is awaiting a slot suitable for the requested resources or you've gone over your limit on resource usage. Jobs with high resource demands may spend significant time PENDING if the compute grid is busy.
Job is running.
Job has finished and the command(s) have returned successfully (i.e., exit code 0).
Job has been terminated by the user or administrator using
Job finished with an exit code other than 0.
If your job has failed
bjobs will usually tell you why, but these messages can be cryptic.
The most common are described below.
||You did not specify enough time in your submission script. The
||Your job is attempting to use more memory than you've requested for it. Either increase the amount of memory requested or, if possible, reduce the amount your application is trying to use. For example, many Java programs set heap space using the
||Your job failed because your application exited with an error. Please look at the job or application logs to determine why your program exited abnormally.|
For more detailed information refer to the official LSF documentation.
Stata Temporary Files and Temp Storage
If there is not enough disk space available in
/tmp Stata may give you an error message that looks like this:
As a first step you may be able to change your Stata code to reduce the amount of temp space
restore commands are often the cause.
You can also try deleting any files you have in
/tmp and see if that gives you enough space.
Since each computer in the cluster has it's own
/tmp disk you need to do this on the computer
Stata is running on. An easy way to achieve that is to delete files directly from Stata
using the shell escape feature. For example, running
! rm /tmp/my-temp-file in Stata will delete
If you cannot get enough space on
/tmp you can tell Stata to store temporary files in a
Scratch storage directory on the HBS Grid.
Use scratch storage for Stata temp files
- Create a directory under
/export/scratchand ensure that the permissions are set correctly.
- Set the
STATATMPenvironment variable to the directory you created in step one. Use launcher options if running from the destkop, or set this variable from the command line.
- Start Stata as usual after setting the
STATATMPenvironment variable as described in steps 1-2 above.
More details about this issue can be found in the Stata FAQ.