Help and support

If you run into any problems, please let us know! You can reach us directly via email at research@hbs.edu. Our support team is happy to assist you.

We've provided checklists below for the most common issues that our users experience. To help us help you, please work through the relevant list(s) and, when you reach out to us, include all relevant information (LSF logs, error logs and messages, etc.) and the step at which you encountered the issue.

HBSGrid Account & Login Issues

  1. If you are a guest, have you successfully completed our instructions to obtain access to the cluster?
  2. Are you connected to the appropriate network: i.e., the HBS VPN if off-campus, or to the wired HBS ethernet/HBSSECURE wireless if on-campus?
    • Please note that using Harvard Secure VPN/Wireless or a non-HBS ethernet connection will not work.
  3. Are you using the correct username and login hostnames as outlined in our Quick Start guide?
  4. If you are accessing the cluster via NoMachine:
    • Is your home folder full?
    • Have you altered your login scripts (.bashrc / .bash_profile)? Activating conda, software modules or other environments in these config files can cause problems with NoMachine connections.
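The two NoMachine checks in item 4 can also be run from a terminal session. The following is a minimal sketch rather than an official RCS tool: the function name scan_login_scripts and its search patterns are illustrative assumptions, and a match is only a candidate to review, not proof of a problem.

```shell
# Hypothetical helper: print lines in the given login scripts that mention
# conda, load software modules, or source other files.
scan_login_scripts() {
    for f in "$@"; do
        [ -f "$f" ] || continue   # skip files that do not exist
        # -n prints line numbers, -H prints the file name
        grep -nHE 'conda|module[[:space:]]+(load|add)|^[[:space:]]*(source|\.)[[:space:]]' "$f"
    done
    return 0
}

# Total space used by your home folder.
du -sh "$HOME" 2>/dev/null || true

# Review the usual bash login scripts.
scan_login_scripts "$HOME/.bashrc" "$HOME/.bash_profile"
```

If the scan prints environment activation lines, consider commenting them out temporarily and reconnecting with NoMachine to see whether the problem goes away.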

Storage & Project Spaces - Access Problems

  1. Has your access been approved by the sponsor of your space?
  2. If you received an email indicating that the project space sponsor has approved your access, have at least 2 hours passed since you were approved? This is how long the approval may take to sync in our systems.
  3. If you are trying to access the project space by mounting/mapping the drive, please check:
  4. If you are trying to access the project space via NoMachine, have you tried terminating your session and logging back in?
  5. If you are trying to access the project space via Terminal, have you tried accessing the project space from a new terminal?

Storage & Project Spaces - Permission Problems

  1. Have you hit the quota for the home or project folder?
    • You may have received an email notifying you about having reached storage quotas on your home folder or project space.
    • You can also check your disk usage for project spaces by running the command df -h filepath/to/directory.
  2. Are the permissions set appropriately for (shared) read/write access?
    • Inconsistent group ownership or read/write access can cause permission denied errors.
    • Verify the permissions by viewing the item's properties in the Files browser in NoMachine/Gnome or by running ls -al filepath/to/directory in a terminal.
    • Use our File Permissions instructions to change file/directory permissions. Enlist a colleague's help if needed, especially if that colleague owns the item in question.
  3. Did you transfer data via mounted volumes instead of via SFTP (e.g., Filezilla or Cyberduck)? Doing so often results in unexpected file/folder permissions. If this is the cause, please see the instructions in item 2 above to change the permissions. We recommend using other data transfer methods going forward.
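The quota and permission checks above can be combined into one terminal helper. This is a sketch under stated assumptions: check_space is a hypothetical name, and the find filter (top-level items that the group cannot read or cannot write) is one illustrative way to spot inconsistent shared access, not the method described in the File Permissions instructions.

```shell
# Hypothetical helper: summarize disk usage and permissions for a directory.
check_space() {
    dir=${1:?usage: check_space DIRECTORY}
    df -h "$dir"      # usage of the filesystem that holds the directory
    ls -ald "$dir"    # owner, group, and mode of the directory itself
    # Top-level items that lack group read or group write permission
    find "$dir" -maxdepth 1 \( ! -perm -g=r -o ! -perm -g=w \) -exec ls -ld {} +
}
```

Run it as check_space filepath/to/directory. Items listed by the last step are the ones a collaborator in the same group is most likely to get "permission denied" on.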

Running Interactive & Batch Applications (Jobs) - PENDs or Not Running

  1. If using the GUI Launchers in NoMachine, did a warning dialog appear?
    • Did the dialog say "Job Requirements Not Satisfied"? That message means there is no room on the cluster yet.
  2. Have you run the HBSGrid Job Monitor script or checked the status of your job using the terminal?
    • In NoMachine/Gnome, select Applications > System Tools > HBSGrid Job Monitor, or simply search for "Job Monitor".
    • Launch NoMachine's terminal via Applications > System Tools > Terminal and select 4GB RAM (and 1 CPU) to get a local terminal for using bjobs and its options.
  3. Do you already have 3 interactive sessions or 12 interactive cores running on the short_int queue or 4 interactive cores running on the long_int queue, or do you already have 150 cores running in total on the cluster? If you do, you have reached the limit of the resources one user can request. You can wait for these jobs to finish, or you may opt to terminate one of your running jobs to prioritize another:
    • Launch NoMachine's terminal via Applications > System Tools > Terminal and select 4GB RAM (and 1 CPU) to get a local terminal. Review your running jobs using bjobs and its options. You can terminate a job using the bkill JOBID command or all jobs using bkill 0.
  4. Did you ask for > 50-100 GB RAM and/or > 4-8 cores? If so:
    • Are you over-asking? Could your request be reduced using information from previous runs or based on past usage or data file sizes/types?
    • Are you doing "big data" work? Could this be done more efficiently?
  5. If using a terminal, did you submit your job to the correct queue with the correct parameters? Or did you submit a job that could never be scheduled (e.g. a RAM size that won't fit anywhere)?
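To compare your current usage against the per-user limits in item 3, a short script can tally running jobs and cores per queue. The sketch below is an assumption-laden example: summarize_running is a hypothetical helper that reads lines of the form "STAT QUEUE SLOTS", and the bjobs -o and -noheader options shown in the comments require an LSF version that supports customized output.

```shell
# Hypothetical helper: read "STAT QUEUE SLOTS" lines on standard input and
# print the number of running jobs and slots (cores) per queue, plus a total.
summarize_running() {
    awk '$1 == "RUN" { jobs[$2]++; slots[$2] += $3 }
         END { for (q in jobs) print q, jobs[q], slots[q] }' |
    sort |
    awk '{ printf "%s jobs=%d slots=%d\n", $1, $2, $3; total += $3 }
         END { printf "all_queues slots=%d\n", total }'
}

# On the cluster (assumes bjobs supports -o and -noheader):
#   bjobs -noheader -o "stat queue slots" | summarize_running
# Other useful checks:
#   bjobs -p          # pending jobs, with the reason each one is pending
#   bjobs -l JOBID    # long-format details for one job
```

If the short_int or long_int figures are already at the limits listed above, a new interactive job will stay pending until one of the running jobs finishes or is terminated with bkill.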

Running Interactive & Batch Applications (Jobs) - Crashes & Problems

  1. Do you have the jobID of your program?
    • If not, use bhist -a to list your jobs and/or bhist -l JOBID to get LSF details.
  2. Have you exceeded the time limit for your queue or run session?
  3. Does your program generate its own logs? If so, what do these indicate?
  4. Are you writing log entries to troubleshoot where you are having problems?
  5. If you are running a batch program, are you saving the cluster errors and output for your job? This can be accomplished using the bsub -o and -e options, e.g., bsub -q short -W 6:00 -R "rusage[mem=4000]" -M 4000 -o output_%J.out -e error_%J.err -B -N -u jharvard@hbs.edu
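Items 4 and 5 work well together: a job script that writes its own timestamped log entries, submitted with the cluster output and error files saved. The following is a minimal sketch. The script name job.sh and the log function are hypothetical, and the sketch assumes LSF sets the LSB_JOBID environment variable inside the job.

```shell
#!/bin/bash
# Hypothetical batch job script (job.sh) that writes timestamped log entries.
log() {
    # Prefix each message with the time and the LSF job ID, if available.
    printf '%s [%s] %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "${LSB_JOBID:-no-jobid}" "$*"
}

log "job started"
# ... your program's steps go here, with a log call before each stage ...
log "job finished"

# Submit with cluster output and errors saved (options from the example above):
#   bsub -q short -W 6:00 -R "rusage[mem=4000]" -M 4000 \
#        -o output_%J.out -e error_%J.err ./job.sh
# Afterwards, look up the LSF details:
#   bhist -a          # your jobs, including finished ones
#   bhist -l JOBID    # long-format history for a single job
```

The last log entry written before a crash shows which stage was running, which narrows down where to look in the output_%J.out and error_%J.err files.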

If you need additional assistance from RCS, please include the JOBID and any of the above logs in your email.