Stata Temporary Files and Temp Storage
If there is not enough disk space available in /tmp, Stata may give you an error message that looks like this:
As a first step you may be able to change your Stata code to reduce the amount of temp space it needs; preserve and restore commands are often the cause.
You can also try deleting any files you have in /tmp and see if that gives you enough space.
Since each computer in the cluster has its own /tmp disk, you need to do this on the computer Stata is running on. An easy way to achieve that is to delete files directly from Stata using the shell escape feature. For example, running
! rm /tmp/my-temp-file
in Stata will delete that file.
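Before deleting anything, it can help to check how full /tmp is and which of your own files take up the most space. The following is a minimal sketch, assuming GNU coreutils and findutils (for `sort -h` and `-maxdepth`), which most Linux systems provide. The function name `list_my_tmp_files` is made up for illustration.

```shell
# list_my_tmp_files is a hypothetical helper name, not part of Stata or LSF.
# It prints the files and directories directly under the given directory
# (default: /tmp) that belong to the current user, largest first.
list_my_tmp_files() {
    dir="${1:-/tmp}"
    find "$dir" -mindepth 1 -maxdepth 1 -user "$(id -u)" \
        -exec du -sh {} + 2>/dev/null | sort -rh
}

# Free space on the filesystem that holds /tmp on this machine.
df -h /tmp

# Your 20 largest items in /tmp.
list_my_tmp_files /tmp | head -n 20
```

As with the rm example, these commands only describe the machine they run on, so run them on the computer Stata is running on. From inside Stata, a single command such as `df -h /tmp` can be run by prefixing it with `!`.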
If you cannot get enough space on /tmp, you can tell Stata to store temporary files in a Scratch storage directory on the HBS Grid.
Use scratch storage for Stata temp files
- Create a directory under /export/scratch and ensure that the permissions are set correctly.
- Set the STATATMP environment variable to the directory you created in step one. Use launcher options if running from the desktop, or set this variable from the command line.
- Start Stata as usual after setting the STATATMP environment variable as described in steps 1-2 above.
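From the command line, the steps above can be sketched as a small shell function. This is only an illustration under stated assumptions: the function name `setup_stata_tmp` is hypothetical, and mode 700 (owner-only access) is a guess at what "set correctly" means for your project, so adjust it if your group needs access.

```shell
# setup_stata_tmp is a hypothetical helper name.
# $1: directory that Stata should use for its temporary files.
setup_stata_tmp() {
    dir="$1"
    # Step 1: create the directory and restrict it to the owner.
    # Mode 700 is an assumption, not a documented requirement.
    mkdir -p "$dir" || return 1
    chmod 700 "$dir" || return 1
    # Step 2: point Stata at it. STATATMP must be exported so that a Stata
    # process started from this shell inherits it.
    export STATATMP="$dir"
}
```

A call might look like `setup_stata_tmp /export/scratch/<your-directory>/stata-tmp`, where the part below /export/scratch is a placeholder for a directory you are allowed to write to. Stata has to be started from the same shell session afterwards for the setting to take effect.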
More details about this issue can be found in the Stata FAQ.
Troubleshooting LSF Jobs
A variety of problems can arise when running jobs and applications on the HBS Grid. LSF provides command-line tools to monitor and inspect your jobs, which can help you figure out what went wrong.
Job troubleshooting steps
Open a terminal on the HBS Grid and run the commands below to troubleshoot jobs.
- Get the JOBID number by running
If your job is no longer running use
to list all your recent jobs. The JOBID is the first number in the output.
- Get detailed information about a specific job by running
bjobs -l <JOBID>
where <JOBID> is the number you looked up in step 1.
- You can also look at any output produced by your job by running
- Older jobs may not appear in
bjobs. In that case you can still get some information by running
bhist -l <JOBID>
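The inspection steps can be collected into one helper, sketched below. It relies on standard LSF commands: `bjobs -l` and `bhist -l` appear in the steps above, while `bpeek` (which shows the output of a job that has not finished yet) and `bjobs -a` (which also lists recently finished jobs) are named here as general LSF tools, on the assumption that they are available on the HBS Grid. The function name `inspect_job` is hypothetical.

```shell
# inspect_job is a hypothetical helper name. The LSF commands it calls
# (bjobs, bpeek, bhist) must be on your PATH.
inspect_job() {
    jobid="$1"
    # LSF job IDs are numeric; reject anything else before calling LSF.
    case "$jobid" in
        ''|*[!0-9]*)
            echo "usage: inspect_job <JOBID>" >&2
            return 2
            ;;
    esac
    if ! command -v bjobs >/dev/null 2>&1; then
        echo "LSF commands not found on this machine" >&2
        return 1
    fi
    bjobs -l "$jobid"   # detailed status, including the job state
    bpeek "$jobid"      # output produced so far (unfinished jobs only)
    bhist -l "$jobid"   # history, useful when bjobs no longer lists the job
}
```

For example, `inspect_job 12345` would print all three reports for a job with that ID. Some of the commands may report that the job cannot be found, depending on how old it is.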
The bjobs -l <JOBID> command gives you information about the state of the job, as defined below.
Job state definitions
Job is awaiting a slot suitable for the requested resources or you've gone over your limit on resource usage. Jobs with high resource demands may spend significant time PENDING if the compute grid is busy.
Job is running.
Job has finished and the command(s) have returned successfully (i.e., exit code 0).
Job has been terminated by the user or administrator using
Job finished with an exit code other than 0.
If your job has failed, bjobs will usually tell you why, but these messages can be cryptic.
The most common are described below.
- You did not specify enough time in your submission script. The
- Your job is attempting to use more memory than you've requested for it. Either increase the amount of memory requested or, if possible, reduce the amount your application is trying to use. For example, many Java programs set heap space using the
- Your job failed because your application exited with an error. Please look at the job or application logs to determine why your program exited abnormally.
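When a job fails on a time or memory limit, the usual remedy is to request more in the submission script. The sketch below writes a small job script using standard LSF `#BSUB` directives: `-W` for the run time limit and `-M` for the memory limit. Every value and file name in it is illustrative, the unit used by `-M` depends on how LSF is configured at your site, and your site may expect memory to be requested differently, so treat this as a starting point rather than the recommended HBS Grid settings.

```shell
# Write a minimal LSF job script. All values and file names are illustrative.
cat > my_job.sh <<'EOF'
#!/bin/bash
# Job name (hypothetical).
#BSUB -J stata-example
# Run time limit, given as [hours:]minutes.
#BSUB -W 2:00
# Memory limit; the unit depends on the site's LSF configuration.
#BSUB -M 8000
# File for the job's output; %J expands to the job ID.
#BSUB -o job.%J.out

# Hypothetical workload: run a Stata do-file in batch mode.
stata -b do analysis.do
EOF

# On a machine with LSF installed, submit the script with:
#   bsub < my_job.sh
```

After the job ends, the file named by `-o` holds the application output, which is the first place to look when the application itself exited with an error.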
For more detailed information refer to the official LSF documentation.