Installing Python and R packages on the Grid
This post was last updated on 2021-10-26.
If you think Python 2.6.6 or R 3.0.2 sound old, I have good news for you. The Conda package manager makes it easy to install the latest and greatest Python and R packages in your home directory. Conda facilitates the installation of Python 2.7.11, Python 3.5.2, and R 3.3.1.
Installation
The first step to installing Python or R on the Grid is to install Miniconda. The steps below outline how to install Miniconda.
Log in to the grid:
ssh researchgrid.hbs.edu
Set up an alias so it’s easy to submit interactive jobs to back-end nodes:
alias my_run="bsub -app generic-5g -q interactive -Is"
Download the Miniconda installer:
my_run wget http://grid.rcs.hbs.org/files/miniconda2-linux-x86_64-4.1.12.sh.gz
Unzip and run the installer:
gunzip miniconda2-linux-x86_64-4.1.12.sh.gz
chmod +x miniconda2-linux-x86_64-4.1.12.sh
my_run ./miniconda2-linux-x86_64-4.1.12.sh
Here’s how I answered the questions when running the installer:
- Do you approve the license terms? yes
- I pressed enter to install Miniconda2 in ~/miniconda2
- Do you wish the installer to prepend the Miniconda2 install location to PATH in your ~/.bashrc ? no
Make sure the Miniconda bin is on your search path:
export PATH="$HOME/miniconda2/bin:$PATH"
If you skip this step, trying to use Conda will throw error messages like:
-bash: conda: command not found
It is also useful to put the same export command in your ~/.bash_profile file. This way, when you log in to the Grid and call python in the future, it will find the python in ~/miniconda2/bin first rather than the old version in /usr/local/bin.
Remove the Miniconda installer:
rm -f miniconda2-linux-x86_64-4.1.12.sh
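Going back to the PATH step, here is a minimal sketch of one way to add the export line to a profile file without creating duplicates when it is run more than once. The helper name append_once is hypothetical, and the demonstration writes to a scratch file rather than your real ~/.bash_profile. It uses $HOME rather than ~ because bash does not expand a tilde inside double quotes.

```shell
# append_once FILE LINE: add LINE to FILE only if that exact line is absent.
# (Hypothetical helper name; not part of Conda or the Grid tooling.)
append_once() {
    grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Single quotes keep $HOME and $PATH literal so they are expanded at login.
path_line='export PATH="$HOME/miniconda2/bin:$PATH"'

# Demonstrate on a scratch file; calling it twice still leaves one copy.
demo_profile="$(mktemp)"
append_once "$demo_profile" "$path_line"
append_once "$demo_profile" "$path_line"
cat "$demo_profile"
```

For real use you would pass your profile instead of the scratch file, for example append_once "$HOME/.bash_profile" "$path_line".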
Python
Note this step requires that you have already installed Miniconda. If you have not installed Miniconda yet, return to the Installation section.
The following command installs the typical Python packages used in social science research:
my_run conda install anaconda
If the Python packages included in Anaconda are insufficient for your needs, Conda’s documentation on managing packages has excellent information on how to install additional packages. The general approach to installing additional packages proceeds as follows:
- See if the package is available through Conda with conda search. If it is, install the package using conda install.
- See if the package is available on http://anaconda.org. If it is, install the package using conda install, being sure to specify the correct channel.
- To install a non-conda package, use pip to install the package.
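The approach above can be sketched as a small shell function. This is only an illustration under stated assumptions: the name install_pkg is hypothetical, the channel lookup on anaconda.org stays a manual step (you pass the channel name yourself), and the sketch assumes that conda search exits with a non-zero status when nothing matches, which may not hold for every Conda version.

```shell
# install_pkg NAME [CHANNEL]  (hypothetical helper name)
#   With CHANNEL: install NAME from that channel, which you looked up yourself.
#   Without:      try the default Conda channels first, then fall back to pip.
install_pkg() {
    pkg="$1"
    channel="${2:-}"
    if [ -n "$channel" ]; then
        conda install -y -c "$channel" "$pkg"
    elif conda search "$pkg" >/dev/null 2>&1; then
        # Assumption: a successful exit status means the package was found.
        conda install -y "$pkg"
    else
        pip install "$pkg"
    fi
}
```

On the Grid this would need to run in a shell on a back-end node (for instance one started with my_run bash) so that the installation work does not happen on a front-end node.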
R
Note this step requires that you have already installed Miniconda. If you have not installed Miniconda yet, return to the Installation section.
The following command installs the typical R packages used in social science research:
my_run conda install -c r r-essentials
Notice that the above command installs r-essentials from the r channel. This package contains the latest version of R.
If you need additional R packages, the general approach proceeds as follows:
- Search http://anaconda.org. If the package is available, use conda install, specifying the appropriate channel.
- If the package is available through CRAN but not Conda, you can create a new Conda package from the CRAN repository. For documentation on this process see building conda packages and conda skeleton cran. This looks like a bit of work to do properly.
This Stack Overflow answer provides a quick and dirty workaround if you don’t want to build new Conda packages. The key insight is to open R and use install.packages, being sure to specify the correct path for where to install the package, something like:
install.packages("rstan", lib = "~/miniconda2/lib/R/library")
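The same workaround can also be run from the shell without opening an R session. The following is a rough sketch, not a tested recipe for the Grid: the function name r_install and the RSCRIPT and R_LIB variables are hypothetical, and the repos argument is an assumption added here because a non-interactive install.packages call typically needs a CRAN mirror to be named.

```shell
# Locations from the Miniconda install described in this post; override if yours differ.
RSCRIPT="${RSCRIPT:-$HOME/miniconda2/bin/Rscript}"
R_LIB="${R_LIB:-$HOME/miniconda2/lib/R/library}"

# r_install PACKAGE: install one CRAN package into the Conda R library.
# (Hypothetical helper name; the mirror URL is an assumption.)
r_install() {
    "$RSCRIPT" -e "install.packages('$1', lib = '$R_LIB', repos = 'https://cloud.r-project.org')"
}
```

Calling r_install rstan would then mirror the install.packages example above.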
Environments
One of Conda’s most useful features is the ability to create virtual environments. This is particularly helpful if you have multiple projects that depend on different versions of packages. With a virtual environment you can update the packages for one project without disturbing the packages of your other projects. Conda’s documentation on managing environments is a good place to learn about this feature.
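As a concrete illustration, here is a minimal sketch of creating one environment per project, skipping creation when an environment with that name already exists. The function name ensure_env is hypothetical, and the sketch assumes that conda env list prints the environment name in the first column of each line.

```shell
# ensure_env NAME PACKAGE_SPEC...  (hypothetical helper name)
# Create a Conda environment called NAME with the given packages,
# unless an environment with that name is already listed.
ensure_env() {
    env_name="$1"
    shift
    if conda env list | awk '{print $1}' | grep -qxF -- "$env_name"; then
        echo "environment $env_name already exists"
    else
        conda create -y -n "$env_name" "$@"
    fi
}
```

For example, ensure_env new_project python=3.5 numpy would create an environment with its own Python 3.5 and numpy. With older Conda releases like the one installed here, you then switch to it with source activate new_project; newer releases use conda activate instead.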
Execution
Now that you have installed Python and R in ~/miniconda2/bin/, you need to run these programs using bsub commands so your computationally intensive jobs run on back-end nodes rather than on front-end nodes. Below I give a quick introduction to submitting batch and interactive jobs through LSF.
Batch
To start, let’s create an alias describing a bsub command for submitting batch jobs. If you want to learn more about bsub, go to this page in the documentation.
alias my_batch="bsub -app generic-5g -q normal"
If you want to run a Python script named your_file.py you would run:
my_batch ~/miniconda2/bin/python your_file.py
Notice that it’s important to give the full path to your installation of Python. Similarly, here is how to run an R script named your_file.R:
my_batch ~/miniconda2/bin/Rscript your_file.R
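If you submit jobs from a script rather than by hand, a shell function can be more convenient than an alias, because bash does not expand aliases in non-interactive shells by default. Below is a minimal sketch under stated assumptions: the function name submit_py and the PYTHON_BIN variable are hypothetical, and the -o option (with %J standing for the job ID) is added here for illustration to give each job its own log file.

```shell
# Full path to the Miniconda Python installed earlier; override if yours differs.
PYTHON_BIN="${PYTHON_BIN:-$HOME/miniconda2/bin/python}"

# submit_py SCRIPT [ARGS...]: submit a Python script as an LSF batch job.
# (Hypothetical helper name; same -app and -q settings as the my_batch alias.)
submit_py() {
    script="$1"
    shift
    # LSF replaces %J with the job ID, so every job writes to its own log.
    bsub -app generic-5g -q normal -o "${script%.py}.%J.log" \
        "$PYTHON_BIN" "$script" "$@"
}
```

Running submit_py your_file.py would submit the same job as the my_batch example above, with its output collected in a file named like your_file.<job ID>.log.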
Interactive
Note, I define the my_run alias used below in the Installation section.
There are a lot of ways to run interactive Python and R jobs on the Grid. I’m going to highlight the most enjoyable ways:
Jupyter Console
If you want to work at the command line, the Jupyter Console makes interactive work quite pleasant and it works with both Python and R. To run Python use:
my_run jupyter console
To run R use:
my_run jupyter console --kernel=ir
Jupyter Notebook
The Jupyter Notebook is not currently supported on the Grid for security reasons.
RStudio Desktop
It is possible to run RStudio Desktop on the grid. Here is how I run RStudio (note the spelling of the Rstudio command has a capital R and a lower-case s):
export RSTUDIO_WHICH_R=~/miniconda2/bin/R
Rstudio
Unfortunately, installing RStudio on the Grid is quite challenging and the currently installed version is quite old. You’re likely to have a more pleasant interactive experience using the Jupyter Console described above.
Troubleshooting
lattice was built before R 3.0.0
R is working great, but I’m getting the following error:
> library(lattice)
Error: package ‘lattice’ was built before R 3.0.0: please re-install it
This is happening because R is looking for packages in two places:
> .libPaths()
[1] "/usr/local/apps/R/packages"
[2] "/export/home/dor/amarder/miniconda2/lib/R/library"
The first directory has old packages in it. When R is loading lattice, it tries to import the version in /usr/local/apps/R/packages first. Unfortunately, this version is too old and the import fails. To fix this issue, remove this directory from .libPaths:
> .libPaths(.libPaths()[.libPaths() != "/usr/local/apps/R/packages"])
> library(lattice)
You have successfully imported lattice in the current R session. To make this change affect all future R sessions, add the following line to your ~/.Rprofile file:
.libPaths(.libPaths()[.libPaths() != "/usr/local/apps/R/packages"])