R#

In this guide, we walk through how to start and run R from the shell command line on the Tufts HPC cluster.

R Interactive Session#

An R interactive session refers to the mode of using the R programming language interactively, where users can directly enter commands, execute them, and immediately see the results. It allows users to explore data, test functions, and perform analyses in real-time within the R environment. This mode is particularly useful for tasks like data exploration, debugging code, and quick data manipulations.

Steps:

  1. Log in to the HPC cluster (the new Pax cluster)

    ssh your_username@login-prod.pax.tufts.edu

  2. From the login node, load an R module (e.g., r/4.4.3)

    $ module load r/4.4.3

    Please check other available R versions with module av r

  3. Allocate computing resources. Start an interactive session with your desired number of cores and amount of memory; here we request 2 cores, 4GB of memory, and a 4-hour time limit

    $ srun -p batch -n 2 --mem=4g -t 4:00:00 --pty bash

    For more information on how to allocate resources on the Tufts HPC cluster, see the Slurm documentation

  4. Within the interactive session, you can start R

    $ R

In R, you can install the packages you need in your home directory with:

> install.packages("XXX")

If you are having trouble installing packages, please contact tts-research@tufts.edu.
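One common convention for managing a personal package library (sketched below; the directory path is just an example, not a site requirement) is to keep your packages in a fixed directory and point R at it with the R_LIBS_USER environment variable:

```shell
# Sketch: keep personal R packages in a dedicated directory.
# The path is an example; any directory under your home folder works.
mkdir -p "$HOME/R/library"
export R_LIBS_USER="$HOME/R/library"
# Inside R, install.packages("...") will now install into this directory,
# and library("...") will find packages there in future sessions.
```

Adding the export line to your ~/.bashrc makes the setting persist across sessions.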

  5. Exit from the R command-line interface:

    > q()

  6. When finished, type exit to leave the interactive session and free up resources for other users.

R batch jobs#

An R batch job refers to running R scripts or commands in batch mode, where the job is submitted to the computing cluster's scheduler to be executed asynchronously. Batch jobs are typically used for computationally intensive or long-running tasks, since they let users submit work and continue without waiting for it to complete. This approach is useful for R scripts that involve large datasets, complex calculations, or simulations that may take a long time to finish.

Steps:

  1. Log in to the HPC cluster

  2. Upload your R script or create one on the HPC cluster

  3. Go to the directory/folder which contains your R script, e.g.:

    $ cd /cluster/tufts/xxxlab/username/myfolder

  4. Open your favorite text editor and write a Slurm submission script similar to the following, e.g. batchjob.sh (choose your own name):

    #!/bin/bash
    #SBATCH -J myRjob             #job name
    #SBATCH --time=00-00:20:00    #requested time
    #SBATCH -p batch              #running on "batch" partition/queue
    #SBATCH -n 1                  #1 CPU core total
    #SBATCH --mem=2g              #requesting 2GB of RAM total
    #SBATCH --output=myRjob.%j.out #saving standard output to file
    #SBATCH --error=myRjob.%j.err  #saving standard error to file
    #SBATCH --mail-type=ALL       #email options
    #SBATCH --mail-user=Your_Tufts_Email@tufts.edu
    
    #Below this point are the commands executed on the allocated computing resources
    module purge
    module load r/4.4.3
    Rscript --no-save your_rscript_name.R
    
  5. Submit it with:

    $ sbatch batchjob.sh
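The submission script above runs your_rscript_name.R, whose contents are up to you. As a hypothetical placeholder, the snippet below creates a tiny R script that saves its result to a file instead of only printing to the console (the filenames and contents are examples):

```shell
# Create a minimal example R script for the batch job to run.
# It simulates some data and writes a small summary to result.csv.
cat > your_rscript_name.R <<'EOF'
x <- rnorm(1000)                                    # simulate some data
summary_df <- data.frame(mean = mean(x), sd = sd(x))
write.csv(summary_df, "result.csv", row.names = FALSE)
EOF
```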

NOTE: If you are submitting multiple batch jobs to run the same R script on different datasets, please make sure the results are saved to unique files inside your R script.
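One hedged way to arrange this (the script name, dataset names, and output naming scheme below are all hypothetical) is to pass each dataset to the job as an argument along with a unique output filename derived from it:

```shell
# Derive a unique output filename from an input dataset name,
# e.g. data1.csv -> results_data1.rds
output_for() {
    echo "results_$(basename "$1" .csv).rds"
}

# Submit one job per dataset. "echo" makes this a dry run that just
# prints the commands -- remove it to actually submit on the cluster.
for dataset in data1.csv data2.csv data3.csv; do
    echo sbatch batchjob.sh "$dataset" "$(output_for "$dataset")"
done
```

For this to work, the last line of batchjob.sh would need to forward the arguments, e.g. Rscript --no-save your_rscript_name.R "$@", and the R script would read them with commandArgs(trailingOnly = TRUE).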

Learn more about Slurm batch jobs