Exploring Python Bioinformatics Packages with Jupyter Notebook

Shirley Li, Bioinformatician, TTS Research Technology xue.li37@tufts.edu

Date: 2024-11-01

In this tutorial, we will use the Anndata package as an example to show how to run interactive Python sessions through the Tufts Open OnDemand Jupyter Notebook.


Prerequisites

Familiarity with Linux commands
Experience working with conda environments

Creating a conda environment

Start an interactive job session
srun -p interactive -n 1 --time=04:00:00 --mem 4g --pty bash
Load one of the anaconda or miniconda modules

module load anaconda/2021.05

or

module load anaconda/2021.11

or

module load miniconda/23.10
Load the conda-env-mod module

module load conda-env-mod/default
Configure your conda settings

NOTE: the steps in this section only need to be executed ONCE.

Since your home directory has limited storage, it’s recommended to install conda packages in your group research storage space. Follow these steps:

Create two directories in your group research storage space, one for storing environments and one for storing packages (for example, condaenv and condapkg):

$ mkdir -p /cluster/tufts/XXXXlab/$USER/condaenv/

$ mkdir -p /cluster/tufts/XXXXlab/$USER/condapkg/

If you haven’t used conda before on the cluster, create a file named “.condarc” in your home directory.

Now add the following 4 lines to the .condarc file in your home directory (modify according to your real path to the directories):
```
envs_dirs:
  - /cluster/tufts/XXXXlab/$USER/condaenv/
pkgs_dirs:
  - /cluster/tufts/XXXXlab/$USER/condapkg/
```
After this, your .condarc file should look similar to this (the channels section may differ or be absent):

$ cat ~/.condarc
```
envs_dirs:
  - /cluster/tufts/XXXXlab/$USER/condaenv/
pkgs_dirs:
  - /cluster/tufts/XXXXlab/$USER/condapkg/
channels:
  - bioconda
  - conda-forge
  - defaults
```
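If you prefer to script this one-time setup, the following is a minimal sketch that creates the two directories and writes a new .condarc containing the envs_dirs and pkgs_dirs entries. Assumptions: `setup_conda_dirs` is a hypothetical helper name, `XXXXlab` remains a placeholder for your lab's storage path, and the helper deliberately refuses to overwrite an existing .condarc (edit that file by hand instead), so it never merges with or removes settings such as channels.

```python
import os
from pathlib import Path


def setup_conda_dirs(base_dir, condarc_path="~/.condarc"):
    """Create condaenv/ and condapkg/ under base_dir and write a new .condarc.

    Hypothetical helper: it does not merge with an existing .condarc and
    raises FileExistsError if one is already present.
    """
    # Expand $USER and ~ so .condarc receives concrete absolute paths
    base = Path(os.path.expanduser(os.path.expandvars(base_dir)))
    env_dir = base / "condaenv"
    pkg_dir = base / "condapkg"
    env_dir.mkdir(parents=True, exist_ok=True)
    pkg_dir.mkdir(parents=True, exist_ok=True)

    contents = (
        "envs_dirs:\n"
        f"  - {env_dir}/\n"
        "pkgs_dirs:\n"
        f"  - {pkg_dir}/\n"
    )
    target = Path(os.path.expanduser(condarc_path))
    # Mode "x" creates the file but fails if it already exists
    with open(target, "x") as handle:
        handle.write(contents)
    return contents


# Placeholder path: replace XXXXlab with your group's storage directory
# setup_conda_dirs("/cluster/tufts/XXXXlab/$USER")
```

Writing expanded absolute paths avoids depending on whether conda expands `$USER` inside .condarc.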

Create your conda environment with conda-env-mod

Replace yourenvname with the name of the environment you want to create:

cd /cluster/tufts/XXXXlab/$USER/condaenv/
conda-env-mod create -p yourenvname python=3.8  --jupyter

You will see output similar to the following; enter y to continue:

The following NEW packages will be INSTALLED:

  _libgcc_mutex      conda-forge/linux-64::_libgcc_mutex-0.1-conda_forge 
  _openmp_mutex      conda-forge/linux-64::_openmp_mutex-4.5-2_gnu 
  asttokens          conda-forge/noarch::asttokens-2.4.1-pyhd8ed1ab_0 
  bzip2              conda-forge/linux-64::bzip2-1.0.8-hd590300_5 
  ca-certificates    conda-forge/linux-64::ca-certificates-2024.7.4-hbcca054_0 
  ...

Proceed ([y]/n)? y

When it’s complete, you will see something like this.

Preparing transaction: ...working... done
Verifying transaction: ...working... done
Executing transaction: ...working... done
+---------------------------------------------------------------+
| To use this environment, load the following modules:          |
|     module load use.own                                       |
|     module load conda-env/bio_test-py3.11.5                   |
| (then standard 'conda install' / 'pip install' / run scripts) |
+---------------------------------------------------------------+

Activate the conda environment and install new packages

Note: your module name will differ from conda-env/bio_test-py3.11.5; it depends on the environment name (yourenvname) and the Python version you chose.

module load use.own 
module load conda-env/bio_test-py3.11.5 

conda list # check packages installed in this environment

pip install jupyter
pip install anndata

conda list # check again

# packages in environment at bio_test:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
anndata                   0.10.8                   pypi_0    pypi
array-api-compat          1.7.1                    pypi_0    pypi
asttokens                 2.4.1              pyhd8ed1ab_0    conda-forge
bzip2                     1.0.8                hd590300_5    conda-forge
...
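`conda list` runs in the shell. From inside Python (for example, later in a notebook) a similar check can be done with the standard library's `importlib.metadata` (Python 3.8+). This is a minimal sketch: `installed_versions` is a hypothetical helper name, and the package list is simply the packages installed above plus scipy, which anndata depends on.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(["anndata", "jupyter", "scipy"]).items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```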

Create a Jupyter kernel

conda-env-mod kernel -n bio_test

You will see something like this:

Setting CONDA_ENVS_PATH=/cluster/home/xli37/.conda/envs/rhel7.8/conda-23.10.0
New environments will be created in this directory unless --prefix is specified.
requested kernel with arguments:  -n 'bio_test' --

Jupyter kernel created: "Python (My bio_test Kernel)"
+---------------------------------------------------------------+
| We recommend installing packages into your kernel environment |
| via the command line (with 'conda install' or 'pip install'). |
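To confirm that a kernelspec was written, you can read the `kernel.json` files Jupyter stores for each kernel. This is a sketch under an assumption: it looks in `~/.local/share/jupyter/kernels`, Jupyter's default per-user kernel directory on Linux, but conda-env-mod may place kernels elsewhere (`jupyter kernelspec list` shows the real locations). `list_kernels` is a hypothetical helper name.

```python
import json
from pathlib import Path

# Assumed location of per-user kernelspecs on Linux; conda-env-mod may use
# another directory, which `jupyter kernelspec list` would reveal.
DEFAULT_KERNELS_DIR = Path.home() / ".local" / "share" / "jupyter" / "kernels"


def list_kernels(kernels_dir=DEFAULT_KERNELS_DIR):
    """Map kernel directory names to the display_name in their kernel.json."""
    kernels_dir = Path(kernels_dir)
    if not kernels_dir.is_dir():
        return {}
    kernels = {}
    for spec_file in sorted(kernels_dir.glob("*/kernel.json")):
        with open(spec_file) as handle:
            spec = json.load(handle)
        # Fall back to the directory name if display_name is absent
        kernels[spec_file.parent.name] = spec.get("display_name", spec_file.parent.name)
    return kernels


for name, display_name in list_kernels().items():
    print(f"{name}: {display_name}")
```

The display name is what appears in the Jupyter Lab launcher.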

Using Open OnDemand Jupyter Lab

Navigate to Open OnDemand.

In the Open OnDemand dashboard, go to Interactive Apps => Jupyter Lab, select the number of hours, number of cores, and amount of memory you would like to request, and launch the job.
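Once the session is running and a notebook is open, it can be useful to check from Python what the job was actually given. The sketch below reads a few standard Slurm job environment variables. Whether they are visible inside the notebook depends on how the Open OnDemand app launches Jupyter, so a missing value should be read as "unknown" rather than as an error. `slurm_job_summary` and `usable_cpus` are hypothetical helper names.

```python
import os

# Standard Slurm job environment variables; SLURM_MEM_PER_NODE (in MB) is
# typically set only when memory was requested per node, e.g. with --mem.
SLURM_KEYS = ("SLURM_JOB_ID", "SLURM_JOB_PARTITION", "SLURM_CPUS_ON_NODE", "SLURM_MEM_PER_NODE")


def slurm_job_summary(env=None):
    """Return the Slurm variables above, with None for any that are unset."""
    env = os.environ if env is None else env
    return {key: env.get(key) for key in SLURM_KEYS}


def usable_cpus():
    """CPUs this process may run on (Linux affinity mask), else the machine total."""
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count()


summary = slurm_job_summary()
if summary["SLURM_JOB_ID"] is None:
    print("No SLURM_JOB_ID found: not in a Slurm job, or the variable is not exported")
for key, value in summary.items():
    print(f"{key} = {value}")
print("usable CPUs:", usable_cpus())
```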

Under Notebook, select the kernel you just created, for example "Python (My bio_test Kernel)".

You can then start running your Python code.
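A quick first cell can confirm that the notebook is really running on the environment's Python rather than a system one. This is a sketch: `kernel_uses_env` is a hypothetical helper, and the environment path in the last line is a placeholder built from the earlier `condaenv` and `yourenvname` examples, so adjust it to your own setup.

```python
import os
import sys
from pathlib import Path


def kernel_uses_env(env_prefix, executable=sys.executable):
    """Return True if `executable` lives somewhere under `env_prefix`."""
    # expandvars fills in $USER; abspath normalizes trailing slashes and "..".
    prefix = Path(os.path.abspath(os.path.expandvars(env_prefix)))
    exe = Path(os.path.abspath(executable))
    return prefix in exe.parents


print("Python executable:", sys.executable)
print("Environment prefix:", sys.prefix)
print("Python version:", sys.version.split()[0])

# Placeholder path: adjust XXXXlab and yourenvname to match your setup
print(kernel_uses_env("/cluster/tufts/XXXXlab/$USER/condaenv/yourenvname"))
```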

Example code to check the Anndata installation:

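import anndata as ad
from scipy.sparse import csr_matrix  # sparse matrix type commonly used as AnnData.X
print(ad.__version__)  # prints the installed anndata version, e.g. 0.10.8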

Tutorials for AnnData

https://anndata.readthedocs.io/en/latest/tutorials/notebooks/getting-started.html

Some basic Python commands

Check the current working directory

import os
print(os.getcwd())

Change to a different directory

os.chdir(os.path.expandvars('/cluster/tufts/XXXXlab/$USER/'))  # Python does not expand $USER on its own

List the files and folders in the current directory

os.listdir(os.getcwd())
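The same tasks can also be done with `pathlib` from the standard library, which some find easier to read. This is a small sketch: `list_files` is a hypothetical helper, and the `.h5ad` filter is only an example (it is the file extension AnnData uses for its on-disk format, e.g. files written with `write_h5ad`).

```python
import os
from pathlib import Path


def list_files(directory=".", suffix=None):
    """Return sorted names of regular files in directory, optionally filtered by suffix."""
    # expandvars handles $USER-style placeholders, which os and pathlib do not expand
    folder = Path(os.path.expandvars(directory)).expanduser()
    return sorted(
        entry.name
        for entry in folder.iterdir()
        if entry.is_file() and (suffix is None or entry.name.endswith(suffix))
    )


print(Path.cwd())                # same as os.getcwd()
print(list_files("."))           # files only, unlike os.listdir
print(list_files(".", ".h5ad"))  # only AnnData .h5ad files
```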