Exploring Python Bioinformatics Packages with Jupyter Notebook#

Shirley Li, Bioinformatician, TTS Research Technology xue.li37@tufts.edu

Date: 2024-11-01

In this tutorial, we will use the Anndata package as an example to show how to run interactive Python sessions through the Tufts Open OnDemand Jupyter Notebook.


Prerequisites#

  1. Familiarity with Linux commands

  2. Experience working with conda environments

Creating a conda environment#

  1. Start an interactive job session

    srun -p interactive -n 1 --time=04:00:00 --mem 4g --pty bash

  2. Load an anaconda or miniconda module

    module load anaconda/2021.05

    or

    module load anaconda/2021.11

    or

    module load miniconda/23.10

  3. Load the conda-env-mod module

    module load conda-env-mod/default

  4. Configure your conda

    NOTE: The steps in this section only need to be executed ONCE.

    Since your home directory has limited storage, it’s recommended to install conda packages in your group research storage space. Follow these steps:

    Create two directories in your group research storage space (one for storing the envs, one for storing the pkgs, for example: condaenv, condapkg)

    $ mkdir -p /cluster/tufts/XXXXlab/$USER/condaenv/

    $ mkdir -p /cluster/tufts/XXXXlab/$USER/condapkg/

    If you haven’t used conda before on the cluster, create a file named “.condarc” in your home directory.

    Now add the following 4 lines to the .condarc file in your home directory (modify according to your real path to the directories):

    envs_dirs:
      - /cluster/tufts/XXXXlab/$USER/condaenv/
    pkgs_dirs:
      - /cluster/tufts/XXXXlab/$USER/condapkg/
    

    After this, your .condarc file should look similar to this (the channels section may differ or be absent, depending on your existing configuration):

    $ cat ~/.condarc

    envs_dirs:
      - /cluster/tufts/XXXXlab/$USER/condaenv/
    pkgs_dirs:
      - /cluster/tufts/XXXXlab/$USER/condapkg/
    channels:
      - bioconda
      - conda-forge
      - defaults
    
  5. Create your conda environment with conda-env-mod

    Change yourenvname to the name of the environment you intend to create

    cd /cluster/tufts/XXXXlab/$USER/condaenv/
    conda-env-mod create -p yourenvname python=3.8 --jupyter
    

    You will see output like the following. Enter y to continue.

    The following NEW packages will be INSTALLED:
    
      _libgcc_mutex      conda-forge/linux-64::_libgcc_mutex-0.1-conda_forge 
      _openmp_mutex      conda-forge/linux-64::_openmp_mutex-4.5-2_gnu 
      asttokens          conda-forge/noarch::asttokens-2.4.1-pyhd8ed1ab_0 
      bzip2              conda-forge/linux-64::bzip2-1.0.8-hd590300_5 
      ca-certificates    conda-forge/linux-64::ca-certificates-2024.7.4-hbcca054_0 
      ...
    
    Proceed ([y]/n)? y
    

    When it’s complete, you will see something like this.

    
    
    Preparing transaction: ...working... done
    Verifying transaction: ...working... done
    Executing transaction: ...working... done
    +---------------------------------------------------------------+
    | To use this environment, load the following modules:          |
    |     module load use.own                                       |
    |     module load conda-env/bio_test-py3.11.5                   |
    | (then standard 'conda install' / 'pip install' / run scripts) |
    +---------------------------------------------------------------+
    
  6. Activate conda environment and install new packages

    Note: The module name conda-env/bio_test-py3.11.5 may differ for you; it depends on the environment name (yourenvname) and Python version you chose.

    module load use.own 
    module load conda-env/bio_test-py3.11.5 
    
    conda list # check packages installed in this environment
    
    pip install jupyter
    pip install anndata
    
    conda list # check again
    

    # packages in environment at bio_test:
    #
    # Name                    Version                   Build  Channel
    _libgcc_mutex             0.1                 conda_forge    conda-forge
    _openmp_mutex             4.5                       2_gnu    conda-forge
    anndata                   0.10.8                   pypi_0    pypi
    array-api-compat          1.7.1                    pypi_0    pypi
    asttokens                 2.4.1              pyhd8ed1ab_0    conda-forge
    bzip2                     1.0.8                hd590300_5    conda-forge
    ...

  7. Create a Jupyter kernel

    conda-env-mod kernel -n bio_test

    You will see something like this:

    Setting CONDA_ENVS_PATH=/cluster/home/xli37/.conda/envs/rhel7.8/conda-23.10.0
    New environments will be created in this directory unless --prefix is specified.
    requested kernel with arguments:  -n 'bio_test' --

    Jupyter kernel created: "Python (My bio_test Kernel)"
    +---------------------------------------------------------------+
    | We recommend installing packages into your kernel environment |
    | via the command line (with 'conda install' or 'pip install'). |
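
The directory and .condarc setup from step 4 can also be sketched in Python. This is a minimal illustration, not part of the cluster tooling: the helper names render_condarc and make_conda_dirs are hypothetical, and the demonstration runs in a temporary directory rather than in your real group research storage. It only prints the .condarc text and does not write to your home directory.

```python
import os
import tempfile


def render_condarc(base_dir):
    """Return .condarc text that points envs_dirs and pkgs_dirs under base_dir."""
    env_dir = os.path.join(base_dir, "condaenv")
    pkg_dir = os.path.join(base_dir, "condapkg")
    return (
        "envs_dirs:\n"
        f"  - {env_dir}\n"
        "pkgs_dirs:\n"
        f"  - {pkg_dir}\n"
    )


def make_conda_dirs(base_dir):
    """Create the condaenv and condapkg directories if they do not exist yet."""
    paths = [os.path.join(base_dir, name) for name in ("condaenv", "condapkg")]
    for path in paths:
        os.makedirs(path, exist_ok=True)
    return paths


# Demonstration in a temporary directory (an assumption for this sketch).
# On the cluster, base_dir would be your own folder in group research storage.
with tempfile.TemporaryDirectory() as demo_base:
    created = make_conda_dirs(demo_base)
    condarc_text = render_condarc(demo_base)
    dirs_exist = all(os.path.isdir(path) for path in created)
    print(condarc_text)
```

Compare the printed text with the four lines shown in step 4 before copying anything into your own .condarc.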

Using Open OnDemand Jupyter Lab#

Navigate to Open OnDemand.

In the Open OnDemand dashboard, go to Interactive Apps => Jupyter Lab, select the number of hours, number of cores, and amount of memory you would like to request, then launch the job.

Under Notebook, select the kernel you just created. For the example above, it is listed as "Python (My bio_test Kernel)".

Start writing your Python code from there.
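
Before running any analysis, it can help to confirm that the notebook is really using the conda environment behind the selected kernel. The following is a minimal sketch using only the standard library; the function name describe_interpreter is hypothetical.

```python
import sys


def describe_interpreter():
    """Collect basic facts about the Python interpreter running this notebook."""
    return {
        "executable": sys.executable,  # full path to the python binary in use
        "prefix": sys.prefix,          # root directory of the active environment
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
    }


info = describe_interpreter()
for key, value in info.items():
    print(f"{key}: {value}")
```

If the printed prefix does not point into the environment directory you created earlier, the notebook is probably running a different kernel.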

Example code to check the Anndata installation:

import anndata as ad
from scipy.sparse import csr_matrix
print(ad.__version__)
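
A related check, sketched below, reports the installed version of several packages at once without importing them. It relies on importlib.metadata from the standard library (Python 3.8 or newer). The function name installed_versions and the package list are illustrative assumptions, so adjust the list to whatever you installed.

```python
from importlib import metadata


def installed_versions(package_names):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in package_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Example package list (an assumption for this sketch).
report = installed_versions(["anndata", "scipy", "numpy", "pandas"])
for name, version in report.items():
    print(f"{name}: {version if version else 'not installed'}")
```

A package reported as "not installed" can be added from the command line with pip install or conda install after loading the environment module.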

Tutorials for Anndata#

https://anndata.readthedocs.io/en/latest/tutorials/notebooks/getting-started.html

Some basic Python commands#

Check current path

import os
print(os.getcwd())

Go to a new path

# Python does not expand $USER inside a string; os.path.expandvars does
os.chdir(os.path.expandvars('/cluster/tufts/XXLAB/$USER/'))

Check what files exist in current path

os.listdir(os.getcwd())
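
The three commands above can be combined into a small helper that changes into a directory temporarily and then returns to where you started. This is a sketch under stated assumptions: the name working_directory is hypothetical, and a temporary directory with one empty file stands in for a real project folder.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def working_directory(path):
    """Change into path (expanding variables such as $USER), then change back."""
    previous = os.getcwd()
    os.chdir(os.path.expandvars(path))
    try:
        yield
    finally:
        os.chdir(previous)


start_dir = os.getcwd()
with tempfile.TemporaryDirectory() as demo_dir:
    # Create one empty file so the listing has something to show.
    open(os.path.join(demo_dir, "example.txt"), "w").close()
    with working_directory(demo_dir):
        listing = sorted(os.listdir(os.getcwd()))
        print(listing)
end_dir = os.getcwd()
```

Restoring the original directory in the finally clause keeps later notebook cells from running in an unexpected location if an error occurs inside the block.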