---
tags: bioinformatics
---

# Exploring Python Bioinformatics Packages with Jupyter Notebook

Shirley Li, Bioinformatician, TTS Research Technology

xue.li37@tufts.edu

Date: 2024-11-01

In this tutorial, we will use the **anndata** package as an example to show how to run interactive Python sessions through the Tufts Open OnDemand Jupyter Notebook.

## Prerequisites

1. Familiarity with Linux commands
1. Experience working with conda environments

## Creating a conda environment

1. Start an interactive job session

   `srun -p interactive -n 1 --time=04:00:00 --mem 4g --pty bash`

1. Load an anaconda or miniconda module

   `module load anaconda/2021.05` or `module load anaconda/2021.11` or `module load miniconda/23.10`

1. Load the conda-env-mod module

   `module load conda-env-mod/default`

1. Configure conda

   **_NOTE: the steps in this section only need to be executed ONCE_**

   Since your home directory has limited storage, it's recommended to install conda packages in your group research storage space. Follow these steps:

   Create two directories in your group research storage space, one for storing the environments and one for storing the packages (for example: condaenv, condapkg).

   `$ mkdir /cluster/tufts/XXXXlab/$USER/condaenv/`

   `$ mkdir /cluster/tufts/XXXXlab/$USER/condapkg/`

   If you haven't used conda before on the cluster, create a file named `.condarc` in your home directory. Then add the following 4 lines to it (adjust the paths to match your own directories):

   ```
   envs_dirs:
     - /cluster/tufts/XXXXlab/$USER/condaenv/
   pkgs_dirs:
     - /cluster/tufts/XXXXlab/$USER/condapkg/
   ```

   After this, your `.condarc` file should look similar to this (a `channels` section may also be present):

   `$ cat ~/.condarc`

   ```
   envs_dirs:
     - /cluster/tufts/XXXXlab/$USER/condaenv/
   pkgs_dirs:
     - /cluster/tufts/XXXXlab/$USER/condapkg/
   channels:
     - bioconda
     - conda-forge
     - defaults
   ```

1. Create your conda environment with conda-env-mod

   Change `yourenvname` to the name of the environment you intend to create.

   ```
   cd /cluster/tufts/XXXXlab/$USER/condaenv/
   conda-env-mod create -p yourenvname python=3.8 --jupyter
   ```

   You will see something like this. Enter `y` to continue.

   ```
   The following NEW packages will be INSTALLED:

     _libgcc_mutex      conda-forge/linux-64::_libgcc_mutex-0.1-conda_forge
     _openmp_mutex      conda-forge/linux-64::_openmp_mutex-4.5-2_gnu
     asttokens          conda-forge/noarch::asttokens-2.4.1-pyhd8ed1ab_0
     bzip2              conda-forge/linux-64::bzip2-1.0.8-hd590300_5
     ca-certificates    conda-forge/linux-64::ca-certificates-2024.7.4-hbcca054_0
     ...

   Proceed ([y]/n)? y
   ```

   When it's complete, you will see something like this:

   ```
   Preparing transaction: ...working... done
   Verifying transaction: ...working... done
   Executing transaction: ...working... done
   +---------------------------------------------------------------+
   | To use this environment, load the following modules:          |
   |     module load use.own                                       |
   |     module load conda-env/bio_test-py3.11.5                   |
   | (then standard 'conda install' / 'pip install' / run scripts) |
   +---------------------------------------------------------------+
   ```

1. Activate the conda environment and install new packages

   Note: the module name `conda-env/bio_test-py3.11.5` will differ depending on the `yourenvname` you chose.

   ```
   module load use.own
   module load conda-env/bio_test-py3.11.5
   conda list   # check packages installed in this environment
   pip install jupyter
   pip install anndata
   conda list   # check again
   ```

   ```
   # packages in environment at bio_test:
   #
   # Name                    Version                   Build  Channel
   _libgcc_mutex             0.1                 conda_forge    conda-forge
   _openmp_mutex             4.5                       2_gnu    conda-forge
   anndata                   0.10.8                   pypi_0    pypi
   array-api-compat          1.7.1                    pypi_0    pypi
   asttokens                 2.4.1              pyhd8ed1ab_0    conda-forge
   bzip2                     1.0.8                hd590300_5    conda-forge
   ...
   ```

1. Create a Jupyter kernel

   `conda-env-mod kernel -n bio_test`

   You will see something like this:

   ```
   Setting CONDA_ENVS_PATH=/cluster/home/xli37/.conda/envs/rhel7.8/conda-23.10.0
   New environments will be created in this directory unless --prefix is specified.
   requested kernel with arguments: -n 'bio_test' --
   Jupyter kernel created: "Python (My bio_test Kernel)"
   +---------------------------------------------------------------+
   | We recommend installing packages into your kernel environment |
   | via the command line (with 'conda install' or 'pip install'). |
   ```

## Using Open OnDemand Jupyter Lab

Navigate to [Open OnDemand](https://ondemand.pax.tufts.edu/).

In the Open OnDemand dashboard, go to `Interactive Apps` => `Jupyter Lab`, select the `number of hours`, `number of cores`, and `Amount of memory` that you would like to request, and launch the job.

Under `Notebook`, select the kernel you just created (in this example, "Python (My bio_test Kernel)"). Start your Python code from there.

Example code to check the anndata installation:

```
import anndata as ad
from scipy.sparse import csr_matrix
print(ad.__version__)
```

## Tutorials for anndata

https://anndata.readthedocs.io/en/latest/tutorials/notebooks/getting-started.html

## Some basic Python commands

Check the current path

```
import os
print(os.getcwd())
```

Go to a new path

```
# os.chdir does not expand shell variables such as $USER, so expand them first
os.chdir(os.path.expandvars('/cluster/tufts/XXXXlab/$USER/'))
```

Check what files exist in the current path

```
os.listdir(os.getcwd())
```
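
If the notebook does not behave as expected, it can help to confirm which Python interpreter the selected kernel is running and whether the packages installed above are visible to it. The following is a minimal sketch using only the standard library. The helper names `describe_kernel` and `resolve_dir` are hypothetical and introduced here for illustration, and the path is the same placeholder used earlier in this tutorial, so replace `XXXXlab` with your own lab directory.

```python
import importlib.util
import os
import sys


def describe_kernel(packages=("anndata", "scipy")):
    """Return the interpreter path and, per package, whether it can be found."""
    available = {
        name: importlib.util.find_spec(name) is not None for name in packages
    }
    return {"python": sys.executable, "packages": available}


def resolve_dir(path_template):
    """Expand $VARS and ~ in a path; return it only if the directory exists."""
    path = os.path.expanduser(os.path.expandvars(path_template))
    return path if os.path.isdir(path) else None


info = describe_kernel()
print("Interpreter:", info["python"])
for name, found in info["packages"].items():
    print(f"{name}: {'found' if found else 'MISSING'}")

# Placeholder path from this tutorial (an assumption; adjust to your lab storage).
target = resolve_dir("/cluster/tufts/XXXXlab/$USER/")
if target is None:
    print("Directory not found; staying in", os.getcwd())
else:
    os.chdir(target)
    print("Now in", os.getcwd(), "with", len(os.listdir(target)), "entries")
```

If the interpreter path points into your `condaenv` directory and both packages are reported as found, the kernel is most likely using the environment you created. If a package is reported as missing, check that the right kernel is selected under `Notebook` before reinstalling anything.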