Running nf-core RNA-seq on Open OnDemand and CLI#

2026-02-20

Pipeline version used in workshop: nf-core/rnaseq 3.22.2

Workshop Overview#

This hands-on session demonstrates how to run the nf-core/rnaseq pipeline on the Tufts HPC system using two approaches:

Open OnDemand graphical interface
Command-line (CLI) submission via SLURM

Participants will learn how to prepare input files, configure pipeline parameters, submit jobs, and interpret key output results.

This workshop focuses on practical execution and reproducibility rather than RNA-seq theory.

1. Prepare Input Files#

1.1 FASTQ Files#

Place all FASTQ files in your working directory.

Example (paired-end):

SRX1693951_SRR3362661_1_sub.fastq.gz
SRX1693951_SRR3362661_2_sub.fastq.gz
...
SRX1693956_SRR3362666_2_sub.fastq.gz

The sample FASTQ files for this workshop are located here:

/cluster/tufts/workshop/public/2026spring/nfcore/fastq/

1.2 Sample Sheet (`samplesheet.csv`)#

Required format:

sample,fastq_1,fastq_2,strandedness
GFPkd_1,SRX1693951_SRR3362661_1_sub.fastq.gz,SRX1693951_SRR3362661_2_sub.fastq.gz,auto
GFPkd_2,SRX1693952_SRR3362662_1_sub.fastq.gz,SRX1693952_SRR3362662_2_sub.fastq.gz,auto
GFPkd_3,SRX1693953_SRR3362663_1_sub.fastq.gz,SRX1693953_SRR3362663_2_sub.fastq.gz,auto
PRMT5kd_1,SRX1693954_SRR3362664_1_sub.fastq.gz,SRX1693954_SRR3362664_2_sub.fastq.gz,auto
PRMT5kd_2,SRX1693955_SRR3362665_1_sub.fastq.gz,SRX1693955_SRR3362665_2_sub.fastq.gz,auto
PRMT5kd_3,SRX1693956_SRR3362666_1_sub.fastq.gz,SRX1693956_SRR3362666_2_sub.fastq.gz,auto

Notes:

Paired-end files must match correctly.
Paths can be relative or absolute (recommended).

On HPC:

/cluster/tufts/workshop/public/2026spring/nfcore/samplesheet.csv
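
For larger datasets, writing the sample sheet by hand is error-prone. The following is a minimal sketch, not part of the workshop materials, that prints a sample sheet for paired files following the `_1_sub.fastq.gz` / `_2_sub.fastq.gz` naming shown above. The function name `make_samplesheet` and the choice to use the file prefix as the sample name are assumptions for illustration; adjust the suffixes to match your own files.

```bash
# Sketch: print a samplesheet for paired files named
# <prefix>_1_sub.fastq.gz / <prefix>_2_sub.fastq.gz in one directory.
# Using the file prefix as the sample name is an assumption for illustration.
make_samplesheet() {
    local fastq_dir=$1 r1 r2 sample
    echo "sample,fastq_1,fastq_2,strandedness"
    for r1 in "$fastq_dir"/*_1_sub.fastq.gz; do
        [ -e "$r1" ] || continue    # glob matched nothing
        r2="${r1%_1_sub.fastq.gz}_2_sub.fastq.gz"
        if [ ! -e "$r2" ]; then
            echo "warning: no mate file for $r1" >&2
            continue
        fi
        sample=$(basename "$r1" _1_sub.fastq.gz)
        echo "$sample,$r1,$r2,auto"
    done
}
```

For example, `make_samplesheet /cluster/tufts/workshop/public/2026spring/nfcore/fastq > samplesheet.csv` would write absolute paths. The generated sample names (such as `SRX1693951_SRR3362661`) would still need to be edited by hand to meaningful labels such as `GFPkd_1`.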

1.3 Reference Files#

You must provide:

Genome FASTA file
Gene annotation GTF file

Reference files can be either remote (URL) or local (file path on the cluster).

Remote URLs#

Example (Ensembl GRCh38 release 111):

FASTA (Genome):

https://ftp.ensembl.org/pub/release-111/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz

GTF (Annotation):

https://ftp.ensembl.org/pub/release-111/gtf/homo_sapiens/Homo_sapiens.GRCh38.111.gtf.gz

Local example (For workshop use only).#

This directory may be deleted after the workshop. Keep your own copy if you plan to rerun the analysis.

FASTA (Genome):

/cluster/tufts/workshop/public/2026spring/star_index/Homo_sapiens.GRCh38.dna.primary_assembly.fa

GTF (Annotation):

/cluster/tufts/workshop/public/2026spring/star_index/Homo_sapiens.GRCh38.111.gtf

2. Running via Open OnDemand#

Log in to:

Open OnDemand → nf-core pipelines → rnaseq (version 3.22.2)

Configure:

Input: samplesheet.csv
FASTA: (URL or local file path listed above)
GTF: (URL or local file path listed above)
Outdir: rnaseq
Working directory: your project directory
kraken_db: /cluster/tufts/biocontainers/datasets/kraken2/k2_standard_20251015
star_index: /cluster/tufts/workshop/public/2026spring/star_index/
salmon_index: /cluster/tufts/workshop/public/2026spring/salmon_index/
skip_pseudo_alignment: true

Other parameters can remain at their defaults unless discussed in the workshop.

Submit the job.

3. Running via CLI (SLURM)#

Create a SLURM script (for example, run_rnaseq.slurm):

#!/bin/bash
#SBATCH --job-name=rnaseq_nfcore
#SBATCH --partition=batch
#SBATCH --cpus-per-task=4
#SBATCH --mem=16G
#SBATCH --time=02:00:00
#SBATCH --output=rnaseq_%j.log
#SBATCH --error=rnaseq_%j.err

echo "Starting job at $(date)"
echo "Running on node: $(hostname)"
echo "Job ID: $SLURM_JOB_ID"

module load nextflow
module load singularity/4.3.4

genome=/cluster/tufts/workshop/public/2026spring/reference/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
gtf=/cluster/tufts/workshop/public/2026spring/reference/Homo_sapiens.GRCh38.111.gtf.gz

nextflow run nf-core/rnaseq \
    -r 3.22.2 \
    -profile tufts \
    --input samplesheet.csv \
    --fasta "$genome" \
    --gtf "$gtf" \
    --star_index /cluster/tufts/workshop/public/2026spring/star_index/ \
    --salmon_index /cluster/tufts/workshop/public/2026spring/salmon_index/ \
    --outdir rnaseq

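# To continue an interrupted run from cached results, add -resume to the command above.

# --star_index and --salmon_index are provided for the workshop to avoid rebuilding the indices and save time.
# You do not need these flags when running the pipeline independently.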

echo "Finished at $(date)"

Submit the job:

sbatch run_rnaseq.slurm
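
To follow the job after submission, it helps to keep its job ID. The sketch below wraps `sbatch --parsable`, which prints only the job ID (optionally followed by `;cluster`) instead of a full sentence. The wrapper name `submit_job` is hypothetical.

```bash
# Sketch: submit a script and print only the SLURM job ID.
submit_job() {
    local out
    # --parsable prints "jobid" or "jobid;cluster"
    out=$(sbatch --parsable "$1") || return 1
    echo "${out%%;*}"    # drop the optional ";cluster" suffix
}

# Example usage on the cluster (commented out so the sketch only defines the function):
# jobid=$(submit_job run_rnaseq.slurm)
# squeue -j "$jobid"                                     # state while pending or running
# sacct -j "$jobid" --format=JobID,State,Elapsed,MaxRSS  # accounting after the job ends
```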

4. Key Results#

If you define your output directory as:

--outdir rnaseq

then all pipeline results will be written inside:

rnaseq/

This directory will contain subfolders such as:

rnaseq/
├── multiqc/
├── star_salmon/
├── pipeline_info/
├── fastqc/
├── trimgalore/
└── work/   (intermediate files, DELETE after COMPLETION to save space)

4.1 Summary Report (Most Important)#

multiqc/
multiqc/star_salmon/multiqc_report.html

Open this first.

4.2 Gene-Level Counts#

salmon.merged.gene_counts.tsv   → gene-level counts (used for differential expression)

salmon.merged.gene_tpm.tsv      → normalized expression (for visualization)
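
As a quick sanity check before differential expression, the counts table can be summarized from the shell. This sketch assumes the table is tab-separated with a header row, that the first two columns are gene identifiers (`gene_id`, `gene_name`), and that all remaining columns are per-sample counts; verify this against your own file with `head` first. The function name is hypothetical.

```bash
# Sketch: count genes with a non-zero total across all sample columns.
# Assumes columns 1-2 are gene_id and gene_name, and columns 3+ are samples.
count_expressed_genes() {
    awk -F'\t' 'NR > 1 {
        total = 0
        for (i = 3; i <= NF; i++) total += $i
        if (total > 0) n++
    }
    END { print n + 0 }' "$1"
}
```

With the output layout shown above, a typical call would be `count_expressed_genes rnaseq/star_salmon/salmon.merged.gene_counts.tsv`. A value near zero usually points to a problem with the reference or the sample sheet rather than with the biology.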

4.3 Pipeline Metadata (Reproducibility)#

pipeline_info/nf_core_rnaseq_software_mqc_versions.yml

This file records the versions of all software used in the run.

Open OnDemand App#

In your working directory, this file lists all parameters used for the run:

nf-params.json

5. Logs and Troubleshooting#

5.1 Main Nextflow Log#

.nextflow.log

Monitor during run:

tail -f .nextflow.log

This file is located in the working directory (where the pipeline was launched), not in the output directory.

5.2 SLURM Log Files#

rnaseq_<jobid>.log   # check progress
rnaseq_<jobid>.err   # check errors

Open OnDemand App#

In the session ID folder:

output.log  # check progress

5.3 Per-Process Logs (Advanced)#

Located inside:

work/<hash>/

Files:

.command.log
.command.err
.command.out
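
When a run fails, finding the right task directory by hand can be slow. Nextflow also writes a `.exitcode` file in each task directory once the task finishes, so failed tasks can be located by scanning for non-zero values. The following is a sketch under that assumption; the function name `find_failed_tasks` is hypothetical, and tasks that never started or were killed before writing `.exitcode` will not be listed.

```bash
# Sketch: list task directories whose recorded exit code is not 0.
find_failed_tasks() {
    local work_dir=${1:-work}
    find "$work_dir" -type f -name .exitcode | while read -r f; do
        code=$(cat "$f")
        if [ "$code" != "0" ]; then
            echo "$(dirname "$f") exit=$code"
        fi
    done
}
```

For each directory reported, `.command.err` and `.command.log` in that directory are the first files to read.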

6. Running With Your Own Data#

To run your own dataset, you need:

FASTQ files
A properly formatted samplesheet.csv
FASTA reference genome
GTF annotation file

Steps:

Upload FASTQ files to your working directory.
Create a samplesheet.csv.
Adjust reference paths if needed.
Launch the pipeline via Open OnDemand or CLI.
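
Before launching with your own data, it is worth confirming that every file named in the sample sheet exists and is readable, because a wrong path otherwise surfaces only after the job has waited in the queue. The check below is a minimal sketch; the function name `check_samplesheet` is hypothetical, it only inspects the `fastq_1` and `fastq_2` columns, and relative paths are resolved against the current directory.

```bash
# Sketch: report unreadable FASTQ paths listed in a samplesheet.
# Returns 0 if all files are readable, 1 otherwise.
check_samplesheet() {
    local sheet=$1 status=0 first=1 sample fq1 fq2 rest fq
    while IFS=, read -r sample fq1 fq2 rest || [ -n "$sample" ]; do
        if [ "$first" = 1 ]; then first=0; continue; fi   # skip header row
        for fq in "$fq1" "$fq2"; do
            [ -n "$fq" ] || continue    # single-end rows leave fastq_2 empty
            if [ ! -r "$fq" ]; then
                echo "missing: $sample -> $fq" >&2
                status=1
            fi
        done
    done < "$sheet"
    return "$status"
}
```

Run it from the directory where you will launch the pipeline, for example `check_samplesheet samplesheet.csv && sbatch run_rnaseq.slurm`, so the job is submitted only when all inputs are found.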
