Running nf-core RNA-seq on Open OnDemand and CLI#
2026-02-20
Shirley Li: xue.li37@tufts.edu
Pipeline version used in workshop: nf-core/rnaseq 3.22.2
Workshop Overview#
This hands-on session demonstrates how to run the nf-core/rnaseq pipeline on the Tufts HPC system using two approaches:
Open OnDemand graphical interface
Command-line (CLI) submission via SLURM
Participants will learn how to prepare input files, configure pipeline parameters, submit jobs, and interpret key output results.
This workshop focuses on practical execution and reproducibility rather than RNA-seq theory.
1. Prepare Input Files#
1.1 FASTQ Files#
Place all FASTQ files in your working directory.
Example (paired-end):
SRX1693951_SRR3362661_1_sub.fastq.gz
SRX1693951_SRR3362661_2_sub.fastq.gz
...
SRX1693956_SRR3362666_2_sub.fastq.gz
The workshop's sample FASTQ files are located here:
/cluster/tufts/workshop/public/2026spring/nfcore/fastq/
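Before building a sample sheet, it can help to confirm that every read-1 file has its read-2 mate. The following is a minimal sketch, not part of the pipeline: the function name `check_pairs` is hypothetical, and it assumes the `_1_sub.fastq.gz` / `_2_sub.fastq.gz` naming used by the workshop files.

```shell
# Report any *_1_sub.fastq.gz file whose *_2_sub.fastq.gz mate is missing.
# Assumption: file names follow the workshop pattern; adjust the suffix for your data.
check_pairs() {
    local dir="$1" r1 r2 missing=0
    for r1 in "$dir"/*_1_sub.fastq.gz; do
        [ -e "$r1" ] || continue                      # glob matched nothing
        r2="${r1%_1_sub.fastq.gz}_2_sub.fastq.gz"     # expected mate file
        if [ ! -e "$r2" ]; then
            echo "missing mate for: $(basename "$r1")"
            missing=$((missing + 1))
        fi
    done
    echo "pairs with missing mate: $missing"
    [ "$missing" -eq 0 ]                              # non-zero status if any mate is missing
}

# Example call (run it on your own FASTQ directory):
# check_pairs /path/to/fastq
```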
1.2 Sample Sheet (samplesheet.csv)#
Required format:
sample,fastq_1,fastq_2,strandedness
GFPkd_1,SRX1693951_SRR3362661_1_sub.fastq.gz,SRX1693951_SRR3362661_2_sub.fastq.gz,auto
GFPkd_2,SRX1693952_SRR3362662_1_sub.fastq.gz,SRX1693952_SRR3362662_2_sub.fastq.gz,auto
GFPkd_3,SRX1693953_SRR3362663_1_sub.fastq.gz,SRX1693953_SRR3362663_2_sub.fastq.gz,auto
PRMT5kd_1,SRX1693954_SRR3362664_1_sub.fastq.gz,SRX1693954_SRR3362664_2_sub.fastq.gz,auto
PRMT5kd_2,SRX1693955_SRR3362665_1_sub.fastq.gz,SRX1693955_SRR3362665_2_sub.fastq.gz,auto
PRMT5kd_3,SRX1693956_SRR3362666_1_sub.fastq.gz,SRX1693956_SRR3362666_2_sub.fastq.gz,auto
Notes:
For paired-end data, fastq_1 and fastq_2 in each row must be the two mate files of the same sample.
Paths can be relative or absolute; absolute paths are recommended.
On HPC:
/cluster/tufts/workshop/public/2026spring/nfcore/samplesheet.csv
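For larger datasets, the sample sheet can be generated rather than typed by hand. This is a rough sketch under stated assumptions: `make_samplesheet` is a hypothetical helper, it relies on the workshop's `_1_sub` / `_2_sub` file naming, and it uses the file-name prefix as the sample name, so you would still rename samples (for example to GFPkd_1) afterwards.

```shell
# Print a samplesheet with the four columns shown above, one row per FASTQ pair.
# Assumption: sample name = file-name prefix before "_1_sub.fastq.gz".
make_samplesheet() {
    local dir="$1" r1 r2 sample
    echo "sample,fastq_1,fastq_2,strandedness"
    for r1 in "$dir"/*_1_sub.fastq.gz; do
        [ -e "$r1" ] || continue                      # glob matched nothing
        r2="${r1%_1_sub.fastq.gz}_2_sub.fastq.gz"
        if [ ! -e "$r2" ]; then
            echo "skipping $r1: no mate file" >&2
            continue
        fi
        sample=$(basename "${r1%_1_sub.fastq.gz}")
        echo "${sample},${r1},${r2},auto"
    done
}

# Example call; pass an absolute directory to get absolute paths in the sheet:
# make_samplesheet /path/to/fastq > samplesheet.csv
```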
1.3 Reference Files#
You must provide:
Genome FASTA file
Gene annotation GTF file
Reference files can be either remote (URL) or local (file path on the cluster).
Remote URLs#
Example (Ensembl GRCh38 release 111):
FASTA (Genome):
https://ftp.ensembl.org/pub/release-111/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
GTF (Annotation):
https://ftp.ensembl.org/pub/release-111/gtf/homo_sapiens/Homo_sapiens.GRCh38.111.gtf.gz
Local example (For workshop use only).#
This directory may be deleted after the workshop. Keep your own copy if you plan to rerun the analysis.
FASTA (Genome):
/cluster/tufts/workshop/public/2026spring/star_index/Homo_sapiens.GRCh38.dna.primary_assembly.fa
GTF (Annotation):
/cluster/tufts/workshop/public/2026spring/star_index/Homo_sapiens.GRCh38.111.gtf
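Because a reference can be either a URL or a local path, a quick pre-flight check can catch a mistyped local path before the job is queued. A minimal sketch follows; `check_ref` is a hypothetical helper, and it only classifies URLs by their prefix without testing that they are reachable.

```shell
# Classify a reference as remote or local, and fail if a local file is not readable.
# Assumption: anything starting with http://, https:// or ftp:// is treated as remote.
check_ref() {
    local ref="$1"
    case "$ref" in
        http://*|https://*|ftp://*)
            echo "remote: $ref"
            ;;
        *)
            if [ -r "$ref" ]; then
                echo "local: $ref"
            else
                echo "not readable: $ref" >&2
                return 1
            fi
            ;;
    esac
}

# Example calls:
# check_ref https://ftp.ensembl.org/pub/release-111/gtf/homo_sapiens/Homo_sapiens.GRCh38.111.gtf.gz
# check_ref /cluster/tufts/workshop/public/2026spring/star_index/Homo_sapiens.GRCh38.111.gtf
```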
2. Running via Open OnDemand#
Log in to:
Open OnDemand → nf-core pipelines → rnaseq (version 3.22.2)
Configure:
Input: samplesheet.csv
FASTA: (URL or local file path listed above)
GTF: (URL or local file path listed above)
Outdir: rnaseq
Working directory: your project directory
kraken_db: /cluster/tufts/biocontainers/datasets/kraken2/k2_standard_20251015
star_index: /cluster/tufts/workshop/public/2026spring/star_index/
salmon_index: /cluster/tufts/workshop/public/2026spring/salmon_index/
skip_pseudo_alignment: true
Other parameters can keep their default values unless discussed in the workshop.
Submit job.
3. Running via CLI (SLURM)#
Create a SLURM script (example: run_rnaseq.slurm)
#!/bin/bash
#SBATCH --job-name=rnaseq_nfcore
#SBATCH --partition=batch
#SBATCH --cpus-per-task=4
#SBATCH --mem=16G
#SBATCH --time=02:00:00
#SBATCH --output=rnaseq_%j.log
#SBATCH --error=rnaseq_%j.err
echo "Starting job at $(date)"
echo "Running on node: $(hostname)"
echo "Job ID: $SLURM_JOB_ID"
module load nextflow
module load singularity/4.3.4
genome=/cluster/tufts/workshop/public/2026spring/reference/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
gtf=/cluster/tufts/workshop/public/2026spring/reference/Homo_sapiens.GRCh38.111.gtf.gz
nextflow run nf-core/rnaseq \
-r 3.22.2 \
-profile tufts \
--input samplesheet.csv \
--fasta "$genome" \
--gtf "$gtf" \
--star_index /cluster/tufts/workshop/public/2026spring/star_index/ \
--salmon_index /cluster/tufts/workshop/public/2026spring/salmon_index/ \
--outdir rnaseq
# Add -resume to the command above to continue an interrupted run.
# --star_index and --salmon_index are provided for the workshop to avoid rebuilding the indexes and save time.
# You do not need these flags when running the pipeline independently.
echo "Finished at $(date)"
Submit job:
sbatch run_rnaseq.slurm
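A common mistake in multi-line commands is a missing trailing backslash, which makes the shell treat the following option as a separate command. The sketch below is one way to catch this before submitting; `check_continuations` is a hypothetical helper, and its rule (a line starting with `-` must follow a line ending in `\`) is a heuristic, not a full shell parser.

```shell
# Flag option lines (starting with - or --) whose previous line does not end in a backslash.
# Heuristic only: "#SBATCH --..." and commented lines start with "#" and are ignored.
check_continuations() {
    awk '
        /^[ \t]*--?[A-Za-z]/ && prev !~ /\\$/ {
            printf "line %d: option not joined to previous line: %s\n", NR, $0
            bad = 1
        }
        { prev = $0 }
        END { exit bad + 0 }
    ' "$1"
}

# Example call before submitting:
# check_continuations run_rnaseq.slurm && sbatch run_rnaseq.slurm
```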
4. Key Results#
If you define your output directory as:
--outdir rnaseq
then all pipeline results will be written inside:
rnaseq/
This directory will contain subfolders such as:
rnaseq/
├── multiqc/
├── star_salmon/
├── pipeline_info/
├── fastqc/
├── trimgalore/
└── work/ (intermediate files, DELETE after COMPLETION to save space)
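Since the work directory is meant to be deleted only after a successful run, the clean-up can be guarded by a check for the final report. This is a cautious sketch, not an official procedure: `cleanup_work` is a hypothetical helper, and it treats the presence of the MultiQC report as a stand-in for "the run completed". Note that deleting the work directory also removes the cache that `-resume` relies on.

```shell
# Remove the work directory only if the MultiQC report exists and is non-empty.
# Assumption: a non-empty multiqc_report.html means the pipeline finished.
cleanup_work() {
    local outdir="$1" workdir="$2"
    local report="$outdir/multiqc/star_salmon/multiqc_report.html"
    if [ ! -d "$workdir" ]; then
        echo "no such directory: $workdir" >&2
        return 1
    fi
    if [ ! -s "$report" ]; then
        echo "no MultiQC report at $report; keeping $workdir" >&2
        return 1
    fi
    du -sh "$workdir"            # show how much space will be freed
    rm -rf -- "$workdir"
    echo "removed $workdir"
}

# Example call; pass the actual location of your work directory:
# cleanup_work rnaseq /path/to/work
```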
4.1 Summary Report (Most Important)#
multiqc/
multiqc/star_salmon/multiqc_report.html
Open this first.
4.2 Gene-Level Counts#
star_salmon/salmon.merged.gene_counts.tsv → gene-level counts (used for differential expression)
star_salmon/salmon.merged.gene_tpm.tsv → TPM-normalized expression (for visualization)
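A quick sanity check on salmon.merged.gene_counts.tsv is to count how many genes have any reads at all. The sketch below assumes the file is tab-separated with a header row and that the first two columns are gene identifiers (gene_id and gene_name) followed by one column per sample; verify this against your own file with `head` before relying on it. `count_expressed_genes` is a hypothetical helper.

```shell
# Count genes whose summed count across all sample columns is greater than zero.
# Assumption: columns 1-2 are gene_id and gene_name; columns 3+ are per-sample counts.
count_expressed_genes() {
    awk -F '\t' '
        NR == 1 { next }                               # skip header row
        {
            total = 0
            for (i = 3; i <= NF; i++) total += $i      # sum counts over samples
            if (total > 0) expressed++
        }
        END { printf "%d of %d genes have non-zero counts\n", expressed, NR - 1 }
    ' "$1"
}

# Example call:
# count_expressed_genes rnaseq/star_salmon/salmon.merged.gene_counts.tsv
```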
4.3 Pipeline Metadata (Reproducibility)#
pipeline_info/nf_core_rnaseq_software_mqc_versions.yml
Contains the versions of the software used in the run.
Open OnDemand App#
In your working directory, this file lists all parameters used:
nf-params.json
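A JSON parameter file can also be passed to a command-line run through Nextflow's `-params-file` option, which is one way to repeat a run with the same settings. The sketch below writes a small example file; `write_params` and the file name `my-params.json` are hypothetical, the keys are the pipeline parameters named earlier in this page, and the file that the Open OnDemand app writes may contain more entries than this.

```shell
# Write an example parameter file (keys are pipeline parameter names without the leading "--").
# Assumption: values mirror the workshop settings; edit them for your own run.
write_params() {
    cat > "$1" <<'EOF'
{
    "input": "samplesheet.csv",
    "outdir": "rnaseq",
    "fasta": "/cluster/tufts/workshop/public/2026spring/star_index/Homo_sapiens.GRCh38.dna.primary_assembly.fa",
    "gtf": "/cluster/tufts/workshop/public/2026spring/star_index/Homo_sapiens.GRCh38.111.gtf",
    "skip_pseudo_alignment": true
}
EOF
}

# Example use inside a SLURM script:
# write_params my-params.json
# nextflow run nf-core/rnaseq -r 3.22.2 -profile tufts -params-file my-params.json
```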
5. Logs and Troubleshooting#
5.1 Main Nextflow Log#
.nextflow.log
Monitor during run:
tail -f .nextflow.log
It is located in the working directory, not in the output directory (outdir).
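The Nextflow log is verbose, so filtering it for warnings and errors is often faster than reading it in full. A minimal sketch, assuming log lines carry their level as a separate word such as ERROR or WARN; `scan_nextflow_log` is a hypothetical helper.

```shell
# Show the last 20 WARN/ERROR lines of a Nextflow log, with line numbers.
# Assumption: the log level appears as a space-delimited word on each line.
scan_nextflow_log() {
    local log="${1:-.nextflow.log}"
    grep -n -E ' (ERROR|WARN) ' "$log" | tail -n 20
}

# Example call from the working directory:
# scan_nextflow_log .nextflow.log
```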
5.2 SLURM Log Files#
rnaseq_<jobid>.log # check progress
rnaseq_<jobid>.err # check errors
Open OnDemand App#
In the session ID folder:
output.log # check progress
5.3 Per-Process Logs (Advanced)#
Located inside:
work/<hash>/
Files:
.command.log
.command.err
.command.out
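Finding which task directory to inspect can be tedious when the work directory holds hundreds of them. The sketch below lists task directories whose recorded exit status is non-zero. It assumes each task directory also contains an `.exitcode` file holding the task's exit status, which Nextflow normally writes next to the `.command.*` files; `failed_tasks` is a hypothetical helper.

```shell
# List task directories whose .exitcode file holds a non-zero status.
# Assumption: Nextflow wrote an .exitcode file in each finished task directory.
failed_tasks() {
    local workdir="${1:-work}" f code
    find "$workdir" -name .exitcode -type f | while IFS= read -r f; do
        code=$(cat "$f")
        if [ "$code" != "0" ]; then
            echo "exit $code: $(dirname "$f")"
        fi
    done
}

# Example call, then inspect the reported directory:
# failed_tasks work
# cat work/<hash>/.command.err
```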
6. Running With Your Own Data#
To run your own dataset, you need:
FASTQ files
A properly formatted samplesheet.csv
FASTA reference genome
GTF annotation file
Steps:
Upload FASTQ files to your working directory.
Create a samplesheet.csv.
Adjust reference paths if needed.
Launch the pipeline via Open OnDemand or CLI.