# Running nf-core RNA-seq on Open OnDemand and CLI

2026-02-20

Shirley Li: xue.li37@tufts.edu

Pipeline version used in workshop: [**nf-core/rnaseq 3.22.2**](https://nf-co.re/rnaseq/3.22.2/)

# Workshop Overview

This hands-on session demonstrates how to run the **nf-core/rnaseq** pipeline on the Tufts HPC system using two approaches:

- Open OnDemand graphical interface
- Command-line (CLI) submission via SLURM

Participants will learn how to prepare input files, configure pipeline parameters, submit jobs, and interpret key output results.

This workshop focuses on practical execution and reproducibility rather than RNA-seq theory.

# 1. Prepare Input Files

## 1.1 FASTQ Files

Place all FASTQ files in your working directory.

Example (paired-end):

```
SRX1693951_SRR3362661_1_sub.fastq.gz
SRX1693951_SRR3362661_2_sub.fastq.gz
...
SRX1693956_SRR3362666_2_sub.fastq.gz
```

The sample FASTQ files are located here:

```
/cluster/tufts/workshop/public/2026spring/nfcore/fastq/
```

## 1.2 Sample Sheet (`samplesheet.csv`)

Required format:

```
sample,fastq_1,fastq_2,strandedness
GFPkd_1,SRX1693951_SRR3362661_1_sub.fastq.gz,SRX1693951_SRR3362661_2_sub.fastq.gz,auto
GFPkd_2,SRX1693952_SRR3362662_1_sub.fastq.gz,SRX1693952_SRR3362662_2_sub.fastq.gz,auto
GFPkd_3,SRX1693953_SRR3362663_1_sub.fastq.gz,SRX1693953_SRR3362663_2_sub.fastq.gz,auto
PRMT5kd_1,SRX1693954_SRR3362664_1_sub.fastq.gz,SRX1693954_SRR3362664_2_sub.fastq.gz,auto
PRMT5kd_2,SRX1693955_SRR3362665_1_sub.fastq.gz,SRX1693955_SRR3362665_2_sub.fastq.gz,auto
PRMT5kd_3,SRX1693956_SRR3362666_1_sub.fastq.gz,SRX1693956_SRR3362666_2_sub.fastq.gz,auto
```

Notes:

- Each `fastq_1` file must be paired with its matching `fastq_2` file.
- Paths can be relative or absolute; absolute paths are recommended.

On the HPC cluster, the workshop sample sheet is at:

```
/cluster/tufts/workshop/public/2026spring/nfcore/samplesheet.csv
```

## 1.3 Reference Files

You must provide:

- Genome FASTA file
- Gene annotation GTF file

Reference files can be either **remote (URL)** or **local (file path on the cluster)**.

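
Because a reference can be given either as a URL or as a cluster path, a quick pre-flight check can catch a mistyped local path before a job is queued. The following is a minimal sketch, not part of the pipeline: the function name `check_reference` is hypothetical, and it only tests that a local file is readable (it does not contact remote URLs).

```bash
#!/bin/bash
# Hypothetical helper (not part of nf-core/rnaseq): classify a reference
# as remote or local, and fail if a local file is not readable.
check_reference() {
    local ref="$1"
    case "$ref" in
        http://*|https://*|ftp://*)
            # Remote references are passed to the pipeline as-is; no download is attempted here.
            echo "remote: $ref"
            ;;
        *)
            if [ -r "$ref" ]; then
                echo "local: $ref"
            else
                echo "missing: $ref" >&2
                return 1
            fi
            ;;
    esac
}

# Example calls (uncomment and substitute your own FASTA and GTF):
# check_reference /path/to/genome.fa
# check_reference /path/to/annotation.gtf
```

A non-zero return status means the local file is missing or unreadable, so the check can be placed before `sbatch` in a wrapper script.
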
### Remote URLs

Example (Ensembl GRCh38 release 111):

FASTA (Genome):

```
https://ftp.ensembl.org/pub/release-111/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
```

GTF (Annotation):

```
https://ftp.ensembl.org/pub/release-111/gtf/homo_sapiens/Homo_sapiens.GRCh38.111.gtf.gz
```

### Local example (For workshop use only)

**This directory may be deleted after the workshop. Keep your own copy if you plan to rerun the analysis.**

FASTA (Genome):

```
/cluster/tufts/workshop/public/2026spring/star_index/Homo_sapiens.GRCh38.dna.primary_assembly.fa
```

GTF (Annotation):

```
/cluster/tufts/workshop/public/2026spring/star_index/Homo_sapiens.GRCh38.111.gtf
```

# 2. Running via Open OnDemand

Log in to:

Open OnDemand → nf-core pipelines → rnaseq (version 3.22.2)

Configure:

- Input: `samplesheet.csv`
- FASTA: (URL or local file path listed above)
- GTF: (URL or local file path listed above)
- Outdir: `rnaseq`
- Working directory: your project directory
- kraken_db: `/cluster/tufts/biocontainers/datasets/kraken2/k2_standard_20251015`
- star_index: `/cluster/tufts/workshop/public/2026spring/star_index/`
- salmon_index: `/cluster/tufts/workshop/public/2026spring/salmon_index/`
- skip_pseudo_alignment: true

Other parameters can keep their defaults unless discussed in the workshop.

Submit the job.

# 3. Running via CLI (SLURM)

Create a SLURM script (example: `run_rnaseq.slurm`):

```
#!/bin/bash
#SBATCH --job-name=rnaseq_nfcore
#SBATCH --partition=batch
#SBATCH --cpus-per-task=4
#SBATCH --mem=16G
#SBATCH --time=02:00:00
#SBATCH --output=rnaseq_%j.log
#SBATCH --error=rnaseq_%j.err

echo "Starting job at $(date)"
echo "Running on node: $(hostname)"
echo "Job ID: $SLURM_JOB_ID"

module load nextflow
module load singularity/4.3.4

genome=/cluster/tufts/workshop/public/2026spring/reference/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
gtf=/cluster/tufts/workshop/public/2026spring/reference/Homo_sapiens.GRCh38.111.gtf.gz

nextflow run nf-core/rnaseq \
    -r 3.22.2 \
    -profile tufts \
    --input samplesheet.csv \
    --fasta "$genome" \
    --gtf "$gtf" \
    --star_index /cluster/tufts/workshop/public/2026spring/star_index/ \
    --salmon_index /cluster/tufts/workshop/public/2026spring/salmon_index/ \
    --outdir rnaseq
# -resume   (append to the command above to continue an interrupted run)

# --star_index is provided for the workshop to avoid rebuilding the index and save time.
# You do not need this flag when running the pipeline independently.

echo "Finished at $(date)"
```

Submit the job:

```
sbatch run_rnaseq.slurm
```

# 4. Key Results

If you define your output directory as:

```
--outdir rnaseq
```

then all pipeline results will be written inside:

```
rnaseq/
```

This directory will contain subfolders such as:

```
rnaseq/
├── multiqc/
├── star_salmon/
├── pipeline_info/
├── fastqc/
├── trimgalore/
└── work/   (intermediate files; delete after the run completes to save space)
```

## 4.1 Summary Report (Most Important)

```
multiqc/
multiqc/star_salmon/multiqc_report.html
```

Open this report first.

## 4.2 Gene-Level Counts

These files are in the `star_salmon/` subfolder:

```
salmon.merged.gene_counts.tsv → gene-level counts (used for differential expression)
```

```
salmon.merged.gene_tpm.tsv → normalized expression (for visualization)
```

## 4.3 Pipeline Metadata (Reproducibility)

```
pipeline_info/nf_core_rnaseq_software_mqc_versions.yml
```

This file records the software versions used in the run.

### Open OnDemand App

In your working directory, this file lists all parameters used:

```
nf-params.json
```

# 5. Logs and Troubleshooting

## 5.1 Main Nextflow Log

```
.nextflow.log
```

Monitor during the run:

```
tail -f .nextflow.log
```

This log is located in the working directory, not in the output directory.

## 5.2 SLURM Log Files

```
rnaseq_<jobid>.log   # check progress
rnaseq_<jobid>.err   # check errors
```

### Open OnDemand App

In the session ID folder:

```
output.log   # check progress
```

## 5.3 Per-Process Logs (Advanced)

Located inside:

```
work/<task-hash>/
```

Files:

```
.command.log
.command.err
.command.out
```

# 6. Running With Your Own Data

To run your own dataset, you need:

- FASTQ files
- A properly formatted `samplesheet.csv`
- FASTA reference genome
- GTF annotation file

Steps:

1. Upload FASTQ files to your working directory.
2. Create a `samplesheet.csv`.
3. Adjust reference paths if needed.
4. Launch the pipeline via Open OnDemand or CLI.
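
For the `samplesheet.csv` step, the sheet can be generated from the FASTQ directory rather than typed by hand. The following is a minimal sketch, not part of the workshop material. The function name `make_samplesheet` and the default suffixes `_1.fastq.gz` and `_2.fastq.gz` are assumptions. The sample name is simply the R1 file name with its suffix removed, so edit the `sample` column afterwards if you want names such as `GFPkd_1`.

```bash
#!/bin/bash
# Hypothetical helper: write a paired-end samplesheet.csv for nf-core/rnaseq.
# Usage: make_samplesheet FASTQ_DIR OUTPUT_CSV [R1_SUFFIX] [R2_SUFFIX]
# Assumption: read 1 and read 2 files differ only in their suffix.
make_samplesheet() {
    local out="$2"
    local s1="${3:-_1.fastq.gz}"
    local s2="${4:-_2.fastq.gz}"
    local fastq_dir r1 r2 sample

    # Resolve to an absolute path, since absolute paths are recommended.
    fastq_dir="$(cd "$1" && pwd)" || return 1

    # Header row in the format shown in section 1.2.
    echo "sample,fastq_1,fastq_2,strandedness" > "$out"

    for r1 in "$fastq_dir"/*"$s1"; do
        [ -e "$r1" ] || continue            # no R1 files matched the pattern
        r2="${r1%"$s1"}$s2"                 # swap the R1 suffix for the R2 suffix
        if [ ! -e "$r2" ]; then
            echo "Missing read 2 file for: $r1" >&2
            return 1
        fi
        sample="$(basename "$r1" "$s1")"    # file name without directory or suffix
        echo "${sample},${r1},${r2},auto" >> "$out"
    done
}

# Example call for files named like the workshop data (uncomment to use):
# make_samplesheet ./fastq samplesheet.csv _1_sub.fastq.gz _2_sub.fastq.gz
```

The function stops with a non-zero status as soon as a read 1 file has no matching read 2 file, which guards against mismatched pairs in the sheet.
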