SPAdes#

Introduction#

SPAdes is an assembly toolkit containing various assembly pipelines.

Versions#

  • 3.15.4

  • 3.15.5

If you need a version newer than these two, please visit the SPAdes download and installation page.

As of 2024-12-10, the latest version available is 4.0.0.

Commands#

  • coronaspades.py

  • metaplasmidspades.py

  • metaspades.py

  • metaviralspades.py

  • plasmidspades.py

  • rnaspades.py

  • rnaviralspades.py

  • spades-bwa

  • spades-convert-bin-to-fasta

  • spades-core

  • spades-corrector-core

  • spades-gbuilder

  • spades-gmapper

  • spades-gsimplifier

  • spades-hammer

  • spades_init.py

  • spades-ionhammer

  • spades-kmercount

  • spades-kmer-estimating

  • spades.py

  • spades-read-filter

  • spades-truseq-scfcorrection

  • truspades.py

Example job#

Adjust the Slurm options to match your job's requirements (see the Slurm cheat sheet):

#!/bin/bash
#SBATCH -p partitionName  # batch, gpu, preempt, mpi or your group's own partition
#SBATCH -t 1:00:00  # Runtime limit (D-HH:MM:SS)
#SBATCH -N 1   # Number of nodes
#SBATCH -n 1   # Total number of tasks
#SBATCH -c 4   # Number of CPU cores per task
#SBATCH --mem=8G       # Memory required per node
#SBATCH --job-name=spades      # Job name
#SBATCH --mail-type=FAIL,BEGIN,END     # Send an email when job fails, begins, and finishes
#SBATCH --mail-user=your.email@tufts.edu       # Email address for notifications
#SBATCH --error=%x-%J-%u.err   # Standard error file: <job_name>-<job_id>-<username>.err
#SBATCH --output=%x-%J-%u.out  # Standard output file: <job_name>-<job_id>-<username>.out

module purge
module load spades/XXXX   # Replace XXXX with a version; run "module avail spades" to list all available versions

Using Multi-Threads for SPAdes#

SPAdes is a genome/metagenome assembler designed to efficiently utilize multiple CPU threads. Multi-threading significantly accelerates SPAdes by parallelizing computationally intensive steps like k-mer construction, error correction, and graph traversal.

How Multi-Threading Works in SPAdes:

  • Parallelization: SPAdes parallelizes tasks like k-mer generation, graph construction, and assembly across the threads specified with --threads.

  • Not Fully Parallel: Some parts of SPAdes, such as graph simplification and certain I/O operations, are not fully parallelizable. As a result, doubling the number of threads may not always halve the runtime.
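
The diminishing return described above can be illustrated with Amdahl's law, speedup = 1 / ((1 - p) + p / n), where p is the fraction of the runtime that parallelizes and n is the thread count. The sketch below is only an illustration: the value p = 0.8 is a hypothetical assumption, not a measured property of SPAdes, and the helper name estimate_speedup is made up for this example.

```shell
#!/bin/bash
# Amdahl's law: speedup = 1 / ((1 - p) + p / n)
#   p = parallel fraction of the runtime (hypothetical; not measured for SPAdes)
#   n = number of threads
estimate_speedup() {
    local parallel_fraction="$1" threads="$2"
    awk -v p="$parallel_fraction" -v n="$threads" \
        'BEGIN { printf "%.2f\n", 1 / ((1 - p) + p / n) }'
}

# Compare the estimated speedup for several thread counts, assuming p = 0.8
for threads in 4 8 16 32; do
    echo "threads=${threads} estimated_speedup=$(estimate_speedup 0.8 "$threads")x"
done
```

Under this assumed parallel fraction, going from 8 to 16 threads raises the estimated speedup only from about 3.3x to 4.0x, which is why requesting more cores does not always pay off.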

Best Practices for Using Multi-Threads with SPAdes:

  • Determine Dataset Size:
    • Small datasets (e.g., bacterial genomes): 4–8 threads.

    • Medium datasets (e.g., microbial communities): 8–16 threads.

    • Large datasets (e.g., human or metagenomic assemblies): 16–32 threads.

  • Match SLURM Resources:
    • Ensure SLURM’s --cpus-per-task matches the --threads argument in SPAdes.

  • Monitor Memory Usage:
    • Check memory usage during and after assembly so that the job neither exceeds its allocation nor reserves far more memory than it uses.

  • Run Test Assemblies:
    • For new datasets, run SPAdes with fewer threads initially to estimate runtime and memory requirements.
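
One way to keep the SPAdes options in step with the Slurm request, as the "Match SLURM Resources" item suggests, is to read them from the environment variables Slurm sets inside a job. This is a minimal sketch, assuming the job was submitted with --cpus-per-task and --mem so that SLURM_CPUS_PER_TASK and SLURM_MEM_PER_NODE (in MB) are defined. The helper name spades_resource_flags and the fallback values used outside Slurm are hypothetical.

```shell
#!/bin/bash
# Build the SPAdes --threads and --memory options from the Slurm allocation.
# SLURM_CPUS_PER_TASK is set when --cpus-per-task (-c) is requested.
# SLURM_MEM_PER_NODE is set, in MB, when --mem is requested.
# The fallbacks (1 thread, 8192 MB) are assumptions for runs outside Slurm.
spades_resource_flags() {
    local threads="${SLURM_CPUS_PER_TASK:-1}"
    local mem_mb="${SLURM_MEM_PER_NODE:-8192}"
    local mem_gb=$(( mem_mb / 1024 ))   # SPAdes --memory expects GB
    echo "--threads ${threads} --memory ${mem_gb}"
}

# Show the options that would be passed to spades.py in this environment
echo "SPAdes resource options: $(spades_resource_flags)"
```

Inside a job script, the output of the function could be appended to the spades.py command line instead of hard-coding the numbers, so changing the #SBATCH lines is enough to change both.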

Example job script for a multi-threaded metagenomic assembly:

#!/bin/bash
#SBATCH --job-name=SPAdes_Assembly
#SBATCH --output=spades.%j.out
#SBATCH --error=spades.%j.err
#SBATCH --time=1-00:00:00   # Runtime limit (D-HH:MM:SS); set an appropriate value
#SBATCH --cpus-per-task=16  # Match threads to SPAdes --threads
#SBATCH --mem=64G           # Allocate sufficient memory
#SBATCH -p batch            # Partition to use

module load spades/3.15.5

# Run SPAdes
spades.py --meta \
       -1 /path/to/reads_R1.fastq.gz \
       -2 /path/to/reads_R2.fastq.gz \
       -o /path/to/output \
       --threads 16 \
       --memory 64   # Memory limit in GB; keep within the Slurm --mem request
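
After a job like the one above has finished, the memory-monitoring advice can be followed by comparing the job's peak memory (MaxRSS) with the memory it requested. The sketch below assumes that Slurm accounting is enabled on the cluster and that sacct prints MaxRSS with a K, M, or G suffix. The helper names maxrss_to_gb and report_peak_memory are hypothetical.

```shell
#!/bin/bash
# Convert a MaxRSS string with a K, M, or G suffix (e.g. "50331648K") to GB.
# Assumption: the value carries one of these suffixes; anything else is rejected.
maxrss_to_gb() {
    awk -v v="$1" 'BEGIN {
        unit = substr(v, length(v), 1)
        num = v + 0                      # awk keeps the leading numeric part
        if (unit == "K")      gb = num / (1024 * 1024)
        else if (unit == "M") gb = num / 1024
        else if (unit == "G") gb = num
        else exit 1
        printf "%.1f\n", gb
    }'
}

# Print accounting fields for a finished job, if sacct is available on this host.
report_peak_memory() {
    local job_id="$1"
    if ! command -v sacct >/dev/null 2>&1; then
        echo "sacct not found; run this on a cluster login node" >&2
        return 1
    fi
    sacct -j "$job_id" --format=JobID,JobName,Elapsed,ReqMem,MaxRSS,State
}

# Example conversion of a MaxRSS value reported in KB
maxrss_to_gb "50331648K"
```

If the peak is far below the request, a smaller --mem (and matching --memory) is likely enough for similar datasets; if it is close to the request, leave more headroom on the next run.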

Using multiple threads is key to speeding up SPAdes, but the thread count should be balanced against available memory, I/O capacity, and the complexity of your dataset to keep assemblies efficient and successful.