# SPAdes
## Introduction
SPAdes is an assembly toolkit containing various assembly pipelines.
## Versions
- 3.15.4
- 3.15.5
If you require a newer version than these two, please visit the SPAdes download and installation page.
As of 2024-12-10, the latest version available is 4.0.0.
## Commands
```
coronaspades.py
metaplasmidspades.py
metaspades.py
metaviralspades.py
plasmidspades.py
rnaspades.py
rnaviralspades.py
spades-bwa
spades-convert-bin-to-fasta
spades-core
spades-corrector-core
spades-gbuilder
spades-gmapper
spades-gsimplifier
spades-hammer
spades_init.py
spades-ionhammer
spades-kmercount
spades-kmer-estimating
spades.py
spades-read-filter
spades-truseq-scfcorrection
truspades.py
```
## Example job
Adjust SLURM options based on your job's requirements (see the SLURM cheat sheet):
```bash
#!/bin/bash
#SBATCH -p partitionName                  # batch, gpu, preempt, mpi, or your group's own partition
#SBATCH -t 1:00:00                        # Runtime limit (D-HH:MM:SS)
#SBATCH -N 1                              # Number of nodes
#SBATCH -n 1                              # Number of tasks per node
#SBATCH -c 4                              # Number of CPU cores per task
#SBATCH --mem=8G                          # Memory required per node
#SBATCH --job-name=spades                 # Job name
#SBATCH --mail-type=FAIL,BEGIN,END        # Send an email when the job fails, begins, and finishes
#SBATCH --mail-user=your.email@tufts.edu  # Email address for notifications
#SBATCH --error=%x-%J-%u.err              # Standard error file: <job_name>-<job_id>-<username>.err
#SBATCH --output=%x-%J-%u.out             # Standard output file: <job_name>-<job_id>-<username>.out

module purge
module load spades/XXXX   # run "module avail spades" to check all available versions
```
## Using Multiple Threads with SPAdes
SPAdes is a genome/metagenome assembler designed to efficiently utilize multiple CPU threads. Multi-threading significantly accelerates SPAdes by parallelizing computationally intensive steps like k-mer construction, error correction, and graph traversal.
How multi-threading works in SPAdes:

- Parallelization: SPAdes parallelizes tasks such as k-mer generation, graph construction, and assembly across the threads specified with `--threads`.
- Not Fully Parallel: Some parts of SPAdes, such as graph simplification and certain I/O operations, are not fully parallelizable. As a result, doubling the number of threads may not always halve the runtime.
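Before picking a thread count, it helps to check what is actually available. A minimal sketch (note that `SLURM_CPUS_PER_TASK` is only set inside a SLURM allocation; on a login node only `nproc` will report anything, and its answer is the node total, not your share):

```bash
#!/bin/bash
# Total cores on this node (not necessarily yours to use on a shared cluster).
echo "node cores:          $(nproc)"
# Inside a SLURM job, the allocation is the real limit for --threads.
echo "slurm cpus-per-task: ${SLURM_CPUS_PER_TASK:-unset}"
```

Inside a batch job, prefer `SLURM_CPUS_PER_TASK` over `nproc`: requesting 4 cores on a 64-core node still leaves `nproc` reporting 64.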
Best practices for using multiple threads with SPAdes:
- Determine Dataset Size:
    - Small datasets (e.g., bacterial genomes): 4–8 threads.
    - Medium datasets (e.g., microbial communities): 8–16 threads.
    - Large datasets (e.g., human or metagenomic assemblies): 16–32 threads.
- Match SLURM Resources: ensure that SLURM's `--cpus-per-task` matches the `--threads` argument passed to SPAdes.
- Monitor Memory Usage: check memory use during the assembly so you can trim over-sized allocations on future runs.
- Run Test Assemblies: for new datasets, run SPAdes with fewer threads initially to estimate runtime and memory requirements.
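To put numbers on "monitor memory usage", SLURM's own accounting tools report a finished job's peak memory and CPU efficiency. A sketch (these commands only work on the cluster; `seff` is a site-installed convenience wrapper, and `123456` is a placeholder job id):

```bash
# Summary of CPU efficiency and peak memory for a completed job.
seff 123456
# Or query the accounting database directly for elapsed time and peak RSS.
sacct -j 123456 --format=JobID,JobName,Elapsed,MaxRSS,State
```

If `MaxRSS` is far below the `--mem` you requested, lower the request next time so the job schedules faster.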
```bash
#!/bin/bash
#SBATCH --job-name=SPAdes_Assembly
#SBATCH --output=spades.%j.out
#SBATCH --error=spades.%j.err
#SBATCH --time=00-24:00:00       # Set an appropriate runtime
#SBATCH --cpus-per-task=16       # Match SPAdes --threads
#SBATCH --mem=64G                # Allocate sufficient memory
#SBATCH -p batch                 # Partition to use

module load spades/3.15.5

# Run SPAdes in metagenomic mode
spades.py --meta \
    -1 /path/to/reads_R1.fastq.gz \
    -2 /path/to/reads_R2.fastq.gz \
    -o /path/to/output \
    --threads 16 \
    --memory 64
```
Using multiple threads is crucial for speeding up SPAdes, but it's important to balance thread count with available memory, I/O capacity, and the complexity of your dataset to ensure efficient and successful assemblies.
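One way to keep the SLURM request and the SPAdes arguments from drifting apart is to derive `--threads` and `--memory` from the environment variables SLURM exports inside the job, instead of hard-coding them twice. A minimal sketch (assumes the job was submitted with `--cpus-per-task` and `--mem`; the fallbacks apply when run outside SLURM, and the final command is echoed here for illustration where a real job script would execute it):

```bash
#!/bin/bash
# SLURM exposes the allocation to the job as environment variables.
THREADS=${SLURM_CPUS_PER_TASK:-4}     # fall back to 4 threads outside a SLURM job
MEM_MB=${SLURM_MEM_PER_NODE:-8192}    # SLURM reports memory in MB
MEM_GB=$(( MEM_MB / 1024 ))           # spades.py --memory expects GB

# In a real job script, run spades.py directly instead of echoing it.
echo "spades.py --threads ${THREADS} --memory ${MEM_GB}"
```

With this pattern, changing `#SBATCH --cpus-per-task` or `#SBATCH --mem` automatically changes what SPAdes is told to use.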