A Practical Guide to Spatial Transcriptomics Analysis in R (MERFISH, 10x Visium, GeoMx DSP)
Author: Shirley Li, xue.li37@tufts.edu
Date: 2025-12-06
1. Overview
This document introduces three major spatial transcriptomics platforms used in modern biological and biomedical research:
Vizgen MERFISH: Single-molecule, imaging-based, single-cell resolved
10x Genomics Visium: Sequencing-based, spot-level whole-transcriptome
NanoString GeoMx DSP: Region-of-interest (ROI) based spatial profiling
The focus is on practical application, including:
What each platform is used for
What types of files they generate
How to analyze the data
Fully reproducible R-based workflows
This is not a theory-focused document; instead, it is designed to help researchers analyze data on the Tufts HPC environment or locally.
2. Vizgen MERFISH
2.1 What MERFISH Is
MERFISH (Multiplexed Error-Robust FISH) from Vizgen is an imaging-based single-cell spatial transcriptomics technology. It detects individual RNA molecules directly in tissue, enabling subcellular precision.
Key characteristics:
Single-cell resolution
High spatial fidelity
Molecule-level detection
Customizable gene panels
Produces cell-by-gene matrices and molecule coordinates
2.2 Typical Output Files
cell_by_gene.csv
cell_metadata.csv
molecule_list.csv
fov_positions.json
images/
The primary data for analysis:
cell_by_gene.csv: counts matrix
cell_metadata.csv: x/y positions, cell segmentation, QC metrics
2.3 MERFISH Analysis Workflow in R
Load required libraries
library(Seurat)
library(dplyr)
library(readr)
library(ggplot2)
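Step 1. Load counts and metadata
# Counts: one row per cell, one column per gene; assumes a `cell` ID column.
counts <- as.data.frame(read_csv("cell_by_gene.csv"))
rownames(counts) <- counts$cell
counts$cell <- NULL
counts <- t(as.matrix(counts))  # Seurat expects genes x cells
# Metadata: convert the tibble to a data.frame so that row names are kept.
metadata <- as.data.frame(read_csv("cell_metadata.csv"))
rownames(metadata) <- metadata$cell
obj <- CreateSeuratObject(counts = counts, meta.data = metadata)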
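CreateSeuratObject matches metadata rows to count-matrix columns by name, so it can help to confirm that the cell IDs in the two tables line up before building the object. The following is a minimal sketch on toy data, assuming (as in the step above) a `cell` ID column. The toy values and the helper name `align_metadata` are hypothetical and not part of any package.

```r
# Toy stand-ins for the two tables (values are made up for illustration).
counts <- matrix(
  c(5, 0, 2, 1, 3, 0),
  nrow = 2,
  dimnames = list(c("GeneA", "GeneB"), c("c1", "c2", "c3"))
)
metadata <- data.frame(
  cell = c("c3", "c1", "c2", "c4"),
  x = c(10, 20, 30, 40),
  y = c(1, 2, 3, 4)
)

# Hypothetical helper: keep only cells present in both tables and put the
# metadata rows in the same order as the count-matrix columns.
align_metadata <- function(counts, metadata, id_col = "cell") {
  ids <- as.character(metadata[[id_col]])
  shared <- colnames(counts)[colnames(counts) %in% ids]
  meta <- metadata[match(shared, ids), , drop = FALSE]
  rownames(meta) <- shared
  list(counts = counts[, shared, drop = FALSE], metadata = meta)
}

aligned <- align_metadata(counts, metadata)

# The object can be built safely once these names agree.
stopifnot(identical(colnames(aligned$counts), rownames(aligned$metadata)))
```

Cells that appear in only one table (here `c4`) are dropped rather than silently mismatched.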
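Step 2. QC and filtering
# "^mt-" matches mouse mitochondrial genes; use "^MT-" for human data.
# Targeted MERFISH panels often contain no mitochondrial genes, so check that
# the pattern matches something before relying on this filter.
obj[["percent.mt"]] <- PercentageFeatureSet(obj, pattern = "^mt-")
# The thresholds below are starting points; adjust them to the panel size and tissue.
obj <- subset(
  obj,
  subset = nFeature_RNA > 30 &
    nCount_RNA > 500 &
    percent.mt < 25
)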
Step 3. Normalization and scaling
obj <- NormalizeData(obj)
obj <- FindVariableFeatures(obj)
obj <- ScaleData(obj)
Step 4. PCA, neighbors, clustering
obj <- RunPCA(obj)
obj <- FindNeighbors(obj, dims = 1:20)
obj <- FindClusters(obj)
obj <- RunUMAP(obj, dims = 1:20)
DimPlot(obj, reduction = "umap")
Step 5. Spatial visualization
This step requires columns named x and y in the metadata; rename the platform's centroid columns first if they are named differently.
ggplot(obj@meta.data, aes(x = x, y = y, color = seurat_clusters)) +
  geom_point(size = 0.6) +
  coord_fixed() +
  theme_bw()
Step 6. Marker genes and annotation
markers <- FindAllMarkers(obj)
Step 7. Neighborhood analysis
Example using nearest neighbors.
library(FNN)
coords <- obj@meta.data[, c("x", "y")]
nn <- get.knn(coords, k = 10)
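A neighbor index on its own is only the first half of a neighborhood analysis. One common next step is to summarize which clusters each cell's neighbors belong to. The sketch below illustrates that idea on made-up toy data using base R only; on real data, the index matrix returned by get.knn (`nn$nn.index`) would take the place of the brute-force `nn_index` computed here. All variable names and values are illustrative assumptions.

```r
# Toy data (made up): 6 cells with x/y positions and cluster labels.
coords <- cbind(x = c(0, 1, 0, 10, 11, 10), y = c(0, 0, 1, 10, 10, 11))
clusters <- factor(c("A", "A", "B", "B", "B", "A"))
k <- 2

# Brute-force k nearest neighbors from the full distance matrix.
# This is only practical for small examples.
d <- as.matrix(dist(coords))
diag(d) <- Inf  # a cell is not its own neighbor
nn_index <- t(apply(d, 1, function(row) order(row)[seq_len(k)]))

# For each cell, the fraction of its k neighbors that fall in each cluster.
composition <- t(vapply(
  seq_len(nrow(nn_index)),
  function(i) {
    neighbor_clusters <- factor(clusters[nn_index[i, ]], levels = levels(clusters))
    as.vector(table(neighbor_clusters)) / k
  },
  numeric(nlevels(clusters))
))
colnames(composition) <- levels(clusters)

# Average neighbor composition, grouped by the cluster of the center cell.
neighborhood <- rowsum(composition, group = clusters) / as.vector(table(clusters))
print(round(neighborhood, 2))
```

Rows of `neighborhood` are the center cell's cluster and columns are the neighbors' clusters, so large off-diagonal values suggest two clusters that tend to sit next to each other.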
3. 10x Genomics Visium
3.1 What Visium Is
10x Visium is a sequencing-based spatial transcriptomics system. mRNA is captured on spatially barcoded spots (~55 µm), processed through Space Ranger, and analyzed via R or Python.
Key characteristics:
Whole-transcriptome profiling
Multiple cells per spot
Supports FFPE and fresh frozen samples
Strong for tissue-level discovery and regional differences
3.2 Output Files from Space Ranger
filtered_feature_bc_matrix/
spatial/
analysis/
Files used for analysis:
matrix.mtx.gz
features.tsv.gz
barcodes.tsv.gz
tissue_positions_list.csv
High-resolution H&E image
3.3 Visium Analysis Workflow in R
Step 1. Load data
library(Seurat)
obj <- Load10X_Spatial(data.dir = "path/to/visium_folder")
Step 2. QC
obj[["percent.mt"]] <- PercentageFeatureSet(obj, pattern = "^MT-")
obj <- subset(
  obj,
  subset = nCount_Spatial > 500 &
    nFeature_Spatial > 200 &
    percent.mt < 25
)
Step 3. Normalization and clustering
obj <- SCTransform(obj, assay = "Spatial", verbose = FALSE)
obj <- RunPCA(obj)
obj <- FindNeighbors(obj, dims = 1:20)
obj <- FindClusters(obj)
obj <- RunUMAP(obj, dims = 1:20)
SpatialDimPlot(obj, label = TRUE)
Step 4. Differential expression
markers <- FindAllMarkers(obj)
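Step 5. Spatially variable features
# FindSpatiallyVariableFeatures() returns the updated Seurat object, not a gene list.
obj <- FindSpatiallyVariableFeatures(obj, assay = "SCT", features = VariableFeatures(obj), selection.method = "moransi")
svg <- SpatiallyVariableFeatures(obj, selection.method = "moransi")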
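One of the scoring methods that FindSpatiallyVariableFeatures supports is Moran's I (`selection.method = "moransi"`), a measure of spatial autocorrelation. To make the statistic concrete, here is a small base R sketch that computes it for a single feature. The helper name `morans_i`, the inverse-squared-distance weights, and the toy grid are assumptions chosen for illustration, and the sketch is not Seurat's implementation.

```r
# Hypothetical helper: Moran's I for one feature, given spot coordinates.
# Weights are inverse squared distances (one common choice among several).
morans_i <- function(values, coords) {
  d <- as.matrix(dist(coords))
  w <- 1 / d^2
  diag(w) <- 0                      # a spot does not weight itself
  z <- values - mean(values)        # centered expression
  n <- length(values)
  (n / sum(w)) * sum(w * outer(z, z)) / sum(z^2)
}

# Toy 4 x 4 grid of spots (made-up values).
grid <- expand.grid(x = 1:4, y = 1:4)
gradient <- grid$x                    # smooth left-to-right gradient
checker <- (grid$x + grid$y) %% 2     # alternating checkerboard pattern

i_gradient <- morans_i(gradient, grid)  # positive: nearby spots are similar
i_checker <- morans_i(checker, grid)    # negative: nearby spots are dissimilar
```

Genes with strongly positive values have spatially structured expression, which is why they are ranked first as spatially variable.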
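Step 6. Cell type deconvolution (example: SPOTlight)
library(SPOTlight)
# sc is a Seurat scRNA-seq reference whose identities (Idents) are cell type labels.
# Current Bioconductor releases expose SPOTlight(); the older spotlight_deconvolution() interface took different arguments.
sc_markers <- FindAllMarkers(sc, only.pos = TRUE)
res <- SPOTlight(x = sc, y = obj, groups = as.character(Idents(sc)), mgs = sc_markers, gene_id = "gene", group_id = "cluster", weight_id = "avg_log2FC")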
Step 7. Pathway enrichment
library(clusterProfiler)
library(org.Hs.eg.db)
genes <- markers$gene[markers$cluster == "0"]
enrich <- enrichGO(
  gene = genes,
  OrgDb = org.Hs.eg.db,
  keyType = "SYMBOL"
)
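enrichGO performs an over-representation test based on the hypergeometric distribution. The calculation behind a single GO term can be sketched in base R as follows. All numbers are made up for illustration, and the variable names are assumptions.

```r
# Made-up numbers for one GO term.
universe_size <- 20000  # genes in the background
term_size <- 200        # background genes annotated to the term
list_size <- 100        # marker genes submitted
overlap <- 8            # submitted genes annotated to the term

# P(X >= overlap) when drawing list_size genes without replacement.
p_value <- phyper(
  q = overlap - 1,
  m = term_size,
  n = universe_size - term_size,
  k = list_size,
  lower.tail = FALSE
)

# Overlap expected by chance, and the resulting fold enrichment.
expected <- list_size * term_size / universe_size
fold_enrichment <- overlap / expected
```

In a real analysis this test is repeated for every term, so the p-values must then be adjusted for multiple testing.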
4. NanoString GeoMx DSP
4.1 What GeoMx DSP Is
GeoMx DSP is an ROI-based spatial profiling technology used widely in translational and pathology settings. ROIs are defined using immunofluorescence, and gene expression is collected via UV cleavage of barcoded probes.
Key characteristics:
ROI-level profiling, not single-cell
Works well with FFPE
Supports whole transcriptome and targeted panels
Ideal for comparisons between regions, tumor microenvironment, inflamed vs non-inflamed areas
4.2 Typical GeoMx Files
exprMat.txt
metadata.txt
panel_annotation.txt
QC_stats/
image_tiles/
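4.3 GeoMx Analysis Workflow in R
Step 1. Load data
library(GeomxTools)
# readNanoStringGeoMxSet() takes vectors of file paths rather than directories,
# and reads the segment annotation from an Excel sheet. The annotation file name
# and sheet name below are placeholders.
dcc_files <- dir("path/to/dcc", pattern = "\\.dcc$", full.names = TRUE)
pkc_files <- dir("path/to/pkc", pattern = "\\.pkc$", full.names = TRUE)
obj <- readNanoStringGeoMxSet(
  dccFiles = dcc_files,
  pkcFiles = pkc_files,
  phenoDataFile = "path/to/annotation.xlsx",
  phenoDataSheet = "Template",
  phenoDataDccColName = "Sample_ID"
)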
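Step 2. QC
obj <- shiftCountsOne(obj, useDALogic = TRUE)
# Segment QC: flags are stored in protocolData; keep segments with no flag raised.
obj <- setSegmentQCFlags(obj)
seg_flags <- protocolData(obj)[["QCFlags"]]
obj <- obj[, rowSums(seg_flags) == 0]
# Probe QC flags are stored in fData; filter flagged probes as needed, then collapse probes to gene-level counts.
obj <- setBioProbeQCFlags(obj)
obj <- aggregateCounts(obj)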
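Step 3. Normalization
# Quantile-based normalization (upper quartile by default). The result is written to the
# "exprs_norm" assay element; exprs(obj) still returns the raw counts.
obj <- normalize(obj, norm_method = "quant", toElt = "exprs_norm")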
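The "quant" method is a quantile-based scaling, by default on the upper quartile (Q3) of each segment. The general idea can be sketched in base R on toy counts. This illustrates the technique under the assumption that segments are rescaled to a common Q3, and it is not a copy of the GeomxTools code. The toy matrix and variable names are made up.

```r
# Toy counts (made up): 5 genes x 3 segments, where segment s3 was
# sequenced roughly twice as deeply as s1.
toy_counts <- matrix(
  c(10, 20, 30, 40, 50,
    12, 18, 33, 41, 48,
    20, 40, 60, 80, 100),
  nrow = 5,
  dimnames = list(paste0("gene", 1:5), c("s1", "s2", "s3"))
)

# Upper quartile of each segment.
q3 <- apply(toy_counts, 2, quantile, probs = 0.75)

# Scale factors relative to the geometric mean of the quartiles, so the
# normalized values stay on roughly the original count scale.
scale_factor <- q3 / exp(mean(log(q3)))
toy_norm <- sweep(toy_counts, 2, scale_factor, "/")

# After scaling, every segment has the same upper quartile.
q3_after <- apply(toy_norm, 2, quantile, probs = 0.75)
```

Scaling on the upper quartile rather than the total makes the factor less sensitive to the many low-count, near-background targets in a segment.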
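Step 4. PCA
# runPCA() and plotPCA() are not provided by GeomxTools; base R prcomp() on the
# log2-transformed normalized matrix (segments as rows) works instead.
pca <- prcomp(t(log2(assayDataElement(obj, elt = "exprs_norm"))))
plot(pca$x[, 1:2])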
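Step 5. Differential expression via limma
library(limma)
# `group` is a placeholder for a column of the segment annotation.
design <- model.matrix(~ group, data = pData(obj))
# Fit on log2 normalized values rather than the raw counts returned by exprs().
fit <- lmFit(log2(assayDataElement(obj, elt = "exprs_norm")), design)
fit <- eBayes(fit)
topTable(fit, coef = 2)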
Step 6. Pathway analysis
library(clusterProfiler)
library(org.Hs.eg.db)
# In topTable(), p.value is the cutoff applied to the adjusted p-value.
genes <- rownames(topTable(fit, coef = 2, number = Inf, p.value = 0.05))
enrichGO(
  gene = genes,
  OrgDb = org.Hs.eg.db,
  keyType = "SYMBOL"
)
5. Summary of Recommended Tools
| Platform | Resolution | Primary Tools (R) | Typical Analyses |
|---|---|---|---|
| MERFISH | Single-cell/subcellular | Seurat, ggplot2, FNN | Cell classification, spatial visualization, neighborhoods |
| Visium | Spot-level | Seurat, SPOTlight, clusterProfiler | Clustering, deconvolution, spatially variable genes |
| GeoMx DSP | ROI-level | GeomxTools, limma, clusterProfiler | Region comparisons, FFPE profiling, pathway analysis |
6. Notes
All provided code is compatible with SLURM workloads on the Tufts HPC.
Use the latest Open OnDemand RStudio session for interactive development.
We recommend using Seurat v5 for spatial transcriptomics pipelines.
For MERFISH or Visium analyses, request 16–32 CPUs to start, and always pair them with sufficient memory (--mem=64G or more).
Data import and QC can be time-consuming. Save your Seurat (or other) object to an .rds file and continue from the saved object to avoid re-running preprocessing.
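The checkpoint pattern described above can be sketched as follows. The file path and the `run_preprocessing` function are hypothetical placeholders; a Seurat or GeoMx object is saved and restored in the same way as the small list used here.

```r
# Hypothetical stand-in for the expensive import and QC steps.
run_preprocessing <- function() {
  list(counts = matrix(1:6, nrow = 2), note = "after QC")
}

# Hypothetical checkpoint location; on a cluster, use project storage instead.
checkpoint <- file.path(tempdir(), "obj_after_qc.rds")

if (file.exists(checkpoint)) {
  obj <- readRDS(checkpoint)   # resume from the saved object
} else {
  obj <- run_preprocessing()   # first run: compute, then save
  saveRDS(obj, checkpoint)
}
```

Delete the .rds file whenever the upstream steps or their parameters change, otherwise the stale object will be reloaded.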