A Practical Guide to Spatial Transcriptomics Analysis in R (MERFISH, 10x Visium, GeoMx DSP)#

Author: Shirley Li, xue.li37@tufts.edu
Date: 2025-12-06

1. Overview#

This document introduces three major spatial transcriptomics platforms used in modern biological and biomedical research:

  • Vizgen MERFISH: Single-molecule, imaging-based, single-cell resolved

  • 10x Genomics Visium: Sequencing-based, spot-level whole-transcriptome

  • NanoString GeoMx DSP: Region-of-interest (ROI) based spatial profiling

The focus is on practical application, including:

  • What each platform is used for

  • What types of files they generate

  • How to analyze the data

  • Fully reproducible R-based workflows

This is not a theory-focused document; instead, it is designed to help researchers analyze data on the Tufts HPC environment or locally.

2. Vizgen MERFISH#

2.1 What MERFISH Is#

MERFISH (Multiplexed Error-Robust FISH) from Vizgen is an imaging-based single-cell spatial transcriptomics technology. It detects individual RNA molecules directly in tissue, enabling subcellular precision.

Key characteristics:

  • Single-cell resolution

  • High spatial fidelity

  • Molecule-level detection

  • Customizable gene panels

  • Produces cell-by-gene matrices and molecule coordinates

2.2 Typical Output Files#

cell_by_gene.csv
cell_metadata.csv
molecule_list.csv
fov_positions.json
images/

The primary data for analysis:

  • cell_by_gene.csv: counts matrix

  • cell_metadata.csv: x/y positions, cell segmentation, QC metrics

2.3 MERFISH Analysis Workflow in R#

Load required libraries#

library(Seurat)
library(dplyr)
library(readr)
library(ggplot2)

Step 1. Load counts and metadata#

counts <- read_csv("cell_by_gene.csv")
counts <- as.data.frame(counts)
rownames(counts) <- counts$cell
counts$cell <- NULL
counts <- t(counts)

metadata <- read_csv("cell_metadata.csv")
rownames(metadata) <- metadata$cell

obj <- CreateSeuratObject(counts = counts, meta.data = metadata)

Step 2. QC and filtering#

obj[["percent.mt"]] <- PercentageFeatureSet(obj, pattern = "^mt-")

obj <- subset(
    obj,
    subset = nFeature_RNA > 30 &
             nCount_RNA > 500 &
             percent.mt < 25
)

Step 3. Normalization and scaling#

obj <- NormalizeData(obj)
obj <- FindVariableFeatures(obj)
obj <- ScaleData(obj)

Step 4. PCA, neighbors, clustering#

obj <- RunPCA(obj)
obj <- FindNeighbors(obj, dims = 1:20)
obj <- FindClusters(obj)
obj <- RunUMAP(obj, dims = 1:20)
DimPlot(obj, reduction = "umap")

Step 5. Spatial visualization#

Requires x and y coordinates in metadata.

ggplot(obj@meta.data, aes(x = x, y = y, color = seurat_clusters)) +
    geom_point(size = 0.6) +
    coord_fixed() +
    theme_bw()

Step 6. Marker genes and annotation#

markers <- FindAllMarkers(obj)

Step 7. Neighborhood analysis#

Example using nearest neighbors.

library(FNN)
coords <- obj@meta.data[, c("x", "y")]
nn <- get.knn(coords, k = 10)

3. 10x Genomics Visium#

3.1 What Visium Is#

10x Visium is a sequencing-based spatial transcriptomics system. mRNA is captured on spatially barcoded spots (~55 µm), processed through Space Ranger, and analyzed via R or Python.

Key characteristics:

  • Whole-transcriptome profiling

  • Multiple cells per spot

  • Supports FFPE and fresh frozen samples

  • Strong for tissue-level discovery and regional differences

3.2 Output Files from Space Ranger#

filtered_feature_bc_matrix/
spatial/
analysis/

Files used for analysis:

  • matrix.mtx.gz

  • features.tsv.gz

  • barcodes.tsv.gz

  • tissue_positions_list.csv

  • high-resolution H&E image

3.3 Visium Analysis Workflow in R#

Step 1. Load data#

library(Seurat)

obj <- Load10X_Spatial(data.dir = "path/to/visium_folder")

Step 2. QC#

obj[["percent.mt"]] <- PercentageFeatureSet(obj, pattern = "^MT-")

obj <- subset(
    obj,
    subset = nCount_Spatial > 500 &
             nFeature_Spatial > 200 &
             percent.mt < 25
)

Step 3. Normalization and clustering#

obj <- SCTransform(obj, assay = "Spatial", verbose = FALSE)
obj <- RunPCA(obj)
obj <- FindNeighbors(obj, dims = 1:20)
obj <- FindClusters(obj)
obj <- RunUMAP(obj, dims = 1:20)

SpatialDimPlot(obj, label = TRUE)

Step 4. Differential expression#

markers <- FindAllMarkers(obj)

Step 5. Spatially variable features#

svg <- FindSpatiallyVariableFeatures(obj, assay = "SCT")

Step 6. Cell type deconvolution (example: SPOTlight)#

library(SPOTlight)

# sc is a Seurat object with scRNA-seq reference
res <- spotlight_deconvolution(se_sc = sc, st_obj = obj)

Step 7. Pathway enrichment#

library(clusterProfiler)
library(org.Hs.eg.db)

genes <- markers$gene[markers$cluster == "0"]

enrich <- enrichGO(
    gene = genes,
    OrgDb = org.Hs.eg.db,
    keyType = "SYMBOL"
)

4. NanoString GeoMx DSP#

4.1 What GeoMx DSP Is#

GeoMx DSP is an ROI-based spatial profiling technology used widely in translational and pathology settings. ROIs are defined using immunofluorescence, and gene expression is collected via UV cleavage of barcoded probes.

Key characteristics:

  • ROI-level profiling, not single-cell

  • Works well with FFPE

  • Supports whole transcriptome and targeted panels

  • Ideal for comparisons between regions, tumor microenvironment, inflamed vs non-inflamed areas

4.2 Typical GeoMx Files#

exprMat.txt
metadata.txt
panel_annotation.txt
QC_stats/
image_tiles/

4.3 GeoMx Analysis Workflow in R#

Step 1. Load data#

library(GeoMxTools)

obj <- readNanoStringGeoMxSet(
    dccFiles = "path/to/dcc/",
    pkcFiles = "path/to/pkc/",
    phenoDataFile = "metadata.txt",
    exprsFile = "exprMat.txt"
)

Step 2. QC#

obj <- shiftCountsOne(obj)
obj <- setSegmentQCFlags(obj)
obj <- setGeneQCFlags(obj)

obj <- obj[, obj$SegmentQC == "PASS"]

Step 3. Normalization#

obj <- normalize(obj, norm_method = "quant")

Step 4. PCA#

obj <- runPCA(obj)
plotPCA(obj)

Step 5. Differential expression via limma#

library(limma)

design <- model.matrix(~ group, data = pData(obj))
fit <- lmFit(exprs(obj), design)
fit <- eBayes(fit)

topTable(fit, coef = 2)

Step 6. Pathway analysis#

library(clusterProfiler)
genes <- rownames(topTable(fit, coef = 2, number = Inf, p.value = 0.05))

enrichGO(
    gene = genes,
    OrgDb = org.Hs.eg.db,
    keyType = "SYMBOL"
)

6. Notes#

  • All provided code is compatible with SLURM workloads on the Tufts HPC.

  • Use the latest Open OnDemand RStudio session for interactive development.

  • We recommend using Seurat v5 for spatial transcriptomics pipelines.

  • For MERFISH or Visium analyses, request 16–32 CPUs as a start. Always pair with sufficient memory (–mem=64G or more).

  • Data import and QC can be time-consuming. Save your Seurat (or other) object to an .rds file and continue from the saved object to avoid re-running preprocessing.