# A Practical Guide to Spatial Transcriptomics Analysis in R (MERFISH, 10x Visium, GeoMx DSP)

Author: Shirley Li, xue.li37@tufts.edu\
Date: 2025-12-06

## 1. Overview

This document introduces three major spatial transcriptomics platforms used in modern biological and biomedical research:

- **Vizgen MERFISH**: Single-molecule, imaging-based, single-cell resolved
- **10x Genomics Visium**: Sequencing-based, spot-level whole-transcriptome
- **NanoString GeoMx DSP**: Region-of-interest (ROI) based spatial profiling

The focus is on **practical application**, including:

- What each platform is used for
- What types of files each platform generates
- How to analyze the data
- Fully reproducible **R-based workflows**

This is not a theory-focused document; it is designed to help researchers analyze data on the Tufts HPC environment or locally.

# 2. Vizgen MERFISH

## 2.1 What MERFISH Is

MERFISH (Multiplexed Error-Robust FISH) from Vizgen is an **imaging-based single-cell spatial transcriptomics** technology. It detects individual RNA molecules directly in tissue, which locates each transcript with subcellular precision.

Key characteristics:

- Single-cell resolution
- High spatial fidelity
- Molecule-level detection
- Customizable gene panels
- Produces cell-by-gene matrices and molecule coordinates

## 2.2 Typical Output Files

```
cell_by_gene.csv
cell_metadata.csv
molecule_list.csv
fov_positions.json
images/
```

The primary data for analysis:

- `cell_by_gene.csv`: counts matrix
- `cell_metadata.csv`: x/y positions, cell segmentation, QC metrics

## 2.3 MERFISH Analysis Workflow in R

### Load required libraries

```
library(Seurat)
library(dplyr)
library(readr)
library(ggplot2)
```

### Step 1. Load counts and metadata

```
counts <- read_csv("cell_by_gene.csv")
counts <- as.data.frame(counts)
rownames(counts) <- counts$cell
counts$cell <- NULL
counts <- t(counts)  # Seurat expects genes in rows and cells in columns

# Convert to a plain data.frame: tibbles do not keep row names
metadata <- as.data.frame(read_csv("cell_metadata.csv"))
rownames(metadata) <- metadata$cell

obj <- CreateSeuratObject(counts = counts, meta.data = metadata)
```

### Step 2. QC and filtering

```
# "^mt-" matches mouse mitochondrial genes; use "^MT-" for human.
# Targeted panels may contain no mitochondrial genes, in which case percent.mt is 0.
obj[["percent.mt"]] <- PercentageFeatureSet(obj, pattern = "^mt-")

# Thresholds are starting points; tune them to the panel size and tissue
obj <- subset(
  obj,
  subset = nFeature_RNA > 30 & nCount_RNA > 500 & percent.mt < 25
)
```

### Step 3. Normalization and scaling

```
obj <- NormalizeData(obj)
obj <- FindVariableFeatures(obj)
obj <- ScaleData(obj)
```

### Step 4. PCA, neighbors, clustering

```
obj <- RunPCA(obj)
obj <- FindNeighbors(obj, dims = 1:20)
obj <- FindClusters(obj)
obj <- RunUMAP(obj, dims = 1:20)
DimPlot(obj, reduction = "umap")
```

### Step 5. Spatial visualization

Requires `x` and `y` coordinates in the metadata. Vizgen exports often name these columns `center_x` and `center_y`; rename them or adjust the aesthetics below if so.

```
ggplot(obj@meta.data, aes(x = x, y = y, color = seurat_clusters)) +
  geom_point(size = 0.6) +
  coord_fixed() +
  theme_bw()
```

### Step 6. Marker genes and annotation

```
# One-vs-rest marker test for every cluster; annotate clusters from the top markers
markers <- FindAllMarkers(obj)
```

### Step 7. Neighborhood analysis

Example using nearest neighbors.

```
library(FNN)

coords <- as.matrix(obj@meta.data[, c("x", "y")])
nn <- get.knn(coords, k = 10)  # nn$nn.index: row indices of each cell's 10 nearest cells
```

# 3. 10x Genomics Visium

## 3.1 What Visium Is

10x Visium is a **sequencing-based spatial transcriptomics** system. mRNA is captured on spatially barcoded spots (~55 µm in diameter), the reads are processed with Space Ranger, and the result is analyzed in R or Python.

Key characteristics:

- Whole-transcriptome profiling
- Multiple cells per spot
- Supports FFPE and fresh frozen samples
- Strong for tissue-level discovery and regional differences

## 3.2 Output Files from Space Ranger

```
filtered_feature_bc_matrix/
spatial/
analysis/
```

Files used for analysis:

- `matrix.mtx.gz`
- `features.tsv.gz`
- `barcodes.tsv.gz`
- `tissue_positions_list.csv` (named `tissue_positions.csv` in Space Ranger 2.0 and later)
- high-resolution H&E image

## 3.3 Visium Analysis Workflow in R

### Step 1. Load data

```
library(Seurat)

# data.dir must contain filtered_feature_bc_matrix.h5 (the default filename)
# and the spatial/ subfolder
obj <- Load10X_Spatial(data.dir = "path/to/visium_folder")
```

### Step 2. QC

```
# "^MT-" matches human mitochondrial genes; use "^mt-" for mouse
obj[["percent.mt"]] <- PercentageFeatureSet(obj, pattern = "^MT-")
obj <- subset(
  obj,
  subset = nCount_Spatial > 500 & nFeature_Spatial > 200 & percent.mt < 25
)
```

### Step 3. Normalization and clustering

```
obj <- SCTransform(obj, assay = "Spatial", verbose = FALSE)
obj <- RunPCA(obj)
obj <- FindNeighbors(obj, dims = 1:20)
obj <- FindClusters(obj)
obj <- RunUMAP(obj, dims = 1:20)
SpatialDimPlot(obj, label = TRUE)
```

### Step 4. Differential expression

```
markers <- FindAllMarkers(obj)
```

### Step 5. Spatially variable features

```
# Returns the updated Seurat object, not a gene list
obj <- FindSpatiallyVariableFeatures(obj, assay = "SCT")
```

### Step 6. Cell type deconvolution (example: SPOTlight)

```
library(SPOTlight)

# sc is a Seurat object with scRNA-seq reference
# Schematic call: the function and argument names depend on the installed
# SPOTlight release; check its documentation before running
res <- spotlight_deconvolution(se_sc = sc, st_obj = obj)
```

### Step 7. Pathway enrichment

```
library(clusterProfiler)
library(org.Hs.eg.db)

# Significant markers of cluster 0 (from Step 4)
genes <- markers$gene[markers$cluster == "0" & markers$p_val_adj < 0.05]
enrich <- enrichGO(
  gene = genes,
  OrgDb = org.Hs.eg.db,
  keyType = "SYMBOL"
)
```

# 4. NanoString GeoMx DSP

## 4.1 What GeoMx DSP Is

GeoMx DSP is an ROI-based spatial profiling technology used widely in translational and pathology settings. ROIs are defined using immunofluorescence, and gene expression is collected via UV cleavage of barcoded probes.

Key characteristics:

- ROI-level profiling, not single-cell
- Works well with FFPE
- Supports whole transcriptome and targeted panels
- Ideal for comparisons between regions, such as tumor microenvironment compartments or inflamed vs non-inflamed areas

## 4.2 Typical GeoMx Files

```
exprMat.txt
metadata.txt
panel_annotation.txt
QC_stats/
image_tiles/
```

## 4.3 GeoMx Analysis Workflow in R

### Step 1. Load data

The Bioconductor package is named `GeomxTools`. `readNanoStringGeoMxSet()` builds the object from DCC files, PKC probe definitions, and a sample annotation worksheet; it takes vectors of file paths rather than directories. The annotation file and sheet names below are placeholders.

```
library(GeomxTools)

dcc_files <- list.files("path/to/dcc/", pattern = "\\.dcc$", full.names = TRUE)
pkc_files <- list.files("path/to/pkc/", pattern = "\\.pkc$", full.names = TRUE)

obj <- readNanoStringGeoMxSet(
  dccFiles = dcc_files,
  pkcFiles = pkc_files,
  phenoDataFile = "path/to/annotation.xlsx",  # placeholder: Excel sample annotation
  phenoDataSheet = "Template"                 # placeholder: worksheet name
)
```

### Step 2. QC

```
obj <- shiftCountsOne(obj)
obj <- setSegmentQCFlags(obj)  # default cutoffs

# Keep segments with no QC flag raised
seg_flags <- protocolData(obj)[["QCFlags"]]
obj <- obj[, rowSums(seg_flags) == 0]

obj <- setBioProbeQCFlags(obj)  # probe-level flags
obj <- aggregateCounts(obj)     # collapse probes to gene-level targets
```

### Step 3. Normalization

```
# "quant" is upper-quartile (Q3) normalization; the result is stored in a
# separate element ("q_norm"), and exprs(obj) still holds the raw counts
obj <- normalize(obj, norm_method = "quant", desiredQuartile = 0.75, toElt = "q_norm")
```

### Step 4. PCA

```
# PCA on log2 normalized expression; assumes a `group` column in the annotation
log_expr <- log2(assayDataElement(obj, elt = "q_norm"))
pca <- prcomp(t(log_expr))
plot(pca$x[, 1:2], col = factor(pData(obj)$group), pch = 19)
```

### Step 5. Differential expression via limma

```
library(limma)

# Fit on log2 normalized values, not the raw counts in exprs(obj)
log_expr <- log2(assayDataElement(obj, elt = "q_norm"))
design <- model.matrix(~ group, data = pData(obj))
fit <- lmFit(log_expr, design)
fit <- eBayes(fit)
topTable(fit, coef = 2)
```

### Step 6. Pathway analysis

```
library(clusterProfiler)
library(org.Hs.eg.db)

# Genes with adjusted p-value < 0.05 for the group effect
genes <- rownames(topTable(fit, coef = 2, number = Inf, p.value = 0.05))
enrichGO(
  gene = genes,
  OrgDb = org.Hs.eg.db,
  keyType = "SYMBOL"
)
```

# 5. Summary of Recommended Tools

| Platform  | Resolution              | Primary Tools (R)                  | Typical Analyses                                          |
| --------- | ----------------------- | ---------------------------------- | --------------------------------------------------------- |
| MERFISH   | Single-cell/subcellular | Seurat, ggplot2, FNN               | Cell classification, spatial visualization, neighborhoods |
| Visium    | Spot-level              | Seurat, SPOTlight, clusterProfiler | Clustering, deconvolution, spatially variable genes       |
| GeoMx DSP | ROI-level               | GeomxTools, limma, clusterProfiler | Region comparisons, FFPE profiling, pathway analysis      |

# 6. Notes

- All provided code is compatible with SLURM workloads on the Tufts HPC.
- Use the latest Open OnDemand RStudio session for interactive development.
- We recommend using Seurat v5 for spatial transcriptomics pipelines.
- For MERFISH or Visium analyses, request 16–32 CPUs as a start, and always pair them with sufficient memory (`--mem=64G` or more).
- Data import and QC can be time-consuming. Save your Seurat (or other) object to an `.rds` file and continue from the saved object to avoid re-running preprocessing.
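
The checkpointing advice in the last note can be sketched as below. This is a minimal illustration in base R, not part of the workflows above: `load_or_build()` is a hypothetical helper name, the checkpoint path is arbitrary, and a small list stands in for the Seurat or GeoMx object so the snippet runs without extra packages.

```r
# Hypothetical helper: reuse a saved object if the checkpoint exists,
# otherwise run the expensive step once and save its result.
load_or_build <- function(path, build) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  result <- build()
  saveRDS(result, path)
  result
}

# Arbitrary checkpoint location; on a cluster this would be project storage.
checkpoint <- file.path(tempdir(), "obj_after_qc.rds")

# Stand-in for import + QC; in practice build() would return the filtered object.
obj <- load_or_build(checkpoint, function() {
  list(counts = matrix(1:6, nrow = 2))
})
```

A second call with the same path reads the `.rds` file instead of re-running `build()`, so later steps such as clustering or differential expression can start from the saved object. Delete the checkpoint file whenever the import or QC settings change.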