Spatial transcriptomics • numbat

Introduction

Spatial transcriptomics (ST) assays provide an ideal opportunity to characterize the spatial architecture of a tumor as well as the cell-cell interactions within the tumor microenvironment. From transcriptomic information, CNV analysis can be applied to resolve genetic subclones in a spatial context [1,2]. In particular, many sequencing-based ST assays (e.g. Visium for fresh frozen and Slide-seq) capture both expression and allele information, and are thus amenable to allele-specific CNV analysis.

Unlike single-cell sequencing, measurements from spatial assays are often from a mixture cells (i.e. “spots”). Such mixtures can potentially consist of tumor and normal cells (stromal/immune) as well as different tumor subclones with distinct genotypes. Originally developed for scRNA-seq data, Numbat makes several assumptions in the model based on the single-cell nature of the measurements. However, if we are willing to accept the assumption that most spots in ST only contain cells from the same clone (genetic homogeneity within a spot), then Numbat is applicable to ST data. Whether this assumption is reasonable likely depends on the technology (for example, Visium spots typically contain 1-10 cells, whereas Slide-seqV2 is near single-cell resolution) as well as tissue architecture (some tissues are more well-mixed than others). It is worth noting that existing spatial CNV calling methods such as STARCH [3] also make the same assumption. Furthermore, future generations of ST assays will likely achieve single-cell resolution very soon.

As 10x Visium is one of the most widely used ST platforms, this vignette is dedicated to the application of Numbat to perform allele-specific CNV analysis on Visium data. Recently published data from Barkely et al [1] (downloadable at GSE203612) will be used for illustration.

Preparing data

Because outputs from 10x SpaceRanger (BAM, barcode list, and gene expression quantification) are in the same format as CellRanger, the same pileup-and-phase procedure can be used to prepare the allele counts and expression count matrices for Visium data as scRNA-seq. Compared to scRNA-seq, the overall SNP coverage from Visium is typically more sparse, which may impact phasing quality. Therefore, a priori genotyping by DNA or co-genotyping with scRNA-seq (via multi-sample mode of pileup-and-phase) would be especially useful.

In terms of expression references, reasonable choices include Visium or scRNA-seq data of matching normal tissues (e.g. normal prostate as reference for prostate cancer samples). For Visium data, one can first cluster the spots (e.g. using pagoda2) and then collapse the counts by clusters (using numbat::aggregate_counts). For Visium, other Visium data usually make a better reference than scRNA-seq data, achieving the least amount of noise. For this anlaysis, I just used clustered spots from UCEC3 as reference, since that sample has no aneuploidy.

Running analysis

The recommended parameters for Visium are similar to those for scRNA-seq. The only difference is that, because allele coverage per barcode is typically sparser than scRNA-seq, max_entropy (which quantifies the uncertainty in CNV posterior in individual cells/spots) should be set to a higher value such as 0.8 (default is 0.5).

Results

Aneuploidy probabilities

A major task in ST analysis is delineating the boundaries of malignant and normal cells. Aggregating all CNV evidence, Numbat outputs a mutant versus normal probability (clone_post$p_cnv). This provides a natural interpretation for spots that are a mixture of mutant and normal cells, which should show a CNV probability around 0.5 (the specific value depends on the mixture fraction). Spots that contain purely mutant cells should show a high CNV probability (near 1) and those that contain purely normal cells should show a low CNV probability (near 0).

Let’s first look at the mutant versus normal probabilities of a few samples.

library(data.table)
library(dplyr)
library(glue)
library(stringr)
library(numbat)
library(ggplot2)

# read in the numbat results and spot coordinates
nb = list()
spots = list()
samples = c('OVCA1', 'OVCA3', 'BRCA1', 'GIST1')
for (sample in samples) {
    nb[[sample]] = Numbat$new(glue('~/results/{sample}'))
    spots[[sample]] = fread(glue('~/{sample}/outs/spatial/tissue_positions.csv'))
}

options(repr.plot.width = 8.5, repr.plot.height = 8, repr.plot.res = 300)

lapply(
    c('OVCA1', 'OVCA3', 'GIST1', 'BRCA1'),
    function(sample) {
        
        spots[[sample]] %>%
            left_join(
                nb[[sample]]$clone_post, 
                by = c('barcode' = 'cell')
            ) %>%
            filter(in_tissue == 1) %>%
            ggplot(
                aes(x = array_row, y = array_col)
            ) +
            geom_point(aes(color = p_cnv), size = 1, alpha = 0.8, pch = 16) +
            scale_color_gradient2(
                low = 'darkgreen', high = 'red3', mid = 'yellow', 
                midpoint = 0.5, limits = c(0,1), oob = scales::oob_squish
            ) +
            theme_bw() +
            scale_x_continuous(expand = expansion(add = 1), limits = c(0,77)) +
            scale_y_continuous(expand = expansion(add = 1), limits = c(0,127)) +
            ggtitle(sample)
    }
) %>% 
wrap_plots(guides = 'collect')

Figure 1. Mutant versus normal probabilities. Red represents mutant (aneuploid), green represents normal, and yellow represents mixture/uncertain.

Overlaying with the tissue images, we can see that the tumor/normal boundaries coincide with visible tissue architecture.

options(repr.plot.width = 8.5, repr.plot.height = 8, repr.plot.res = 300)

lapply(
    c('OVCA1', 'OVCA3', 'GIST1', 'BRCA1'),
    function(sample) {
        
        img <- jpeg::readJPEG(glue("~/visium/nyu/images/{sample}_cropped.jpg"))

        spots[[sample]] %>%
            left_join(
                nb[[sample]]$clone_post, 
                by = c('barcode' = 'cell')
            ) %>%
            filter(in_tissue == 1) %>%
            ggplot(
                aes(x = array_row, y = array_col)
            ) +
            background_image(img) +
            geom_point(
                aes(color = p_cnv), size = 0.3,
                stroke = 0.3, alpha = 0.8, pch = 21
            ) +
            scale_color_gradient2(
                low = 'darkgreen', high = 'red3', mid = 'yellow', 
                midpoint = 0.5, limits = c(0,1), oob = scales::oob_squish
            ) +
            theme_bw() +
            scale_x_continuous(expand = expansion(add = 1), limits = c(0,77)) +
            scale_y_continuous(expand = expansion(add = 1), limits = c(0,127)) +
            ggtitle(sample)
    }
) %>% 
wrap_plots(guides = 'collect')

Figure 2. Mutant versus normal probabilities overlayed with tissue images. Red represents mutant (aneuploid), green represents normal, and yellow represents mixture/uncertain.

Per-event CNV probabilities

In addition to overall aneuploidy probability, Numbat also outputs a posterior probability per CNV event for each spot (joint_post$p_cnv). This is especially useful for resolving tumor subclonal structures. Let’s take BRCA0 as an example and plot the per-event CNV probabilities:

options(repr.plot.width = 11, repr.plot.height = 8, repr.plot.res = 300)

sample = 'BRCA0'

img <- jpeg::readJPEG(glue("~/visium/nyu/images/{sample}_cropped.jpg"))

spots[[sample]] %>%
    left_join(
        nb[[sample]]$joint_post %>%
            filter(LLR > 35 & avg_entropy < 0.6) %>%
            group_by(seg) %>%
            mutate(p_total = sum(p_cnv)) %>%
            ungroup() %>%
            arrange(-p_total) %>%
            mutate(seg = paste0(seg, '(', cnv_state, ')')) %>%
            mutate(seg = factor(seg, unique(seg))), 
        by = c('barcode' = 'cell')
    ) %>%
    filter(in_tissue == 1) %>%
    ggplot(
        aes(x = array_row, y = array_col)
    ) +
    geom_point(aes(color = p_cnv), size = 0.1, stroke = 0.5, alpha = 1) +
    scale_color_gradient2(
        low = 'blue', high = 'red3', mid = 'white',
        midpoint = 0.5, limits = c(0,1), oob = scales::oob_squish
    ) +   
    theme_bw() +
    facet_wrap(~seg) +
    scale_x_continuous(expand = expansion(add = 1), limits = c(0,77)) +
    scale_y_continuous(expand = expansion(add = 1), limits = c(0,127))

Figure 3. Probabilities of specific CNV events. Red represents high probability of event presence, blue represents low probability of event presence, and white represents mixture/uncertain.

We see that there is substantial heterogeneity among CNV events in their spatial distribution. In particular, deletions on chromosomes 9, 14, 16, 17, 19 and 22 seem to mark specific patches of the tissue section.

Tumor subclones

Finally, we can visualize the tumor subclonal structure inferred by Numbat in spatial coordinates. First, let’s look at the usual phylo-heatmap which provides a summary of the clonal structure.

options(repr.plot.width = 8, repr.plot.height = 4, repr.plot.res = 200)

pal = c(`1` = "gray", `2` = "#9E0142", `3` = "#F67D4A",
        `4` = "#F2EA91", `5` = "#77C8A4", `6` = "#5E4FA2")

nb[['BRCA0']]$plot_phylo_heatmap(pal_clone = pal)

Figure 4. Tumor phylogeny and CNV heatmap of BRCA0. The stacked color bars on the left reflect clonal assignment probabilities.

From the phylo-heatmap, we can see that clone 5 in particular harbors significantly more CNV events than the rest of the tumor cells. Let’s visualize the spatial distribution of subclones (clone_post$clone_opt) and overlay with tissue image.

options(repr.plot.width = 5, repr.plot.height = 4.2, repr.plot.res = 250)

sample = 'BRCA0'

img <- jpeg::readJPEG(glue("~/visium/nyu/images/{sample}_cropped.jpg"))

spots[[sample]] %>%
    left_join(
        nb[[sample]]$clone_post, 
        by = c('barcode' = 'cell')
    ) %>%
    filter(in_tissue == 1) %>%
    mutate(clone_opt = factor(clone_opt)) %>%
    ggplot(
        aes(x = array_row, y = array_col)
    ) +
    background_image(img) +
    geom_point(
        aes(color = clone_opt), size = 0.3, stroke = 0.3, alpha = 0.8, pch = 21
    ) +
    scale_color_manual(values = pal, limits = force) +
    guides(
        color = guide_legend(title = "Clone", override.aes = aes(size = 3, pch = 16))
    ) +
    theme_bw() +
    scale_x_continuous(expand = expansion(add = 1), limits = c(0,77)) +
    scale_y_continuous(expand = expansion(add = 1), limits = c(0,127)) +
    ggtitle(sample)

Figure 5. Clonal assignments in BRCA0. The maximum a posteriori clonal assignments of individual spots are shown.

Interestingly, if you zoom in close enough, you can see that the distribution of clone 5 corresponds to visible foci that are clearly distinct from the surrounding tumor cells. Taken together, these observations suggest that clone 5 represents a further evolved tumor subpopulation that is actively invading the diseased tissue.

Conclusion

Like for scRNA-seq, allele-specific CNV analysis can be applied to spatial transcriptomics to interrogate genetic heterogeneity in a tumor. We have shown that Numbat can be used to differentiate (aneuploid) tumor and stromal compartments, evaluate the presence/absence of specific CNV events in spatial locations, and elucidate the spatial distributions of tumor subclones. Because Visium data is even sparser than scRNA-seq, the added information provided by phased allele counts is especially valuable for CNV calling. The key assumption we have to keep in mind is that cells contained in a spot are assumed to be largely genetically homogenous. However, this assumption is not that limiting in practice because cell mixtures are reflected as uncertainty in the posterior probability of CNV presence as well as clonal assignments. Newer generations of ST technologies that provide single-cell resolution can help overcome this limitation in the future.

References

[1] Barkley, D., Moncada, R., Pour, M. et al. Cancer cell states recur across tumor types and form specific interactions with the tumor microenvironment. Nat Genet 54, 1192–1201 (2022). https://doi.org/10.1038/s41588-022-01141-9

[2] Wei, R., He, S., Bai, S. et al. Spatial charting of single-cell transcriptomes in tissues. Nat Biotechnol 40, 1190–1199 (2022). https://www.nature.com/articles/s41587-022-01233-1

[3] Elyanow R, Zeira R, Land M, Raphael BJ. STARCH: copy number and clone inference from spatial transcriptomics data. Phys Biol. 2021 Mar 9;18(3):035001. doi: 10.1088/1478-3975/abbe99. PMID: 33022659.