List of output files
Numbat writes several files to the output folder. Per-iteration file
names carry the suffix {i}, the index of the phylogeny optimization
iteration that produced them. Here is a detailed list:
Analysis results
- bulk_subtrees_{i}.tsv.gz: Subtree pseudobulk profiles based on current cell lineage tree
- segs_consensus_{i}.tsv.gz: Consensus segments from subtree pseudobulk HMMs
- bulk_clones_{i}.tsv.gz: Clone-level pseudobulk profiles based on current cell lineage tree
- exp_post_{i}.tsv: Expression-based posterior probabilities of CNV states for each segment in each cell
- allele_post_{i}.tsv: Allele-based posterior probabilities of CNV states for each segment in each cell
- joint_post_{i}.tsv: Joint posterior probabilities of CNV states for each segment in each cell
- clone_post_{i}.tsv: Single-cell clone assignment and tumor versus normal classification posteriors
- bulk_clones_final.tsv.gz: Clone-level pseudobulk profiles based on final cell lineage tree
- bulk_subtrees_retest_{i}.tsv.gz: Subtree pseudobulk profiles after retesting CNV states
- gexp_roll_wide.tsv.gz: Window-smoothed normalized expression profiles of single cells
- segs_loh.tsv: Clonal LoH segments; written if call_clonal_loh is enabled
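As a minimal sketch of how the naming scheme above can be used, the following builds a per-iteration file name and reads a plain or gzip-compressed TSV with the Python standard library. The helper names `output_name` and `read_tsv` are hypothetical and not part of Numbat; the file stems and extensions are taken from the list above.

```python
import csv
import gzip


def output_name(stem, i, ext="tsv.gz"):
    """Build a per-iteration file name, e.g. 'bulk_clones_2.tsv.gz'."""
    return f"{stem}_{i}.{ext}"


def read_tsv(path):
    """Read a plain or gzip-compressed TSV into a list of dict rows.

    All values are returned as strings; convert them as needed.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt", newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))
```

For example, `output_name("joint_post", 2, ext="tsv")` gives `joint_post_2.tsv`, which can then be joined to the output folder path and passed to `read_tsv`.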
Plots
- exp_roll_clust.png: visualization of single-cell smoothed gene expression profiles
- bulk_subtrees_{i}.png: visualization of subtree pseudobulk CNV profiles
- bulk_clones_{i}.png: visualization of clone pseudobulk CNV profiles
- bulk_clones_final.png: visualization of final clone pseudobulk CNV profiles
- tree_list_{i}.rds: list of candidate phylogenies in the maximum likelihood tree search
- panel_{i}.png: integrated visualization of single-cell phylogeny and CNV landscape
Lineage trees
- hc.rds: initial hierarchical clustering result based on smoothed expression
- clones_{i}.rds: list of candidate clones used to generate bulk_clones_{i}.tsv.gz
- subtrees_{i}.rds: list of candidate subtrees used to generate bulk_subtrees_{i}.tsv.gz
- tree_NJ_{i}.rds: neighbor joining tree
- mut_graph_{i}.rds: mutation graph derived from the current cell lineage tree
- tree_ML_{i}.rds: maximum likelihood tree (in ape::phylo format)
- tree_final_{i}.rds: final cell lineage tree with mutation and clone annotation (in tbl_graph format)
Single-cell posteriors
cell: character; Cell barcode
CHROM: character; Chromosome
seg: character; Segment ID
cnv_state: character; CNV state estimated from
pseudobulk HMM
n_snp: numeric; Number of SNPs in segment
seg_start: numeric; Segment start position
seg_end: numeric; Segment end position
n_genes: numeric; Number of genes in segment
n_snps: numeric; Number of SNPs in segment
prior_loh: numeric; Prior probability of CNLoH
prior_amp: numeric; Prior probability of
amplification
prior_del: numeric; Prior probability of deletion
prior_bamp: numeric; Prior probability of biallelic
amplification
prior_bdel: numeric; Prior probability of biallelic
deletion
l11_x: numeric; Log-likelihood of CNV state 1:1
(neutral) given expression data
l20_x: numeric; Log-likelihood of CNV state 2:0 (CNLoH)
given expression data
l10_x: numeric; Log-likelihood of CNV state 1:0
(deletion) given expression data
l21_x: numeric; Log-likelihood of CNV state 2:1
(amplification) given expression data
l31_x: numeric; Log-likelihood of CNV state 3:1
(amplification) given expression data
l22_x: numeric; Log-likelihood of CNV state 2:2
(biallelic amplification) given expression data
l00_x: numeric; Log-likelihood of CNV state 0:0
(biallelic deletion) given expression data
Z_cnv_x: numeric; Total log(likelihood * prior) of CNV
states given expression data
Z_n_x: numeric; Total log(likelihood * prior) of neutral
state given expression data
logBF_x: numeric; Log Bayes factor of CNV state
vs. neutral state given expression data
l11_y: numeric; Log-likelihood of CNV state 1:1
(neutral) given allele data
l20_y: numeric; Log-likelihood of CNV state 2:0 (CNLoH)
given allele data
l10_y: numeric; Log-likelihood of CNV state 1:0
(deletion) given allele data
l21_y: numeric; Log-likelihood of CNV state 2:1 (gain)
given allele data
l31_y: numeric; Log-likelihood of CNV state 3:1
(amplification) given allele data
l22_y: numeric; Log-likelihood of CNV state 2:2
(biallelic amplification) given allele data
l00_y: numeric; Log-likelihood of CNV state 0:0
(biallelic deletion) given allele data
Z_cnv_y: numeric; Total log(likelihood * prior) of CNV
states given allele data
Z_n_y: numeric; Total log(likelihood * prior) of neutral
state given allele data
logBF_y: numeric; Log Bayes factor of CNV state
vs. neutral state given allele data
LLR: numeric; Log-likelihood ratio of CNV state
vs. neutral state
LLR_x: numeric; Log-likelihood ratio of CNV state
vs. neutral state given expression data
LLR_y: numeric; Log-likelihood ratio of CNV state
vs. neutral state given allele data
l11: numeric; Joint log-likelihood of CNV state 1:1
(neutral)
l20: numeric; Joint log-likelihood of CNV state 2:0
(CNLoH)
l10: numeric; Joint log-likelihood of CNV state 1:0
(deletion)
l21: numeric; Joint log-likelihood of CNV state 2:1
(amplification)
l31: numeric; Joint log-likelihood of CNV state 3:1
(amplification)
l22: numeric; Joint log-likelihood of CNV state 2:2
(biallelic amplification)
l00: numeric; Joint log-likelihood of CNV state 0:0
(biallelic deletion)
Z_amp: numeric; Total log(likelihood * prior) of
amplification state
Z_loh: numeric; Total log(likelihood * prior) of CNLoH
state
Z_del: numeric; Total log(likelihood * prior) of
deletion state
Z_bamp: numeric; Total log(likelihood * prior) of
biallelic amplification state
Z_bdel: numeric; Total log(likelihood * prior) of
biallelic deletion state
Z_n: numeric; Total log(likelihood * prior) of neutral
state
Z: numeric; Total log(likelihood * prior) of all
states
Z_cnv: numeric; Total log(likelihood * prior) of CNV
states
p_amp: numeric; Joint posterior probability of
amplification states (2:1, 3:1)
p_neu: numeric; Joint posterior probability of neutral
state
p_del: numeric; Joint posterior probability of deletion
state
p_loh: numeric; Joint posterior probability of CNLoH
state
p_bamp: numeric; Joint posterior probability of
biallelic amplification state
p_bdel: numeric; Joint posterior probability of
biallelic deletion state
logBF: numeric; Joint log Bayes factor of CNV state
vs. neutral state
p_cnv: numeric; Joint posterior probability of CNV
state
p_n: numeric; Joint posterior probability of neutral
state
p_cnv_x: numeric; Posterior probability of CNV
state given expression data
p_cnv_y: numeric; Posterior probability of CNV
state given allele data
cnv_state_mle: character; Maximum likelihood CNV
state
cnv_state_map: character; Maximum a posteriori CNV
state
seg_label: character; Segment label
avg_entropy: numeric; Average entropy of CNV posterior
in single cells
phi_mle: numeric; Maximum likelihood estimate of total copy
number ratio relative to diploid (phi)
mu: numeric; Mean of expression count distribution in
the cell (Poisson log-Normal)
sigma: numeric; Standard deviation of expression count
distribution in the cell (Poisson log-Normal)
ref: character; Best-matching single-cell expression
reference
major: integer; Major allele count; total SNP pileup
counts deriving from the major haplotype in the CNV region for the given
cell
minor: integer; Minor allele count; total SNP pileup
counts deriving from the minor haplotype in the CNV region for the given
cell
total: integer; Total allele count
MAF: numeric; Major allele frequency
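The Z, logBF and posterior columns above are related through log-space arithmetic. The sketch below illustrates one consistent reading of those definitions. It assumes that each "total" is a log-sum over the states it covers, that logBF is the difference Z_cnv - Z_n, and that p_cnv is exp(Z_cnv - Z); the source does not state these formulas, and the function names are hypothetical.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a list of log-values."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def cnv_posterior(z_cnv, z_n):
    """Combine the CNV and neutral totals into Z, logBF, p_cnv and p_n.

    Assumption: Z = logSumExp(Z_cnv, Z_n), so p_cnv + p_n = 1.
    """
    z = log_sum_exp([z_cnv, z_n])
    return {
        "Z": z,
        "logBF": z_cnv - z_n,
        "p_cnv": math.exp(z_cnv - z),
        "p_n": math.exp(z_n - z),
    }
```

Under these assumptions p_cnv is the logistic function of logBF, so a segment with equal CNV and neutral totals gets p_cnv = 0.5.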
Pseudobulk profiles
snp_id: character; SNP ID
CHROM: character; Chromosome
POS: integer; Genomic position
cM: numeric; Genetic distance in cM
REF: character; Reference allele
ALT: character; Alternate allele
GT: character; Phased genotype
gene: character; Gene symbol
AD: integer; Allelic depth
DP: integer; Total depth
AR: numeric; Allelic fraction
snp_index: integer; SNP index
pBAF: numeric; Phased BAF
pAD: numeric; Phased allelic depth
inter_snp_cm: numeric; Genetic distance in cM between
adjacent SNPs
p_s: numeric; Probability of phase switch based on
inter-SNP distance
Y_obs: integer; Observed gene expression counts (note that the X/Y
notation is switched here: elsewhere x denotes expression and y allele data)
lambda_obs: numeric; Observed gene expression
magnitude
lambda_ref: numeric; Reference gene expression
magnitude
d_obs: numeric; Total gene expression count in the
cell
gene_start: integer; Gene start position
gene_end: integer; Gene end position
gene_length: integer; Gene length
gene_index: integer; Gene index
logFC: numeric; Log2 fold change of gene expression
lnFC: numeric; Natural log fold change of gene
expression
mse: numeric; Mean squared error of logFC
snp_rate: numeric; SNP rate
loh: logical; True if segment has a clonal LoH
(deletion)
n_cells: integer; Number of cells in the pseudobulk
members: character; Cell groups included in the
pseudobulk
sample: character/integer; Sample ID
state: character; CNV state
boundary: logical; True if the marker is at CNV
boundary
seg_start_index: integer; Segment start marker index
seg_end_index: integer; Segment end marker index
seg_start: integer; Segment start position
seg_end: integer; Segment end position
seg_length: integer; Segment length in bp
seg: character; Segment ID
seg_cons: character; Consensus segment ID
diploid: logical; True if the segment is diploid
mu: numeric; Mean of expression count distribution in
the pseudobulk (Poisson log-Normal)
sig: numeric; Standard deviation of expression count
distribution in the pseudobulk (Poisson log-Normal)
cnv_state: character; CNV state
n_genes: integer; Number of genes in the segment
n_snps: integer; Number of SNPs in the segment
theta_hat: numeric; Crude estimate of haplotype frequency
based on MAF
theta_mle: numeric; Maximum likelihood estimate of
haplotype frequency (theta)
theta_sigma: numeric; Standard deviation of theta
MLE
L_y_n: numeric; Neutral log-likelihood of allele
data
L_y_d: numeric; Deletion log-likelihood of allele
data
L_y_a: numeric; Amplification log-likelihood of allele
data
phi_mle: numeric; Maximum likelihood estimate of total copy
number ratio relative to diploid (phi)
phi_sigma: numeric; Standard deviation of phi MLE
L_x_n: numeric; Neutral log-likelihood of expression
data
L_x_d: numeric; Deletion log-likelihood of expression
data
L_x_a: numeric; Amplification log-likelihood of
expression data
Z_cnv: numeric; Total log(likelihood * prior) of CNV
states
Z_n: numeric; Total log(likelihood * prior) of neutral
state
Z: numeric; Total log(likelihood * prior) of all
states
logBF: numeric; Joint log Bayes factor of CNV state
vs. neutral state
p_neu: numeric; Joint posterior probability of neutral
state
p_loh: numeric; Joint posterior probability of CNLoH
state
p_amp: numeric; Joint posterior probability of
amplification states (2:1, 3:1)
p_del: numeric; Joint posterior probability of deletion
state
p_bamp: numeric; Joint posterior probability of
biallelic amplification state
p_bdel: numeric; Joint posterior probability of
biallelic deletion state
LLR_x: numeric; Log-likelihood ratio of expression
data
LLR_y: numeric; Log-likelihood ratio of allele data
LLR: numeric; Log-likelihood ratio of all data
cnv_state_post: character; Maximum a posteriori (retest)
CNV state
state_post: character; Maximum a posteriori CNV allelic
state
p_up: numeric; HMM posterior probability of variant
allele belonging to the major haplotype
haplo_post: numeric; Maximum a posteriori haplotype
state assignment
haplo_naive: numeric; Naive haplotype state assignment
(based on BAF)
major_count: integer; Major allele count; total SNP
counts deriving from the major haplotype in the CNV region in the
pseudobulk
minor_count: integer; Minor allele count; total SNP
counts deriving from the minor haplotype in the CNV region in the
pseudobulk
theta_hat_roll: numeric; Crude estimate of haplotype
frequency based on MAF (rolling window)
phi_mle_roll: numeric; Maximum likelihood estimate of total copy
number ratio relative to diploid (phi, rolling window)
nu: numeric; Phase-switch rate used in the HMM
gamma: numeric; Allele inverse-overdispersion used in
the HMM
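Several pseudobulk columns are simple functions of others. The sketch below shows the arithmetic for the allelic fraction and the two fold-change scales. It assumes AR = AD / DP and that the fold change compares lambda_obs to lambda_ref; neither formula is stated in the source, and the function names are hypothetical. The log2-to-natural-log conversion is exact.

```python
import math


def allelic_fraction(ad, dp):
    """Assumed AR = AD / DP; returns NaN when there is no coverage."""
    return ad / dp if dp > 0 else float("nan")


def fold_changes(lambda_obs, lambda_ref):
    """Return (log2 fold change, natural-log fold change).

    Assumption: the fold change is lambda_obs / lambda_ref.
    """
    ratio = lambda_obs / lambda_ref
    return math.log2(ratio), math.log(ratio)


def log2_to_ln(log_fc):
    """Convert a log2 fold change to a natural-log fold change."""
    return log_fc * math.log(2)
```

A doubling of expression therefore corresponds to logFC = 1 and lnFC of roughly 0.693.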
Consensus segments
Columns of the consensus segments derived from subtree pseudobulk HMMs:
sample: character/integer; Sample ID
CHROM: character; Chromosome
seg: character; Segment ID
cnv_state: character; CNV state
cnv_state_post: character; Maximum a posteriori (retest)
CNV state
seg_start: integer; Segment start position
seg_end: integer; Segment end position
seg_start_index: integer; Segment start marker index
seg_end_index: integer; Segment end marker index
theta_mle: numeric; Maximum likelihood estimate of
haplotype frequency (theta)
theta_sigma: numeric; Standard deviation of theta
MLE
phi_mle: numeric; Maximum likelihood estimate of total copy
number ratio relative to diploid (phi)
phi_sigma: numeric; Standard deviation of phi MLE
p_loh: numeric; Joint posterior probability of CNLoH
state
p_del: numeric; Joint posterior probability of deletion
state
p_amp: numeric; Joint posterior probability of
amplification states (2:1, 3:1)
p_bamp: numeric; Joint posterior probability of
biallelic amplification state
p_bdel: numeric; Joint posterior probability of
biallelic deletion state
LLR: numeric; Log-likelihood ratio of all data
LLR_y: numeric; Log-likelihood ratio of allele data
LLR_x: numeric; Log-likelihood ratio of expression
data
n_genes: integer; Number of genes in the segment
n_snps: integer; Number of SNPs in the segment
component: integer; Component ID
LLR_sample: numeric; Log-likelihood ratio in the sample
where the CNV has the highest LLR
seg_length: integer; Segment length in bp
seg_cons: character; Consensus segment ID
n_states: integer; Number of CNV states
cnv_states: character; CNV states
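As a small usage sketch, the consensus segment table can be summarized by state once it has been read into rows of column-name-to-string mappings (for example with `csv.DictReader`). The function name `length_by_state` is hypothetical, and the state labels in the usage note are illustrative rather than taken from the source.

```python
from collections import defaultdict


def length_by_state(rows, state_col="cnv_state_post"):
    """Sum seg_length (bp) per CNV state across consensus segments.

    Values are parsed via float() first so that numbers written in
    scientific notation (e.g. '2.5e+03') are also accepted.
    """
    totals = defaultdict(int)
    for row in rows:
        totals[row[state_col]] += int(float(row["seg_length"]))
    return dict(totals)
```

Given two segments labelled "del" with lengths 1000 and 500 and one labelled "amp" with length 2500, the result maps "del" to 1500 and "amp" to 2500.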
Clone assignments
cell: character; Cell ID
clone_opt: integer; Maximum a posteriori clone
assignment
GT_opt: character; Maximum a posteriori genotype
p_opt: numeric; Maximum a posteriori clone
probability
p_{k}: numeric; Posterior probability of cell belonging
to clone k
p_x_{k}: numeric; Posterior probability of cell
belonging to clone k given expression data
p_y_{k}: numeric; Posterior probability of cell
belonging to clone k given allele data
p_cnv: numeric; Posterior probability of cell belonging
to an aneuploid clone
p_cnv_x: numeric; Posterior probability of cell
belonging to an aneuploid clone given expression data
p_cnv_y: numeric; Posterior probability of cell
belonging to an aneuploid clone given allele data
compartment_opt: character; Maximum a posteriori
compartment (tumor vs normal) assignment
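The p_{k} columns can be collected programmatically and compared with clone_opt and p_opt. The sketch below assumes that clone indices k are integers and that the maximum a posteriori clone is the one with the largest p_{k}; the function names are hypothetical. The pattern deliberately excludes p_x_{k}, p_y_{k}, p_cnv and p_opt.

```python
import re


def clone_posteriors(row):
    """Collect the p_{k} columns of one clone_post row as {k: probability}."""
    posteriors = {}
    for name, value in row.items():
        match = re.fullmatch(r"p_(\d+)", name)
        if match:
            posteriors[int(match.group(1))] = float(value)
    return posteriors


def best_clone(row):
    """Return (k, p_k) for the clone with the highest posterior."""
    posteriors = clone_posteriors(row)
    k = max(posteriors, key=posteriors.get)
    return k, posteriors[k]
```

If the assumptions hold, the pair returned by `best_clone` should agree with the clone_opt and p_opt values in the same row.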
