
Fast maximum-likelihood phylogeny inference from noisy single-cell data using the ‘ScisTree’ algorithm (Wu, Bioinformatics 2019). ‘scistreer’ provides an ‘R’ interface and improves speed via ‘Rcpp’ and ‘RcppParallel’, making the method applicable to massive single-cell datasets (>10,000 cells).

Installation

To install the stable version from CRAN:

install.packages('scistreer', dependencies = TRUE)

To get the most recent updates, install the development version from GitHub via devtools:

devtools::install_github('kharchenkolab/scistreer')

Usage

Within R, the only required input is a genotype probability matrix (cell x mutation: one row per cell, one column per mutation), where each entry is the probability that the cell harbors the mutation. For example, using the example matrix P_example:

library(scistreer)
treeML = run_scistree(P_example, ncores = 8, init = 'UPGMA', verbose = FALSE)
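To make the expected input concrete, here is a minimal base-R sketch with made-up values: rows are cells, columns are mutations, and every entry is a probability in [0, 1]. The helpers check_prob_matrix() and genotype_loglik() are hypothetical illustrations, not scistreer functions. genotype_loglik() scores a fixed 0/1 genotype matrix, assuming entries are independent, which is the kind of per-entry likelihood ScisTree maximizes. The package additionally requires the genotypes to be consistent with a cell lineage tree and searches over trees; this sketch does neither.

```r
# Toy genotype probability matrix (made-up values for illustration only):
# rows are cells, columns are mutations, entries are P(cell carries mutation).
P_toy <- matrix(
  c(0.90, 0.10, 0.80,
    0.85, 0.20, 0.10,
    0.05, 0.95, 0.15,
    0.10, 0.90, 0.20),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("cell", 1:4), paste0("mut", 1:3))
)

# Hypothetical helper (not part of scistreer): basic sanity checks.
# Requiring row/column names is a conservative choice made here so that
# cells and mutations stay identifiable downstream.
check_prob_matrix <- function(P) {
  stopifnot(is.matrix(P), is.numeric(P))
  stopifnot(!anyNA(P), all(P >= 0 & P <= 1))
  stopifnot(!is.null(rownames(P)), !is.null(colnames(P)))
  invisible(TRUE)
}

# Hypothetical helper: log-likelihood of a binary genotype matrix G
# (1 = cell carries the mutation) given P, assuming independent entries.
genotype_loglik <- function(P, G, eps = 1e-9) {
  P <- pmin(pmax(P, eps), 1 - eps)  # clamp to avoid log(0)
  sum(G * log(P) + (1 - G) * log(1 - P))
}

check_prob_matrix(P_toy)
G_call <- (P_toy > 0.5) * 1  # naive per-entry calls, ignoring any tree
genotype_loglik(P_toy, G_call)
```

A confident entry (near 0 or 1) that disagrees with a candidate genotype costs far more log-likelihood than an uncertain entry near 0.5, which is how noisy calls are down-weighted when the likelihood is maximized.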

The output is the maximum-likelihood tree as an ape::phylo object. You can visualize the tree together with the probability matrix as follows:

plot_phylo_heatmap(treeML, P_example)
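Because the result is a standard ape::phylo object, the usual ape functions apply to it. The sketch below uses a small tree parsed from a Newick string as a stand-in for run_scistree() output, so it runs without scistreer or real data; substitute treeML in practice. It assumes only that ape is installed.

```r
library(ape)

# Stand-in for run_scistree() output (hypothetical 4-cell tree);
# replace `tree` with your own result, e.g. treeML.
tree <- read.tree(text = "((cell1,cell2),(cell3,cell4));")

inherits(tree, "phylo")  # class check
Ntip(tree)               # number of tips (cells)
tree$tip.label           # tip labels

# Export in Newick format for use in other software.
out_file <- tempfile(fileext = ".nwk")
write.tree(tree, file = out_file)

# Drop cells from the tree, e.g. for a focused plot.
sub_tree <- drop.tip(tree, "cell4")
```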

Benchmark

On a single thread, scistreer is about 10x faster than the original ScisTree implementation. Its runtime can be reduced further by shared-memory multithreading via RcppParallel, controlled by the ncores argument of run_scistree().

Citations

For the original publication, please refer to:

Yufeng Wu, Accurate and efficient cell lineage tree inference from noisy single cell data: the maximum likelihood perfect phylogeny approach, Bioinformatics, Volume 36, Issue 3, 1 February 2020, Pages 742–750, https://doi.org/10.1093/bioinformatics/btz676

If you would like to cite this package, please use:

Teng Gao, Evan Biederstedt, Peter Kharchenko, Yufeng Wu (2022). ScisTreeR: Speeding up the ScisTree Algorithm via RcppParallel. R package version 1.0.0. https://github.com/kharchenkolab/scistreer