Fast maximum-likelihood phylogeny inference from noisy single-cell data using the ‘ScisTree’ algorithm (Wu, Bioinformatics 2019). ‘scistreer’ provides an ‘R’ interface and improves speed via ‘Rcpp’ and ‘RcppParallel’, making the method applicable to massive single-cell datasets (>10,000 cells).
Installation
To install the stable CRAN version:
install.packages('scistreer', dependencies = TRUE)
To get the most recent updates, install the GitHub version via devtools:
devtools::install_github('https://github.com/kharchenkolab/scistreer')
Usage
Within R, you only need to supply a genotype probability matrix (cell x mutation), where each entry is the probability that the cell harbors the mutation. For example,
treeML = run_scistree(P_example, ncores = 8, init = 'UPGMA', verbose = FALSE)
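To make the expected input shape concrete, here is a minimal sketch that builds a small genotype probability matrix by hand using base R only. The names `G` and `P_toy`, the clone structure, and the 0.95/0.05 confidence values are illustrative assumptions, not part of the package; whether row and column names are required by `run_scistree` is also an assumption, though naming cells and mutations is good practice either way.

```r
# Hypothetical toy data: 6 cells (rows) x 4 mutations (columns).
# G holds made-up "true" genotypes that follow a nested clone structure.
G <- matrix(
  0, nrow = 6, ncol = 4,
  dimnames = list(paste0("cell", 1:6), paste0("mut", 1:4))
)
G[1:4, "mut1"] <- 1  # cells 1-4 carry mut1
G[3:4, "mut2"] <- 1  # cells 3-4 additionally carry mut2
G[5:6, "mut3"] <- 1  # cells 5-6 carry mut3
G[6,   "mut4"] <- 1  # cell 6 additionally carries mut4

# Convert hard calls into probabilities that each cell harbors each mutation.
# The values 0.95 and 0.05 are arbitrary stand-ins for genotyping confidence.
P_toy <- G
P_toy[G == 1] <- 0.95
P_toy[G == 0] <- 0.05

# Each entry must be a valid probability.
stopifnot(is.matrix(P_toy), all(P_toy >= 0 & P_toy <= 1))
print(P_toy)
```

A matrix shaped like `P_toy` could then be passed to `run_scistree` in place of `P_example`; in real analyses the probabilities would come from an upstream genotyping model rather than fixed constants.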
The output maximum likelihood tree is an ape::phylo object. You can visualize the output and the probability matrix as follows:
plot_phylo_heatmap(treeML, P_example)
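Because the result is an ape::phylo object, standard ape functions should apply to it as well. The sketch below uses a small hand-written stand-in tree (the Newick string and the name `tree` are hypothetical, used here so the example does not depend on a scistreer run) to show how such an object can be inspected and exported.

```r
# Runs only if the 'ape' package is installed.
if (requireNamespace("ape", quietly = TRUE)) {
  # Hypothetical stand-in for the tree returned by run_scistree().
  tree <- ape::read.tree(text = "((cell1,cell2),(cell3,cell4));")

  # Number of tips (cells) and their labels.
  n_tips <- ape::Ntip(tree)
  print(n_tips)
  print(tree$tip.label)

  # Serialize to a Newick string, e.g. for use in other tools.
  newick <- ape::write.tree(tree)
  print(newick)
}
```

The same calls would be expected to work on `treeML` from the usage example above.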
Benchmark
scistreer is about 10x faster than the original implementation on a single thread. The runtime of scistreer can be further reduced by shared-memory multi-threading via RcppParallel.
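To check the effect of multi-threading on your own machine, a rough timing comparison can be sketched as below. It reuses the `run_scistree` call and `P_example` from the usage section; the core counts (1 and 4) and the names `core_counts` and `timings` are arbitrary choices for illustration, and the actual speedup will depend on your hardware and data size.

```r
# Core counts to compare; adjust to what your machine offers.
core_counts <- c(1L, 4L)
timings <- numeric(0)

# Runs only if scistreer is installed.
if (requireNamespace("scistreer", quietly = TRUE)) {
  library(scistreer)
  for (n in core_counts) {
    # Wall-clock time of one full tree search with n threads.
    elapsed <- system.time(
      run_scistree(P_example, ncores = n, init = "UPGMA", verbose = FALSE)
    )[["elapsed"]]
    timings <- c(timings, elapsed)
    cat(sprintf("ncores = %d: %.1f s\n", n, elapsed))
  }
}
```

A single run per setting gives only a rough impression; repeating each run a few times would give a more stable estimate.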
Citations
For the original publication, please refer to:
Yufeng Wu, Accurate and efficient cell lineage tree inference from noisy single cell data: the maximum likelihood perfect phylogeny approach, Bioinformatics, Volume 36, Issue 3, 1 February 2020, Pages 742–750, https://doi.org/10.1093/bioinformatics/btz676
If you would like to cite this package, please use:
Teng Gao, Evan Biederstedt, Peter Kharchenko, Yufeng Wu (2022). ScisTreeR: Speeding up the ScisTree Algorithm via RcppParallel. R package version 1.0.0. https://github.com/kharchenkolab/scistreer