Algorithm and scientific questions: <Huidong.Chen at mgh dot harvard dot edu>
Module wrapping issues: Ted Liefeld <jliefeld at cloud dot ucsd dot edu>
STREAM (Single-cell Trajectories Reconstruction, Exploration And Mapping) is an interactive pipeline capable of disentangling and visualizing complex branching trajectories from both single-cell transcriptomic and epigenomic data. Within GenePattern, STREAM is implemented as a collection of modules that cover the entire STREAM processing pipeline, so that individual steps can be performed interactively for data exploration.
STREAM.ElasticPrincipalGraph uses elastic principal graph learning to calculate a pseudotime trajectory.
Elastic principal graphs are structured data approximators consisting of vertices connected by edges. The vertices are embedded into the space of the data so as to minimize the mean squared distance (MSD) to the data points, similarly to k-means. Unlike unstructured k-means, the edges connecting the vertices are used to define an elastic energy term. This term is minimized together with the MSD and penalizes edge stretching and bending of branches.
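As a rough illustration of the objective described above, the following pure-Python sketch evaluates an energy made of an MSD term, an edge-stretching penalty, and a branch-bending penalty for a fixed graph. It is a simplified reading of the elastic principal graph idea, not the ElPiGraph implementation: the function name `elastic_energy`, the representation of bending through "stars" (a center node and its neighbours), and the default values of `lam`, `mu`, and `trimming_radius` are assumptions chosen for the example.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def elastic_energy(points, nodes, edges, stars,
                   lam=0.02, mu=0.1, trimming_radius=math.inf):
    """Illustrative energy of a fixed graph (hypothetical helper).

    points: data points; nodes: graph vertex positions
    edges:  list of (i, j) node index pairs
    stars:  list of (center_index, [neighbour_indices])
    """
    # Approximation term: mean squared distance from each point to its
    # nearest node, capped at trimming_radius**2 (trimmed MSD).
    cap = trimming_radius ** 2
    msd = sum(
        min(min(squared_distance(p, v) for v in nodes), cap)
        for p in points
    ) / len(points)

    # Stretching penalty: squared length of every edge, weighted by lam.
    stretch = lam * sum(squared_distance(nodes[i], nodes[j]) for i, j in edges)

    # Bending penalty: squared distance between each star center and the
    # mean position of its neighbours, weighted by mu.
    bend = 0.0
    for center, leaves in stars:
        dim = len(nodes[center])
        mean_leaf = [
            sum(nodes[leaf][d] for leaf in leaves) / len(leaves)
            for d in range(dim)
        ]
        bend += mu * squared_distance(nodes[center], mean_leaf)

    return msd + stretch + bend


# Three collinear points approximated by a straight and by a bent 3-node path.
points = [(0, 0), (1, 0), (2, 0)]
edges = [(0, 1), (1, 2)]
stars = [(1, [0, 2])]
straight = elastic_energy(points, [(0, 0), (1, 0), (2, 0)], edges, stars)
bent = elastic_energy(points, [(0, 0), (1, 1), (2, 0)], edges, stars)
```

In this toy case the straight path fits the points exactly and has no bending, so its energy comes only from the stretching term; the bent path pays for a worse fit, longer edges, and bending, and therefore has the higher energy.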
The STREAM.ElasticPrincipalGraph module uses the R-language ElPiGraph implementation of elastic principal graphs. To find the optimal graph structure, ElPiGraph uses a topological grammar (or graph grammar) approach. ElPiGraph is a completely redesigned algorithm for the previously introduced elastic principal graph optimization, based on the use of an elastic matrix Laplacian, trimmed mean squared error, explicit control of topological complexity, and scalability to millions of points on an ordinary laptop.
H Chen, L Albergante, JY Hsu, CA Lareau, GL Bosco, J Guan, S Zhou, AN Gorban, DE Bauer, MJ Aryee, DM Langenau, A Zinovyev, JD Buenrostro, GC Yuan, L Pinello. Single-cell trajectories reconstruction, exploration and mapping of omics data with STREAM. Nature Communications 10, 1903 (2019).
Nestorowa, S. et al. A single-cell resolution map of mouse hematopoietic stem and progenitor cell differentiation. Blood 128, e20-31 (2016).
Pinello Lab STREAM GitHub Repository
ElPiGraph GitHub Repository
Example data for the complete STREAM workflow can be downloaded from Dropbox: Stream Example Data
Ref: Nestorowa, S. et al. A single-cell resolution map of mouse hematopoietic stem and progenitor cell differentiation. Blood 128, e20-31 (2016).
Example data for this module (this step in the workflow) is available as seeded_stream_result.pkl.
GenePattern 3.9.11 or later (dockerized).
Inputs and Outputs | |
Name | Description |
---|---|
data file* | A STREAM pkl file containing an annotated AnnData matrix of gene expression data. |
output filename* | The output filename prefix. |
Elastic Principal Graph | |
Name | Description |
epg num nodes* | Number of nodes for elastic principal graph. |
incremental number of nodes* | Number of nodes to add to the elastic principal graph when epg_n_nodes is not large enough. |
epg trimming radius* | Maximal distance from a node to the points it controls in the embedding. |
epg alpha* | Alpha parameter of the penalized elastic energy. |
epg beta* | Beta parameter of the penalized elastic energy. |
epg lambda* | Lambda parameter used to compute the elastic energy. |
epg mu* | Mu parameter used to compute the elastic energy. |
epg final energy* | Indicates the final elastic energy associated with the configuration. |
Plotting: Parameters controlling the output figures. | |
Name | Description |
num components | The number of components to be plotted. |
component x | Component used for x-axis in plots |
component y | Component used for y-axis in plots |
figure height | Figure height as used in matplotlib plots. Default=8. |
figure width | Figure width as used in matplotlib plots. Default=8. |
figure legend num columns* | The number of columns that the legend has. |
* - required