Algorithm and scientific questions: <Huidong.Chen at mgh dot harvard dot edu>
Module wrapping issues: Ted Liefeld < jliefeld at cloud dot ucsd dot edu>
STREAM (Single-cell Trajectories Reconstruction, Exploration And Mapping) is an interactive pipeline capable of disentangling and visualizing complex branching trajectories from both single-cell transcriptomic and epigenomic data. Within GenePattern, STREAM is implemented as a collection of modules that cover the entire STREAM processing pipeline, so that individual steps can be performed interactively for data exploration.
STREAM.SeedEPGStructure is used to seed the initial elastic principal graph before the trajectory learning process starts.
Elastic principal graphs are structured data approximators consisting of vertices connected by edges. The vertices are embedded into the space of the data so as to minimize the mean squared distance (MSD) to the data points, similarly to k-means. Unlike unstructured k-means, the edges connecting the vertices are used to define an elastic energy term. This term is optimized together with the MSD and penalizes edge stretching and bending of branches.
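The objective described above can be illustrated with a small NumPy sketch. It is not the ElPiGraph implementation: the function name elastic_energy, the weights lam and mu, and the exact form of the stretching and bending terms are assumptions chosen to show how an MSD term and an elastic term might be combined.

```python
import numpy as np

def elastic_energy(data, nodes, edges, lam=0.01, mu=0.1):
    """Illustrative energy: MSD + lam * stretching + mu * bending.

    lam and mu are hypothetical weights, not values taken from STREAM.
    """
    data = np.asarray(data, dtype=float)
    nodes = np.asarray(nodes, dtype=float)

    # Mean squared distance from each data point to its nearest vertex.
    sq_dist = ((data[:, None, :] - nodes[None, :, :]) ** 2).sum(axis=2)
    msd = sq_dist.min(axis=1).mean()

    # Stretching penalty: sum of squared edge lengths.
    stretch = sum(np.sum((nodes[i] - nodes[j]) ** 2) for i, j in edges)

    # Bending penalty: for every vertex with two or more neighbours, the
    # squared deviation of the vertex from the mean of its neighbours.
    neighbours = {k: [] for k in range(len(nodes))}
    for i, j in edges:
        neighbours[i].append(j)
        neighbours[j].append(i)
    bend = 0.0
    for centre, leaves in neighbours.items():
        if len(leaves) >= 2:
            bend += np.sum((nodes[centre] - nodes[leaves].mean(axis=0)) ** 2)

    return msd + lam * stretch + mu * bend

# A straight three-vertex chain versus a bent one, each fitted to its own vertices.
chain = [(0, 1), (1, 2)]
straight = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
bent = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
energy_straight = elastic_energy(straight, straight, chain)
energy_bent = elastic_energy(bent, bent, chain)
```

In this toy case both graphs fit their data exactly, so the difference in energy comes only from the elastic term: the bent chain has longer edges and a nonzero bending penalty.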
Principal graph inference relies on a greedy optimization procedure that can get trapped in local minima. STREAM therefore uses the STREAM.SeedEPGStructure module as an initialization step that improves the quality of the inferred solutions and speeds up convergence. First, cells are clustered in the low-dimensional space. K-means is used by default; affinity propagation (ap) and spectral clustering (sc) are also available. A minimum spanning tree (MST) is then constructed over the resulting centroids using Kruskal's algorithm. This tree serves as the initial tree structure for the ElPiGraph procedure.
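The two seeding steps (clustering, then an MST over the centroids) can be sketched with scikit-learn and SciPy. This is a minimal illustration under stated assumptions, not the module's code: seed_tree is a hypothetical helper, and SciPy's generic minimum_spanning_tree routine stands in for the Kruskal step.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform
from sklearn.cluster import KMeans

def seed_tree(points, n_clusters=10, random_state=0):
    """Cluster the points, then join the centroids with a minimum spanning tree."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    km.fit(points)
    centroids = km.cluster_centers_

    # Dense matrix of pairwise Euclidean distances between centroids.
    # Note: SciPy treats zero entries as "no edge", so coincident centroids
    # would not be linked directly.
    dist = squareform(pdist(centroids))

    # The MST is returned as a sparse matrix; its nonzero entries are the edges.
    mst = minimum_spanning_tree(dist).tocoo()
    edges = sorted((int(min(i, j)), int(max(i, j))) for i, j in zip(mst.row, mst.col))
    return centroids, edges

# Toy data standing in for cells in a 3-dimensional reduced space.
rng = np.random.default_rng(0)
cells = rng.normal(size=(200, 3))
centroids, edges = seed_tree(cells, n_clusters=6)
```

A tree over k distinct centroids has k - 1 edges, so the returned edge list together with the centroid coordinates is enough to describe an initial tree structure.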
H Chen, L Albergante, JY Hsu, CA Lareau, GL Bosco, J Guan, S Zhou, AN Gorban, DE Bauer, MJ Aryee, DM Langenau, A Zinovyev, JD Buenrostro, GC Yuan, L Pinello. Single-cell trajectories reconstruction, exploration and mapping of omics data with STREAM. Nature Communications, volume 10, Article number: 1903 (2019).
Nestorowa, S. et al. A single-cell resolution map of mouse hematopoietic stem and progenitor cell differentiation. Blood 128, e20-31 (2016).
Pinello Lab STREAM Github Repository
Example data for the STREAM workflow can be downloaded from Dropbox: Stream Example Data
Ref: Nestorowa, S. et al. A single-cell resolution map of mouse hematopoietic stem and progenitor cell differentiation. Blood 128, e20-31 (2016).
An input file suitable for this step is available at dimred_stream_result.pkl
GenePattern 3.9.11 or later (dockerized).
Inputs and Outputs
|data file*||A STREAM pkl file containing an annotated AnnData matrix of gene expression data.|
|output filename*||The output filename prefix.|
|percent neighbor cells*||Neighbor percentage. The percentage of points used as neighbors for spectral clustering.|
|num clusters*||Number of clusters (only used when 'clustering' is specified as 'Spectral Clustering' or 'K-Means').|
|damping*||Damping factor (between 0.5 and 1) for affinity propagation.|
|preference percentile*||Preference percentile (between 0 and 100). The percentile of the input similarities for affinity propagation.|
|max clusters*||Maximum number of clusters allowed (used when 'clustering' is specified as affinity propagation).|
|clustering*||Clustering method used to infer the initial nodes. Choose from affinity propagation, K-Means, or Spectral Clustering.|
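As a rough guide to how the clustering parameters above relate to each method, the sketch below maps them onto scikit-learn estimators. The helper initial_nodes, its argument names, its default values, and the interpretation of the neighbor percentage as a value between 0 and 100 are all assumptions made for illustration; the module's actual behavior may differ.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation, KMeans, SpectralClustering

def initial_nodes(points, clustering="kmeans", n_clusters=10, damping=0.75,
                  preference_percentile=50, percent_neighbors=10, random_state=0):
    """Return one initial node (cluster mean) per cluster. Hypothetical helper."""
    points = np.asarray(points, dtype=float)

    if clustering == "kmeans":
        model = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    elif clustering == "sc":
        # Assumption: the neighbor percentage is a share (0-100) of all points.
        n_neighbors = max(2, int(round(len(points) * percent_neighbors / 100)))
        model = SpectralClustering(n_clusters=n_clusters, affinity="nearest_neighbors",
                                   n_neighbors=n_neighbors, random_state=random_state)
    elif clustering == "ap":
        # Similarities are negative squared Euclidean distances; the preference
        # is set to the requested percentile of those similarities.
        sq = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
        preference = np.percentile(-sq, preference_percentile)
        model = AffinityPropagation(damping=damping, preference=preference,
                                    random_state=random_state)
    else:
        raise ValueError(f"unknown clustering method: {clustering!r}")

    labels = model.fit_predict(points)
    # Use the mean of each cluster's members as its node position.
    return np.array([points[labels == k].mean(axis=0) for k in np.unique(labels)])

# Toy data: three well-separated groups in two dimensions.
rng = np.random.default_rng(1)
centres = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]])
toy = np.vstack([c + rng.normal(scale=0.3, size=(20, 2)) for c in centres])
nodes_kmeans = initial_nodes(toy, clustering="kmeans", n_clusters=3)
nodes_ap = initial_nodes(toy, clustering="ap")
```

Note that the number of clusters is supplied only for K-Means and spectral clustering, while affinity propagation determines the number of clusters itself from the damping and preference settings.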
Plotting
Parameters controlling the output figures.
|num components*||The number of components to be plotted.|
|component x*||Component used for the x axis.|
|component y*||Component used for the y axis.|
|figure height||Figure height as used in matplotlib plots. Default=8.|
|figure width||Figure width as used in matplotlib plots. Default=8.|
|figure legend num columns*||The number of columns that the legend has.|
* - required