Module Documentation

STREAM.ElasticPrincipalGraph


LSID
urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00399
Author(s)
Huidong Chen, Massachusetts General Hospital; wrapped as a module by Ted Liefeld, Mesirov Lab, UCSD School of Medicine.
Contact(s)

Algorithm and scientific questions: <Huidong.Chen at mgh dot harvard dot edu>

Module wrapping issues: Ted Liefeld <jliefeld at cloud dot ucsd dot edu>


Introduction

STREAM (Single-cell Trajectories Reconstruction, Exploration And Mapping) is an interactive pipeline capable of disentangling and visualizing complex branching trajectories from both single-cell transcriptomic and epigenomic data. Within GenePattern, STREAM is implemented as a collection of modules that cover the entire STREAM processing pipeline, so that individual steps can be performed interactively for data exploration.

STREAM.ElasticPrincipalGraph uses elastic principal graph learning to calculate a pseudotime trajectory.

Elastic principal graphs are structured data approximators, consisting of vertices connected by edges. The vertices are embedded into the space of the data, minimizing the mean squared distance (MSD) to the data points, similarly to k-means. Unlike unstructured k-means, the edges connecting the vertices are used to define an elastic energy term. This term penalizes stretching of edges and bending of branches, and it is minimized together with the MSD.
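To make the roles of the MSD and the elastic energy term concrete, the following Python sketch evaluates a simplified form of the objective for a small graph. It is an illustration only and not the ElPiGraph implementation: the function names (sq_dist, elastic_energy), the argument names, and the values used for lam and mu are assumptions made for this example. Stretching is taken as the sum of squared edge lengths, and bending as the squared deviation of each node with two or more neighbors from the mean position of those neighbors.

```python
from collections import defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def elastic_energy(points, nodes, edges, lam=0.02, mu=0.1):
    """Simplified elastic energy: MSD + stretching penalty + bending penalty.

    points: data points; nodes: graph vertex positions;
    edges: pairs of node indices. lam and mu are illustrative values.
    """
    # Approximation term: mean squared distance from each point to its nearest node.
    msd = sum(min(sq_dist(p, v) for v in nodes) for p in points) / len(points)

    # Stretching term: penalize long edges.
    stretch = lam * sum(sq_dist(nodes[i], nodes[j]) for i, j in edges)

    # Bending term: for every node with at least two neighbors, penalize
    # its deviation from the mean position of those neighbors.
    neighbors = defaultdict(list)
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    bend = 0.0
    for center, leaves in neighbors.items():
        if len(leaves) < 2:
            continue
        dim = len(nodes[center])
        mean_leaf = [sum(nodes[l][d] for l in leaves) / len(leaves) for d in range(dim)]
        bend += mu * sq_dist(nodes[center], mean_leaf)

    return msd + stretch + bend


# Usage: a straight three-node chain versus a bent one over the same data.
data = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
chain = [(0, 1), (1, 2)]
straight = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
bent = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
print(elastic_energy(data, straight, chain) < elastic_energy(data, bent, chain))
```

In this toy case the bent chain scores worse on all three terms: it fits the data less closely, its edges are longer, and its middle node deviates from the midpoint of its neighbors.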

Algorithm

The STREAM.ElasticPrincipalGraph module uses the R-language ElPiGraph implementation of Elastic Principal Graphs. To find the optimal graph structure, ElPiGraph uses a topological grammar (or graph grammar) approach. ElPiGraph is a complete redesign of the previously introduced elastic principal graph optimization algorithm. It is based on the elastic matrix Laplacian, a trimmed mean square error, and explicit control of topological complexity, and it scales to millions of points on an ordinary laptop.
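The trimmed mean square error mentioned above can be sketched in a few lines of Python. The idea is that a point farther than a trimming radius from every node contributes only the squared radius to the error, so distant outliers cannot dominate the fit. This is a minimal illustration under that assumption, not the ElPiGraph code; the names sq_dist, trimmed_msd, and radius are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def trimmed_msd(points, nodes, radius=float("inf")):
    """Mean squared distance to the nearest node, capped at radius**2 per point."""
    cap = radius ** 2
    total = 0.0
    for p in points:
        nearest = min(sq_dist(p, v) for v in nodes)
        # Points beyond the trimming radius contribute a constant amount.
        total += min(nearest, cap)
    return total / len(points)


# Usage: one node at the origin, one nearby point and one distant outlier.
data = [(0.0, 0.0), (10.0, 0.0)]
graph_nodes = [(0.0, 0.0)]
print(trimmed_msd(data, graph_nodes))              # untrimmed error
print(trimmed_msd(data, graph_nodes, radius=2.0))  # outlier capped at 2.0 ** 2
```

With an infinite radius the function reduces to the ordinary MSD; the epg trimming radius parameter listed under Parameters plays the role of radius in this sketch.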

References

H Chen, L Albergante, JY Hsu, CA Lareau, GL Bosco, J Guan, S Zhou, AN Gorban, DE Bauer, MJ Aryee, DM Langenau, A Zinovyev, JD Buenrostro, GC Yuan, L Pinello. Single-cell trajectories reconstruction, exploration and mapping of omics data with STREAM. Nature Communications 10, 1903 (2019).

Nestorowa, S. et al. A single-cell resolution map of mouse hematopoietic stem and progenitor cell differentiation. Blood 128, e20-31 (2016).

Pinello Lab STREAM GitHub Repository

ElPiGraph GitHub Repository



Input Files

  1. data file *
    A STREAM pkl file containing an annotated AnnData matrix of gene expression data.

Output Files

  1. <output filename>_stream_result.pkl
    Output file in STREAM AnnData extended pickle (.pkl) file format suitable for passing to the next step of the STREAM analysis.
  2. <output filename>_branches.png
    Plot showing the trajectory branches.
  3. <output filename>_branches_with_cells.png
    Plot showing the trajectory branches with the cells positioned on them.

Example Data

Example data for the complete STREAM workflow can be downloaded from Dropbox: Stream Example Data
Ref: Nestorowa, S. et al. A single-cell resolution map of mouse hematopoietic stem and progenitor cell differentiation. Blood 128, e20-31 (2016).

Example data for this module (this step in the workflow) is available as seeded_stream_result.pkl

Requirements

GenePattern 3.9.11 or later (dockerized).

Parameters

Inputs and Outputs

Name Description
data file* A STREAM pkl file containing an annotated AnnData matrix of gene expression data.
output filename* The output filename prefix.

Elastic Principal Graph

Name Description
epg num nodes* Number of nodes for elastic principal graph.
incremental number of nodes* Incremental number of nodes for elastic principal graph when epg_n_nodes is not big enough.
epg trimming radius* Maximal distance from a node to the points it controls in the embedding.
epg alpha* Alpha parameter of the penalized elastic energy.
epg beta* Beta parameter of the penalized elastic energy.
epg lambda* Lambda parameter used to compute the elastic energy.
epg mu* Mu parameter used to compute the elastic energy.
epg final energy* Indicate the final elastic energy associated with the configuration.

Plotting

Parameters controlling the output figures.
Name Description
num components The number of components to be plotted.
component x Component used for the x-axis in plots.
component y Component used for the y-axis in plots.
figure height Figure height as used in matplotlib plots. Default=8.
figure width Figure width as used in matplotlib plots. Default=8.
figure legend num columns* The number of columns that the legend has.

* - required