Module Documentation

STREAM.EPGAdjustFinalGraph


LSID
urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00399
Author(s)
Huidong Chen, Massachusetts General Hospital; wrapped as a module by Ted Liefeld, Mesirov Lab, UCSD School of Medicine.
Contact(s)

Algorithm and scientific questions: <Huidong.Chen at mgh dot harvard dot edu>

Module wrapping issues: Ted Liefeld <jliefeld at cloud dot ucsd dot edu>


Introduction

STREAM (Single-cell Trajectories Reconstruction, Exploration And Mapping) is an interactive pipeline capable of disentangling and visualizing complex branching trajectories from both single-cell transcriptomic and epigenomic data. Within GenePattern, STREAM is implemented as a collection of modules that cover the entire STREAM processing pipeline, so that individual steps can be performed interactively for data exploration.

STREAM.EPGAdjustFinalGraph makes final adjustments to the graph generated by STREAM.ElasticPrincipalGraph, such as optimizing the structure, pruning the final structure, or extending leaf nodes.


Algorithm

Huidong to cover details of this portion of the analysis

References

H Chen, L Albergante, JY Hsu, CA Lareau, GL Bosco, J Guan, S Zhou, AN Gorban, DE Bauer, MJ Aryee, DM Langenau, A Zinovyev, JD Buenrostro, GC Yuan, L Pinello. Single-cell trajectories reconstruction, exploration and mapping of omics data with STREAM. Nature Communications 10, 1903 (2019).

Nestorowa, S. et al. A single-cell resolution map of mouse hematopoietic stem and progenitor cell differentiation. Blood 128, e20-31 (2016).

Pinello Lab, STREAM GitHub Repository


Input Files

  1. data file *
    A STREAM pkl file containing an annotated AnnData matrix of gene expression data.

Output Files

  1. <output filename>_stream_result.pkl
    Output file in STREAM AnnData extended pickle (.pkl) file format suitable for passing to the next step of the STREAM analysis.
  2. <output filename>_branches.png
    Plot showing the trajectory branches.
  3. <output filename>_branches_with_cells.png
    Plot showing the trajectory branches with the cells positioned on them.
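
The naming pattern above can be sketched as a small helper that builds the three expected file names from the output filename prefix. The function name is hypothetical and is not part of the module; it only restates the pattern listed in this section.

```python
def stream_output_names(output_filename):
    """Return the three output file names listed above for a given prefix."""
    return [
        f"{output_filename}_stream_result.pkl",          # AnnData pickle for the next step
        f"{output_filename}_branches.png",               # trajectory branches
        f"{output_filename}_branches_with_cells.png",    # branches with cells positioned on them
    ]


names = stream_output_names("my_run")
print(names)
```

A helper like this can be handy when checking that a job produced every expected file before passing the .pkl result to the next module.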

Example Data

Example data for the STREAM workflow can be downloaded from Dropbox: Stream Example Data
Ref: Nestorowa, S. et al. A single-cell resolution map of mouse hematopoietic stem and progenitor cell differentiation. Blood 128, e20-31 (2016).

Example data for this step in the workflow is available as stream_epg_result.pkl.

Requirements

GenePattern 3.9.11 or later (dockerized).

Parameters

Inputs and Outputs

Name Description
data file* A STREAM pkl file containing an annotated AnnData matrix of gene expression data.
output filename* The output filename prefix.

Elastic Principal Graph

Name Description
epg trimming radius* Maximal distance from a node to the points it controls in the embedding.
epg alpha* Alpha parameter of the penalized elastic energy.
epg beta* Beta parameter of the penalized elastic energy.
epg lambda* Lambda parameter used to compute the elastic energy.
epg mu* Mu parameter used to compute the elastic energy.
epg final energy* Indicates the final elastic energy associated with the configuration.
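
To make the roles of the trimming radius, lambda and mu more concrete, the following minimal sketch evaluates a simplified elastic energy for a toy graph. It assumes the classical three-term form: the mean squared distance from each point to its nearest node (capped at the squared trimming radius), lambda times the summed squared edge lengths, and mu times the summed squared deviation of each star center from the mean of its neighbors. The alpha and beta penalty terms are omitted, and the function name, data layout and numeric values are hypothetical, not the STREAM or ElPiGraph API.

```python
import math


def elastic_energy(points, nodes, edges, stars, epg_lambda, epg_mu,
                   trimming_radius=math.inf):
    """Simplified elastic energy of a graph embedded in Euclidean space.

    points, nodes: lists of coordinate tuples
    edges: list of (i, j) node index pairs
    stars: list of (center, [neighbor indices]) groups
    """
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Approximation term: squared distance to the nearest node, capped at
    # trimming_radius**2 so that far-away points have bounded influence.
    cap = trimming_radius ** 2
    mse = sum(min(min(sqdist(p, n) for n in nodes), cap) for p in points) / len(points)

    # Stretching term: penalizes long edges.
    stretch = epg_lambda * sum(sqdist(nodes[i], nodes[j]) for i, j in edges)

    # Bending term: penalizes star centers that deviate from the mean of their neighbors.
    bend = 0.0
    for center, neighbors in stars:
        mean = [sum(nodes[k][d] for k in neighbors) / len(neighbors)
                for d in range(len(nodes[center]))]
        bend += sqdist(nodes[center], mean)
    bend *= epg_mu

    return mse + stretch + bend


# Toy example: a straight three-node chain and four points, one of them an outlier.
nodes = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
points = [(0.0, 0.5), (1.0, -0.5), (2.0, 0.5), (1.0, 10.0)]
edges = [(0, 1), (1, 2)]
stars = [(1, [0, 2])]

energy_untrimmed = elastic_energy(points, nodes, edges, stars, 0.01, 0.1)
energy_trimmed = elastic_energy(points, nodes, edges, stars, 0.01, 0.1,
                                trimming_radius=1.0)
print(energy_untrimmed, energy_trimmed)
```

In this toy case the outlier dominates the untrimmed energy, while a finite trimming radius limits its contribution. That is the intuition behind the epg trimming radius parameter.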

Optimize Structure

Name Description
epg max steps* The maximum number of iteration steps.
incremental number of nodes* Incremental number of nodes for elastic principal graph.

Prune Graph

Name Description
epg collapse mode* The mode used to prune the graph. Choose from {'PointNumber', 'PointNumber_Extrema', 'PointNumber_Leaves', 'EdgesNumber', 'EdgesLength'}.
    'PointNumber': branches with fewer than epg_collapse_par points (points projected on the extreme points are not considered) are removed.
    'PointNumber_Extrema': branches with fewer than epg_collapse_par points (points projected on the extreme points are not considered) are removed.
    'PointNumber_Leaves': branches with fewer than epg_collapse_par points (points projected on non-leaf extreme points are not considered) are removed.
epg collapse parameter The threshold (epg_collapse_par) applied by the selected collapse mode.
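
As an illustration of the 'PointNumber' mode described above, the sketch below drops branches that have fewer than epg_collapse_par cells projected onto them. The branch labels, the per-cell assignment list and the function name are hypothetical; the module itself operates on the elastic principal graph stored in the AnnData object, which this toy example does not model.

```python
from collections import Counter


def prune_by_point_number(cell_branch, branches, epg_collapse_par):
    """Keep only branches with at least epg_collapse_par projected cells.

    cell_branch: for each cell, the branch it projects onto, or None when the
                 cell projects onto an extreme (end) point and is not counted.
    """
    counts = Counter(b for b in cell_branch if b is not None)
    return [b for b in branches if counts[b] >= epg_collapse_par]


branches = ["S0-S1", "S1-S2", "S1-S3"]
cell_branch = ["S0-S1"] * 6 + ["S1-S2"] * 5 + ["S1-S3"] * 2 + [None] * 3

kept = prune_by_point_number(cell_branch, branches, epg_collapse_par=5)
print(kept)  # ['S0-S1', 'S1-S2']
```

Raising the collapse parameter removes more of the sparsely populated branches; lowering it keeps more of them.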

Extend Leaf Nodes

Name Description
epg extension mode* The mode used to extend the leaves. Choose from {'QuantDists', 'QuantCentroid', 'WeigthedCentroid'}.
    'QuantDists': for each leaf node, the extreme points are ordered by their distance from the node and the 100*epg_ext_par-th percentile of the points farther than epg_ext_par is returned.
    'QuantCentroid': for each leaf node, the extreme points are ordered by their distance from the node and the centroid of the points farther than epg_ext_par is returned.
    'WeigthedCentroid': for each leaf node, a weight is computed for each point by raising its distance to the epg_ext_par power. A larger epg_ext_par gives more influence to points farther from the node.
epg extension parameter The parameter (epg_ext_par) used by the selected extension mode.
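
The two centroid-based modes can be illustrated with a small sketch. It assumes that, for 'QuantCentroid', epg_ext_par is read as a quantile between 0 and 1 of the distance-ordered points, and that, for 'WeigthedCentroid', the weight of a point is its distance to the leaf raised to the power epg_ext_par, as the descriptions above suggest. The function names and the toy data are hypothetical and do not reproduce the module's implementation.

```python
import math


def weighted_centroid(leaf, points, epg_ext_par):
    """Centroid of points weighted by (distance to leaf) ** epg_ext_par."""
    weights = [math.dist(leaf, p) ** epg_ext_par for p in points]
    total = sum(weights)
    return tuple(sum(w * p[d] for w, p in zip(weights, points)) / total
                 for d in range(len(leaf)))


def quant_centroid(leaf, points, epg_ext_par):
    """Centroid of the points beyond the epg_ext_par quantile of distance (assumed reading)."""
    ordered = sorted(points, key=lambda p: math.dist(leaf, p))
    # Keep the farthest points; fall back to the single farthest one if the slice is empty.
    far = ordered[int(epg_ext_par * len(ordered)):] or ordered[-1:]
    return tuple(sum(p[d] for p in far) / len(far) for d in range(len(leaf)))


leaf = (0.0, 0.0)
points = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]

wc_flat = weighted_centroid(leaf, points, 0)    # all weights equal
wc_far = weighted_centroid(leaf, points, 1)     # farther points weigh more
qc_half = quant_centroid(leaf, points, 0.5)     # centroid of the farther half
print(wc_flat, wc_far, qc_half)
```

With the exponent set to 0 the weighted centroid reduces to the plain centroid, and increasing it pulls the target position toward the most distant points, so the leaf is extended further.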

Plotting

Parameters controlling the output figures.
Name Description
num components* The number of components to be plotted.
component x* Component used for the x-axis in plots.
component y* Component used for the y-axis in plots.
figure height Figure height as used in matplotlib plots. Default=8.
figure width Figure width as used in matplotlib plots. Default=8.
figure legend num columns* The number of columns in the figure legend.

* - required