Module Documentation

STREAM.Plot2DVisualization


LSID
urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00406
Author(s)
Huidong Chen, Massachusetts General Hospital; wrapped as a module by Ted Liefeld, Mesirov Lab, UCSD School of Medicine.
Contact(s)

Algorithm and scientific questions: <Huidong.Chen at mgh dot harvard dot edu>

Module wrapping issues: Ted Liefeld <jliefeld at cloud dot ucsd dot edu>


Introduction

STREAM (Single-cell Trajectories Reconstruction, Exploration And Mapping) is an interactive pipeline for disentangling and visualizing complex branching trajectories from both single-cell transcriptomic and epigenomic data. Within GenePattern, STREAM is implemented as a collection of modules that together cover the entire STREAM processing pipeline, so that individual steps can be run interactively for data exploration.

STREAM.Plot2DVisualization is used to check whether the data show a clear, meaningful trajectory pattern. If they do, we continue the downstream analysis by placing the cells onto the trajectories. If not, we go back to the previous steps and modify the parameters used to filter and prepare the data, trying different settings.

Algorithm

To check the data, we embed the cells in a 2D plane using either UMAP (Uniform Manifold Approximation and Projection) or tSNE (t-Distributed Stochastic Neighbor Embedding), computed from the components returned by a previous run of the STREAM.DimensionReduction module.

UMAP is a manifold learning technique for dimension reduction, constructed from a theoretical framework based on Riemannian geometry and algebraic topology. Compared with tSNE, UMAP generally preserves more of the global structure of the data and runs faster.
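The "percent neighbor cells" parameter expresses UMAP's neighborhood size as a share of all cells rather than as the absolute neighbor count (`n_neighbors`) that umap-learn expects. The sketch below shows one way such a share could be turned into a count. It assumes the value is a fraction in (0, 1] (for example 0.1 for 10%), rounded to the nearest integer and clamped to the range [2, n_cells - 1]; the helper name `neighbors_from_percent` and the clamping rule are hypothetical and not taken from STREAM or GenePattern.

```python
def neighbors_from_percent(n_cells: int, fraction: float) -> int:
    """Convert a fraction of cells into an absolute UMAP neighbor count.

    Hypothetical helper: assumes `fraction` is in (0, 1], e.g. 0.1 means 10% of cells.
    """
    if n_cells < 3:
        raise ValueError("need at least 3 cells to build a neighbor graph")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    k = round(n_cells * fraction)
    # umap-learn rejects n_neighbors < 2, and a cell has at most n_cells - 1 neighbors.
    return max(2, min(k, n_cells - 1))


if __name__ == "__main__":
    # 1000 cells with 10% neighbors gives 100 neighbors.
    print(neighbors_from_percent(1000, 0.1))
```

The resulting count would then be passed as `n_neighbors` to `umap.UMAP(n_neighbors=k, n_components=2)`. Scaling the neighborhood with the number of cells keeps the balance between local and global structure roughly comparable across datasets of different sizes.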

tSNE is a dimensionality reduction technique designed for visualizing high-dimensional datasets. This technique is implemented via the Barnes-Hut approximation, which allows it to be applied to large real-world datasets.
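The "perplexity" parameter controls how many effective neighbors tSNE considers around each cell. In standard tSNE, each point i gets its own Gaussian width, found by binary search so that the perplexity of its conditional distribution P_i, exp(H(P_i)) with H in nats, matches the requested value. The pure-Python sketch below shows only this calibration step; it does not implement the Barnes-Hut gradient approximation, and the function names are illustrative rather than STREAM's or scikit-learn's internals.

```python
import math


def calibrate_row(sq_dists, perplexity, tol=1e-5, max_iter=100):
    """Binary-search the Gaussian precision beta = 1 / (2 * sigma**2) for one point.

    sq_dists: squared distances from point i to every *other* point.
    Returns (probabilities, beta) such that exp(entropy) ~= perplexity.
    """
    if not 1.0 < perplexity < len(sq_dists):
        raise ValueError("perplexity must be between 1 and the number of other points")
    target_entropy = math.log(perplexity)  # entropy target in nats
    beta, beta_lo, beta_hi = 1.0, 0.0, math.inf
    d_min = min(sq_dists)
    for _ in range(max_iter):
        # Shift by the smallest distance before exponentiating to avoid underflow.
        weights = [math.exp(-beta * (d - d_min)) for d in sq_dists]
        total = sum(weights)
        probs = [w / total for w in weights]
        entropy = -sum(p * math.log(p) for p in probs if p > 0)
        diff = entropy - target_entropy
        if abs(diff) < tol:
            break
        if diff > 0:
            # Distribution too flat: raise the precision (narrower Gaussian).
            beta_lo = beta
            beta = beta * 2 if beta_hi == math.inf else (beta + beta_hi) / 2
        else:
            # Distribution too peaked: lower the precision (wider Gaussian).
            beta_hi = beta
            beta = (beta + beta_lo) / 2
    return probs, beta


def conditional_probabilities(points, i, perplexity):
    """Compute p_{j|i} for all j != i from a list of coordinate tuples."""
    others = [q for j, q in enumerate(points) if j != i]
    sq = [sum((a - b) ** 2 for a, b in zip(points[i], q)) for q in others]
    probs, _ = calibrate_row(sq, perplexity)
    return probs


if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (5, 6)]
    p = conditional_probabilities(pts, 0, perplexity=2.0)
    print(round(math.exp(-sum(x * math.log(x) for x in p if x > 0)), 3))
```

A larger perplexity widens each Gaussian, so more cells contribute to a cell's neighborhood. In scikit-learn the same knob appears as `sklearn.manifold.TSNE(perplexity=30.0, method="barnes_hut")`; whether this module uses that implementation is not stated here.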

References

H Chen, L Albergante, JY Hsu, CA Lareau, GL Bosco, J Guan, S Zhou, AN Gorban, DE Bauer, MJ Aryee, DM Langenau, A Zinovyev, JD Buenrostro, GC Yuan, L Pinello. Single-cell trajectories reconstruction, exploration and mapping of omics data with STREAM. Nature Communications 10, Article 1903 (2019).

Nestorowa, S. et al. A single-cell resolution map of mouse hematopoietic stem and progenitor cell differentiation. Blood 128, e20-31 (2016).

Pinello Lab. STREAM GitHub repository.

UMAP documentation: https://umap-learn.readthedocs.io/en/latest/

Leland McInnes, John Healy, James Melville. UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction.

L.J.P. van der Maaten. Accelerating t-SNE using Tree-Based Algorithms. Journal of Machine Learning Research 15(Oct):3221-3245, 2014.

Input Files

  1. data file*
    A STREAM pkl file containing an annotated AnnData matrix of gene expression data.

Output Files

  1. <output filename>_stream_result.pkl
    Output file in STREAM AnnData extended pickle (.pkl) file format suitable for passing to the next step of the STREAM analysis.
  2. <output filename>_2D_plot.png
    tSNE or UMAP plot of the data.
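Both output names are derived from the "output filename" prefix, so a script that chains STREAM steps can predict where this module's results land. The sketch below builds the two documented names; the helper `expected_outputs` is hypothetical, and writing both files into a single output directory is an assumption rather than documented behavior.

```python
from pathlib import Path


def expected_outputs(output_prefix: str, out_dir: str = ".") -> dict:
    """Return the paths documented for this module, given the output filename prefix.

    Hypothetical helper: the suffixes come from the Output Files list;
    placing both files in one directory is an assumption.
    """
    base = Path(out_dir)
    return {
        "result": base / f"{output_prefix}_stream_result.pkl",  # input for the next STREAM step
        "plot": base / f"{output_prefix}_2D_plot.png",  # tSNE or UMAP plot
    }


def missing_outputs(output_prefix: str, out_dir: str = ".") -> list:
    """List any expected output files that do not exist yet."""
    return [str(p) for p in expected_outputs(output_prefix, out_dir).values() if not p.exists()]


if __name__ == "__main__":
    print(expected_outputs("example")["result"].name)
```

The `result` path is the file to pass as the data file of the next STREAM module.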

Example Data

Example data for the STREAM workflow can be downloaded from Dropbox: STREAM Example Data
Ref: Nestorowa, S. et al. A single-cell resolution map of mouse hematopoietic stem and progenitor cell differentiation. Blood 128, e20-31 (2016).

An input file suitable for this step is available at dimred_stream_result.pkl

Requirements

GenePattern 3.9.11 or later (dockerized).

Parameters

General

Name                       Description
data file*                 A STREAM pkl file containing an annotated AnnData matrix of gene expression data.
output filename*           The output filename prefix.
method                     Method used for visualization. Choose from: 'umap' (Uniform Manifold Approximation and Projection) or 'tsne' (t-Distributed Stochastic Neighbor Embedding).
percent neighbor cells     The percentage of cells used as neighbors (used only when method is 'umap').
perplexity                 The perplexity used (used only when method is 'tsne').
color by                   How to color cells: 'label' uses the cell labels; 'branch' uses the branch ID identified by STREAM.
use precomputed            If True, the visualization coordinates from a previous computation stored in the input pkl file are used.
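Because "percent neighbor cells" applies only to UMAP and "perplexity" only to tSNE, a wrapper script can drop whichever setting the chosen method ignores and reject unsupported choices early. The sketch below illustrates that rule; the function `build_plot_options` and its keyword names are hypothetical mirrors of the table above, not the module's actual command-line flags.

```python
VALID_METHODS = {"umap", "tsne"}
VALID_COLORS = {"label", "branch"}


def build_plot_options(method, percent_neighbor_cells=None, perplexity=None,
                       color_by="label", use_precomputed=False):
    """Keep only the options that apply to the chosen method (hypothetical helper)."""
    method = method.lower()
    if method not in VALID_METHODS:
        raise ValueError(f"method must be one of {sorted(VALID_METHODS)}, got {method!r}")
    if color_by not in VALID_COLORS:
        raise ValueError(f"color_by must be one of {sorted(VALID_COLORS)}, got {color_by!r}")
    options = {"method": method, "color_by": color_by, "use_precomputed": use_precomputed}
    # Method-specific settings: each is meaningful for only one embedding.
    if method == "umap" and percent_neighbor_cells is not None:
        options["percent_neighbor_cells"] = percent_neighbor_cells
    if method == "tsne" and perplexity is not None:
        options["perplexity"] = perplexity
    return options


if __name__ == "__main__":
    print(build_plot_options("UMAP", percent_neighbor_cells=0.1, perplexity=30))
```

Silently dropping the irrelevant setting matches the "only valid when" wording in the table; a stricter wrapper could raise an error instead.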

Plotting

Parameters controlling the output figures.
Name                       Description
figure height              Figure height, as used in matplotlib plots. Default: 8.
figure width               Figure width, as used in matplotlib plots. Default: 8.
figure legend num columns  The number of columns in the figure legend. Default: 3.
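The plotting parameters correspond to standard matplotlib settings: figure width and height to `figsize`, and the legend column count to `ncol` in `Axes.legend`. The sketch below draws a 2D embedding colored by label with those settings. It assumes the sizes are matplotlib's inches and that the legend sits below the axes; `plot_2d` is a hypothetical helper, not the module's plotting code.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen, as a batch job would
import matplotlib.pyplot as plt


def plot_2d(coords, labels, fig_width=8, fig_height=8, legend_ncol=3, path=None):
    """Scatter 2D coordinates colored by label (hypothetical sketch)."""
    fig, ax = plt.subplots(figsize=(fig_width, fig_height))  # sizes in inches
    for lab in sorted(set(labels)):
        # One scatter call per label so each gets its own legend entry.
        xs = [x for (x, _), l in zip(coords, labels) if l == lab]
        ys = [y for (_, y), l in zip(coords, labels) if l == lab]
        ax.scatter(xs, ys, s=10, label=lab)
    ax.set_xlabel("Dim 1")
    ax.set_ylabel("Dim 2")
    # Place the multi-column legend below the plot so it does not cover cells.
    ax.legend(ncol=legend_ncol, loc="upper center", bbox_to_anchor=(0.5, -0.1))
    if path is not None:
        fig.savefig(path, bbox_inches="tight")
    return fig


if __name__ == "__main__":
    fig = plot_2d([(0, 0), (1, 1), (2, 0)], ["HSC", "MEP", "HSC"], path="example_2D_plot.png")
    print(fig.get_size_inches())
```

With many cell labels, raising the legend column count keeps the legend short enough to fit beneath the figure.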

* - required