Algorithm and scientific questions: <Huidong.Chen at mgh dot harvard dot edu>
Module wrapping issues: Ted Liefeld <jliefeld at cloud dot ucsd dot edu>
STREAM (Single-cell Trajectories Reconstruction, Exploration And Mapping) is an interactive pipeline capable of disentangling and visualizing complex branching trajectories from both single-cell transcriptomic and epigenomic data. Within GenePattern, STREAM is implemented as a collection of modules that cover the entire STREAM processing pipeline, so that individual steps can be performed interactively for data exploration.
STREAM.DimensionReduction is used to reduce the dimensionality of the dataset to be used in the downstream analysis.
Each cell can be thought of as a vector in a multi-dimensional vector space in which each component is the expression level of a gene. Typically, even after feature selection, each cell still has hundreds of components, making it difficult to reliably assess similarity or distances between cells, a problem often referred to as the curse of dimensionality. To mitigate this problem, starting from the genes selected in the previous step, we project cells to a lower-dimensional space using a non-linear dimensionality reduction method called Modified Locally Linear Embedding (MLLE).
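The distance problem described above can be illustrated with a small, self-contained sketch. It uses uniformly random points rather than real expression data, and the function name `distance_contrast` is hypothetical. It measures how much the nearest and farthest cells differ, relative to the nearest one, as the number of components grows.

```python
import math
import random


def distance_contrast(n_dims, n_cells=200, seed=0):
    """Return (max - min) / min of the distances from one cell to all others."""
    rng = random.Random(seed)
    # Toy "cells": uniformly random vectors, not real expression profiles.
    cells = [[rng.random() for _ in range(n_dims)] for _ in range(n_cells)]
    reference, others = cells[0], cells[1:]
    dists = [math.dist(reference, other) for other in others]
    return (max(dists) - min(dists)) / min(dists)


# The contrast shrinks as dimensions are added, so "near" and "far"
# cells become harder to tell apart.
for n_dims in (2, 20, 500):
    print(n_dims, round(distance_contrast(n_dims), 2))
```

A low contrast means that all cells look roughly equidistant, which is why distances are computed after projecting to a few components.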
Several alternative dimension reduction methods are also supported: spectral embedding, UMAP, and PCA. By default, this module uses MLLE.
For large datasets, spectral embedding runs faster than MLLE while preserving a compact structure similar to that of MLLE. Lowering the percent neighbor cells parameter (0.1 by default) will also speed up this step on large datasets.
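To see why lowering the percent neighbor cells parameter helps on large datasets, the sketch below converts the parameter into an absolute neighbor count. It assumes the value is treated as a fraction of the total number of cells and rounded to the nearest integer; this rule and the helper name `neighbors_from_percent` are illustrative assumptions, not taken from the module.

```python
def neighbors_from_percent(n_cells, percent_neighbor_cells=0.1):
    """Convert a fraction of cells into an absolute neighbor count (assumed rule)."""
    if not 0 < percent_neighbor_cells <= 1:
        raise ValueError("percent_neighbor_cells must be in (0, 1]")
    k = int(round(n_cells * percent_neighbor_cells))
    # Keep at least one neighbor and never more than n_cells - 1.
    return max(1, min(k, n_cells - 1))


for n_cells in (1_000, 10_000, 100_000):
    default_k = neighbors_from_percent(n_cells)        # default of 0.1
    reduced_k = neighbors_from_percent(n_cells, 0.01)  # lowered setting
    print(n_cells, default_k, reduced_k)
```

Under this assumption the neighborhood size grows linearly with the dataset, so a fixed 0.1 implies thousands of neighbors per cell once the dataset reaches tens of thousands of cells.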
By default, the number of components to keep is set to 3. For biological processes with a simple bifurcation or a linear trajectory, keeping only two components is recommended.
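As a rough illustration of how the method, number of components, and percent neighbor cells interact, here is a minimal sketch built on scikit-learn. It is not the module's implementation: the function `reduce_dimensions`, the neighbor-count rule, and the random input matrix are all assumptions made for the example, and UMAP is left out because it needs the separate umap-learn package.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import LocallyLinearEmbedding, SpectralEmbedding


def reduce_dimensions(X, method="mlle", n_components=3, percent_neighbor_cells=0.1):
    """Project the rows of X (cells) onto n_components dimensions."""
    # Assumed rule: neighbors = fraction of cells, with enough neighbors for MLLE.
    n_neighbors = max(n_components + 1,
                      int(round(X.shape[0] * percent_neighbor_cells)))
    if method == "mlle":
        model = LocallyLinearEmbedding(
            n_neighbors=n_neighbors, n_components=n_components,
            method="modified", random_state=0)
    elif method == "se":
        model = SpectralEmbedding(
            n_neighbors=n_neighbors, n_components=n_components, random_state=0)
    elif method == "pca":
        # PCA does not use a neighborhood, so n_neighbors is ignored here.
        model = PCA(n_components=n_components, random_state=0)
    else:
        raise ValueError(f"unsupported method: {method!r}")
    return model.fit_transform(X)


# Toy stand-in for a cells x selected-genes matrix (random values, not real data).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))
for method in ("mlle", "se", "pca"):
    print(method, reduce_dimensions(X, method).shape)
```

Passing `n_components=2` gives the two-component embedding suggested above for simple bifurcations or linear trajectories.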
H Chen, L Albergante, JY Hsu, CA Lareau, GL Bosco, J Guan, S Zhou, AN Gorban, DE Bauer, MJ Aryee, DM Langenau, A Zinovyev, JD Buenrostro, GC Yuan, L Pinello. Single-cell trajectories reconstruction, exploration and mapping of omics data with STREAM. Nature Communications, volume 10, Article number: 1903 (2019).
Nestorowa, S. et al. A single-cell resolution map of mouse hematopoietic stem and progenitor cell differentiation. Blood 128, e20-31 (2016).
Pinello Lab STREAM GitHub Repository
Example data for the STREAM workflow can be downloaded from Dropbox: Stream Example Data
Ref: Nestorowa, S. et al. A single-cell resolution map of mouse hematopoietic stem and progenitor cell differentiation. Blood 128, e20-31 (2016).
An input file suitable for this step is available at preprocessed_stream_result.pkl
GenePattern 3.9.11 or later (dockerized).
Inputs and Outputs | |
Name | Description |
---|---|
data file* | A STREAM pkl file containing an annotated AnnData matrix of gene expression data. |
output filename* | The output filename prefix. |
Dimension Reduction | |
Name | Description |
percent neighbor cells | The percentage of cells used as neighbors (only used when method is 'mlle', 'se', or 'umap'). Default=0.1. |
num components to keep | The number of components to keep in the resulting dataset. Default=3. |
feature | Feature used for dimension reduction. Choose from ['var_genes','top_pcs','all']. 'var_genes': most variable genes. 'top_pcs': top principal components. 'all': all genes. |
method | Method used for dimension reduction. Choose from ['mlle','se','umap','pca']. Default='mlle'. |
Plotting: Parameters controlling the output figures. | |
Name | Description |
num components to plot | Number of components to be plotted. |
component x | Component used for x axis. |
component y | Component used for y axis. |
figure height | Figure height as used in matplotlib plots. Default=8. |
figure width | Figure width as used in matplotlib plots. Default=8. |
* - required
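The parameters in the tables above can be checked before a job is submitted. The sketch below is a hypothetical helper, not part of the module: the class `DimensionReductionParams` and its field names are invented for illustration. The allowed values and defaults come from this page, with 'se' included as a method because the text and the percent neighbor cells row mention it. The `feature` field has no default here because none is documented.

```python
from dataclasses import dataclass

METHODS = ("mlle", "se", "umap", "pca")
FEATURES = ("var_genes", "top_pcs", "all")


@dataclass
class DimensionReductionParams:
    """Hypothetical container; field names are not part of the module."""
    data_file: str                 # required
    output_filename: str           # required
    feature: str                   # no default stated on this page
    method: str = "mlle"
    num_components_to_keep: int = 3
    percent_neighbor_cells: float = 0.1
    figure_height: float = 8
    figure_width: float = 8

    def validate(self):
        """Return a list of human-readable problems (empty when all checks pass)."""
        problems = []
        if self.method not in METHODS:
            problems.append(f"method must be one of {METHODS}")
        if self.feature not in FEATURES:
            problems.append(f"feature must be one of {FEATURES}")
        if self.num_components_to_keep < 1:
            problems.append("num components to keep must be at least 1")
        # percent neighbor cells only applies to 'mlle', 'se', and 'umap'.
        if self.method != "pca" and not 0 < self.percent_neighbor_cells <= 1:
            problems.append("percent neighbor cells must be in (0, 1]")
        return problems


params = DimensionReductionParams(
    "preprocessed_stream_result.pkl", "stream_dr", "var_genes")
print(params.validate())
```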