Algorithm and scientific questions: <Huidong.Chen at mgh dot harvard dot edu>
Module wrapping issues: Ted Liefeld <jliefeld at cloud dot ucsd dot edu>
STREAM (Single-cell Trajectories Reconstruction, Exploration And Mapping) is an interactive pipeline capable of disentangling and visualizing complex branching trajectories from both single-cell transcriptomic and epigenomic data. Within GenePattern, STREAM is implemented as a collection of modules that cover the entire STREAM processing pipeline, so that individual steps can be performed interactively for data exploration.
STREAM.FeatureSelection identifies the features to be used in the downstream analysis. Two types of features can be used: the most variable genes or the top principal components.
For transcriptomic data (single-cell RNA-seq or qPCR), the input of STREAM is a gene expression matrix in which rows represent genes and columns represent cells. Each entry contains an adjusted gene expression value (after library size normalization and log2 transformation, typically performed using the STREAM.Preprocessing module).
By default, the most variable genes are selected as features. For each gene, the mean and standard deviation of its expression are calculated across all cells. A non-parametric local regression method (LOESS) is then used to fit the relationship between the mean and standard deviation values. Genes that lie above the fitted curve and diverge significantly from it are selected as variable genes.
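The variable-gene selection described above can be sketched as follows. This is a minimal NumPy-only illustration under stated assumptions, not STREAM's own implementation: `local_linear_fit` and `select_variable_genes` are hypothetical helper names, the fit is a plain tricube-weighted local linear regression standing in for LOESS (no robustness iterations), and the expression matrix is synthetic with five deliberately noisy genes planted in it.

```python
import numpy as np

def local_linear_fit(x, y, frac=0.3):
    """Tricube-weighted local linear regression (LOESS-style sketch)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    k = max(2, int(np.ceil(frac * n)))      # number of neighbours per local fit
    fitted = np.empty(n)
    for i in range(n):
        d = np.abs(x - x[i])
        h = max(np.sort(d)[k - 1], 1e-12)   # bandwidth: distance to k-th nearest point
        w = np.clip(1.0 - (d / h) ** 3, 0.0, 1.0) ** 3
        sw = np.sqrt(w)
        # Weighted least squares for a line centred at x[i]; the intercept
        # is the fitted value at x[i].
        A = np.stack([np.ones(n), x - x[i]], axis=1)
        beta, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
        fitted[i] = beta[0]
    return fitted

def select_variable_genes(expr, loess_frac=0.3, n_genes=10):
    """expr: genes x cells matrix. Returns indices of the genes lying
    farthest above the fitted mean/standard-deviation curve."""
    mean = expr.mean(axis=1)
    std = expr.std(axis=1)
    residual = std - local_linear_fit(mean, std, frac=loess_frac)
    return np.argsort(residual)[::-1][:n_genes]

# Synthetic data (an assumption for illustration): 200 genes x 100 cells,
# with extra variability planted in five genes.
rng = np.random.default_rng(0)
n_genes_total, n_cells = 200, 100
gene_means = np.linspace(1.0, 5.0, n_genes_total)
noise_sd = 0.2 + 0.1 * gene_means
planted = np.array([10, 50, 90, 130, 170])
noise_sd[planted] *= 4.0
expr = gene_means[:, None] + rng.normal(size=(n_genes_total, n_cells)) * noise_sd[:, None]

selected = select_variable_genes(expr, loess_frac=0.3, n_genes=5)
print(sorted(selected.tolist()))
```

Here `loess_frac` and `n_genes` play the roles of the module's loess fraction and num genes parameters; a percentile cut-off could be applied to the same residuals instead.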
Alternatively, users can also perform PCA on the scaled matrix and select the top principal components based on the variance ratio elbow plot.
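The quantity shown in a variance ratio elbow plot can be computed as in the sketch below. It is an illustration under stated assumptions rather than the module's code: `pca_variance_ratio` is a hypothetical helper, PCA is done with a plain SVD of the z-scored matrix, and the input is synthetic data generated from three latent factors, so the ratios should drop sharply after the third component.

```python
import numpy as np

def pca_variance_ratio(X, n_components):
    """X: cells x features. Scales each feature, runs PCA via SVD, and
    returns (component scores, explained-variance ratio per component)."""
    X = np.asarray(X, dtype=float)
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0                        # leave constant features unscaled
    Z = (X - X.mean(axis=0)) / sd
    U, S, _ = np.linalg.svd(Z, full_matrices=False)
    ratio = S ** 2 / np.sum(S ** 2)          # fraction of total variance per component
    scores = U[:, :n_components] * S[:n_components]
    return scores, ratio[:n_components]

# Synthetic data (an assumption for illustration): 150 cells, 40 features,
# driven by 3 latent factors plus a small amount of noise.
rng = np.random.default_rng(1)
n_cells, n_features, n_factors = 150, 40, 3
latent = rng.normal(size=(n_cells, n_factors))
loadings = rng.normal(size=(n_factors, n_features))
X = latent @ loadings + 0.1 * rng.normal(size=(n_cells, n_features))

scores, ratio = pca_variance_ratio(X, n_components=10)
for i, r in enumerate(ratio, start=1):
    print(f"PC{i}: {r:.3f}")
```

Plotting `ratio` against the component index gives the elbow plot; the number of components to keep is read off where the curve flattens.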
H Chen, L Albergante, JY Hsu, CA Lareau, GL Bosco, J Guan, S Zhou, AN Gorban, DE Bauer, MJ Aryee, DM Langenau, A Zinovyev, JD Buenrostro, GC Yuan, L Pinello. Single-cell trajectories reconstruction, exploration and mapping of omics data with STREAM. Nature Communications 10, 1903 (2019).
Nestorowa, S. et al. A single-cell resolution map of mouse hematopoietic stem and progenitor cell differentiation. Blood 128, e20-31 (2016).
Pinello Lab STREAM Github Repository
Example data for the STREAM workflow can be downloaded from Dropbox: STREAM Example Data
Ref: Nestorowa, S. et al. A single-cell resolution map of mouse hematopoietic stem and progenitor cell differentiation. Blood 128, e20-31 (2016).
An input file suitable for this step is available at filtered_stream_result.pkl
GenePattern 3.9.11 or later (dockerized).
Inputs and Outputs | |
Name | Description |
---|---|
data file* | A STREAM pkl file containing an annotated AnnData matrix of gene expression data. |
output filename* | The output filename prefix. |
Select Variable Genes: Parameters used if variable genes are to be selected as the feature. | |
Name | Description |
find variable genes | Whether to find variable genes and add them to the output pkl object. True/False. |
loess fraction | Between 0 and 1. The fraction of the data used when estimating each y-value in the LOWESS fit. |
percentile | Between 0 and 100. The percentile used to select genes. Genes are ordered by their distance from the fitted curve. |
num genes | The number of genes to select. Genes are ordered by their distance from the fitted curve. |
Principal Component Analysis: Parameters used if PCA components are to be selected as the feature. | |
Name | Description |
find principal components | Whether to perform a principal component analysis (PCA). True/False. |
feature | The features used for principal component analysis, chosen from the genes in the dataset. If None, all genes are used. If 'var_genes', the most variable genes obtained from the Select Variable Genes step are used. |
num principal components | The number of principal components. |
max principal components | The maximum number of principal components used for the variance ratio plot. |
first principal component | If True, the first principal component will be included. True/False. |
use precomputed | If True, the PCA results from a previous computation will be used. True/False. |
Plotting: Parameters controlling the output figures. | |
Name | Description |
figure height | Figure height as used in matplotlib plots. Default=8. |
figure width | Figure width as used in matplotlib plots. Default=8. |
* - required