Algorithm and scientific questions: <Huidong.Chen at mgh dot harvard dot edu>
Module wrapping issues: Ted Liefeld <jliefeld at cloud dot ucsd dot edu>
STREAM (Single-cell Trajectories Reconstruction, Exploration And Mapping) is an interactive pipeline capable of disentangling and visualizing complex branching trajectories from both single-cell transcriptomic and epigenomic data. Within GenePattern, STREAM is implemented as a collection of modules that cover the entire STREAM processing pipeline, so that individual steps can be performed interactively for data exploration.
STREAM.DetectDifferentiallyExpressedGenes is used to detect differentially expressed genes between pairs of branches.
For each pair of branches $B_i$ and $B_j$, and for each gene $E$, the gene expression values across cells from both branches are scaled to the range [0, 1]. For the expression values $E_i$ from $B_i$ and $E_j$ from $B_j$, we first calculate their mean values. Then we check that the fold change between the mean values is above a specified threshold (by default, a log2 fold change > 0.25). The Mann-Whitney U test is then used to test whether $E_i$ is greater than $E_j$ or $E_i$ is less than $E_j$. Because the U statistic can be approximated by a normal distribution for large samples, and U depends on the specific dataset, we standardize U to a Z-score to make it comparable between datasets. For small samples, where the test is underpowered (fewer than 20 cells per branch), we report only the fold change to qualitatively evaluate the difference between $E_i$ and $E_j$. Genes with a Z-score or fold change greater than the specified threshold (2.0 by default) are considered differentially expressed between the two branches. Formally:

$$z = \frac{U - m_U}{\sigma_U}$$

where $m_U$ and $\sigma_U$ are the mean and standard deviation of U:

$$m_U = \frac{n_i n_j}{2}, \qquad \sigma_U = \sqrt{\frac{n_i n_j}{12}\left((n + 1) - \sum_{k=1}^{K} \frac{t_k^3 - t_k}{n(n - 1)}\right)}$$

Here $n = n_i + n_j$; $n_i$ and $n_j$ are the numbers of cells in each branch, $t_k$ is the number of cells sharing rank $k$, and $K$ is the number of distinct ranks.
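The comparison above can be sketched in plain Python for a single gene. This is a minimal, hypothetical illustration, not STREAM's implementation: the function names are invented here, a plain min-max scaling stands in for STREAM's own scaling step (it preserves ranks, so the Z-score is unaffected), and the small pseudocount that guards the log2 fold change against zero means is an assumption. A positive `logfc` or `z` means the gene is higher in $B_i$.

```python
import math
from statistics import mean


def scale_01(values):
    """Min-max scale values to [0, 1] (assumed stand-in for STREAM's scaling)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def average_ranks(values):
    """Return 1-based ranks (ties share their average rank) and tie group sizes t_k."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    tie_sizes = []
    start = 0
    while start < len(order):
        end = start
        # Extend the tie group while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg_rank = (start + end) / 2 + 1  # mean of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = avg_rank
        tie_sizes.append(end - start + 1)
        start = end + 1
    return ranks, tie_sizes


def mann_whitney_z(e_i, e_j):
    """Tie-corrected Z-score of the Mann-Whitney U statistic for branch i vs branch j."""
    n_i, n_j = len(e_i), len(e_j)
    n = n_i + n_j
    ranks, tie_sizes = average_ranks(list(e_i) + list(e_j))
    u = sum(ranks[:n_i]) - n_i * (n_i + 1) / 2  # U for branch i
    m_u = n_i * n_j / 2
    tie_term = sum(t**3 - t for t in tie_sizes) / (n * (n - 1))
    variance = n_i * n_j / 12 * ((n + 1) - tie_term)
    if variance <= 1e-12:  # all values tied: no evidence either way
        return 0.0
    return (u - m_u) / math.sqrt(variance)


def compare_branches(expr_i, expr_j, cutoff_logfc=0.25, cutoff_zscore=2.0,
                     min_cells=20, pseudocount=1e-9):
    """Compare one gene between two branches; returns log2 FC, Z-score and DE call."""
    scaled = scale_01(list(expr_i) + list(expr_j))  # scale across both branches
    e_i, e_j = scaled[:len(expr_i)], scaled[len(expr_i):]
    logfc = math.log2((mean(e_i) + pseudocount) / (mean(e_j) + pseudocount))
    if abs(logfc) <= cutoff_logfc:
        return {"logfc": logfc, "z": None, "de": False}
    if min(len(e_i), len(e_j)) < min_cells:
        # Underpowered: report the fold change only and make no call.
        return {"logfc": logfc, "z": None, "de": None}
    z = mann_whitney_z(e_i, e_j)
    return {"logfc": logfc, "z": z, "de": abs(z) > cutoff_zscore}
```

For example, `compare_branches(range(25, 50), range(25))` compares two branches of 25 cells each where every cell in the first branch expresses the gene more highly, and flags the gene as differentially expressed with a Z-score of about 6.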
H Chen, L Albergante, JY Hsu, CA Lareau, GL Bosco, J Guan, S Zhou, AN Gorban, DE Bauer, MJ Aryee, DM Langenau, A Zinovyev, JD Buenrostro, GC Yuan, L Pinello Single-cell trajectories reconstruction, exploration and mapping of omics data with STREAM. Nature Communications, volume 10, Article number: 1903 (2019)
Nestorowa, S. et al. A single-cell resolution map of mouse hematopoietic stem and progenitor cell differentiation. Blood 128, e20-31 (2016).
Pinello Lab STREAM GitHub Repository
Example data for the STREAM workflow can be downloaded from Dropbox: Stream Example Data
Ref: Nestorowa, S. et al. A single-cell resolution map of mouse hematopoietic stem and progenitor cell differentiation. Blood 128, e20-31 (2016).
Example data for this specific step can be found at stream_epg_result.pkl
GenePattern 3.9.11 or later (dockerized).
Inputs and Outputs | |
Name | Description |
---|---|
data file* | A STREAM pkl file containing an annotated AnnData matrix of gene expression data. |
output filename* | The output filename prefix. |
Select Variable Genes: Parameters used if variable genes are to be selected as the feature. | |
Name | Description |
Differential Expression | |
Name | Description |
root | The starting node. |
preference | The preferred nodes, e.g. S3,S4. Branches containing the specified nodes are preferred and placed at the top of the subway plot; the higher a node is ranked, the closer to the top its branch is placed. |
percentil expr | Between 0 and 100. The percentile of the positive (greater than 0) gene expression values, used to filter out extreme expression values. |
use precomputed | If True, the previously computed scaled gene expression will be used. |
cutoff zscore | The Z-score cutoff used for the Mann-Whitney U test. Default=2.0. |
cutoff logfc | The log2 fold change cutoff between a pair of branches. Default=0.25. |
Plotting: Parameters controlling the output figures. | |
Name | Description |
num genes | The number of genes to plot. |
figure height | Figure height, as used in matplotlib plots. Default=8. |
figure width | Figure width, as used in matplotlib plots. Default=8. |
* - required
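One plausible reading of the percentil expr parameter is sketched below as an assumption, not as the module's documented behavior: for each gene, compute the given percentile of its positive expression values, cap larger values at that level, and divide by it so the result lies in [0, 1]. The helper names are hypothetical, the sketch assumes non-negative expression values, and the percentile uses linear interpolation between sorted values (the same rule as NumPy's default).

```python
import math


def percentile(values, q):
    """Percentile (0-100) with linear interpolation between sorted values."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def scale_gene(values, percentile_expr):
    """Cap one gene at a percentile of its positive values, then scale to [0, 1].

    Hypothetical helper; assumes non-negative expression values.
    """
    positive = [v for v in values if v > 0]
    if not positive:
        return [0.0] * len(values)  # gene never expressed
    cap = percentile(positive, percentile_expr)  # > 0 because all inputs are > 0
    return [min(v, cap) / cap for v in values]
```

With this reading, `scale_gene([0, 1, 2, 3, 4, 100], 75)` caps the outlier 100 at 4 and returns `[0.0, 0.25, 0.5, 0.75, 1.0, 1.0]`, so a single extreme cell no longer compresses every other value toward 0.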