Algorithm and scientific questions: <Huidong.Chen at mgh dot harvard dot edu>
Module wrapping issues: Ted Liefeld <jliefeld at cloud dot ucsd dot edu>
STREAM (Single-cell Trajectories Reconstruction, Exploration And Mapping) is an interactive pipeline capable of disentangling and visualizing complex branching trajectories from both single-cell transcriptomic and epigenomic data. Within GenePattern, STREAM is implemented as a collection of modules that cover the entire STREAM processing pipeline, so that individual steps can be performed interactively for data exploration.
STREAM.DetectDifferentiallyExpressedGenes is used to detect differentially expressed genes between pairs of branches.
For each pair of branches $B_i$ and $B_j$, and for each gene $E$, the gene expression values across cells from both branches are scaled to the range [0,1]. For gene expression $E_i$ from $B_i$ and gene expression $E_j$ from $B_j$, we first calculate their mean values. Then we check that the fold change between the mean values is above a specified threshold (the default log2 fold change cutoff is 0.25). The Mann–Whitney U test is then used to test whether $E_i$ is greater than $E_j$ or $E_i$ is less than $E_j$. Since the statistic $U$ can be approximated by a normal distribution for large samples, and $U$ depends on the specific dataset, we standardize $U$ to a Z-score to make it comparable between datasets. For small samples where this test is underpowered (<20 cells per branch), we report only the fold change to qualitatively evaluate the difference between $E_i$ and $E_j$. Genes with a Z-score or fold change greater than the specified threshold (2.0 by default) are considered differentially expressed between the two branches. Formally:

$$z = \frac{U - m_U}{\sigma_U}$$

where $m_U$ and $\sigma_U$ are the mean and standard deviation of $U$:

$$m_U = \frac{n_i n_j}{2}, \qquad \sigma_U = \sqrt{\frac{n_i n_j}{12}\left((n+1) - \sum_{l=1}^{k}\frac{t_l^{3} - t_l}{n(n-1)}\right)}$$

Here $n = n_i + n_j$; $n_i$ and $n_j$ are the numbers of cells in each branch, $t_l$ is the number of cells sharing rank $l$, and $k$ is the number of distinct ranks.
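The standardization of U described above can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the code used by STREAM or the GenePattern module: the function name `u_zscore` is hypothetical, U is taken as the rank-sum statistic of the first branch, and the degenerate case where every cell has the same value is assumed to give a Z-score of 0.

```python
import math
from collections import Counter


def u_zscore(expr_i, expr_j):
    """Tie-corrected Z-score of the Mann-Whitney U statistic (hypothetical helper).

    expr_i, expr_j: expression values of one gene in the cells of two branches.
    A positive result means values in expr_i tend to be larger than in expr_j.
    """
    n_i, n_j = len(expr_i), len(expr_j)
    n = n_i + n_j
    pooled = list(expr_i) + list(expr_j)

    # Assign each distinct value the average of the ranks it occupies (1-based).
    counts = Counter(pooled)
    rank_of = {}
    next_rank = 1
    for value in sorted(counts):
        t = counts[value]  # number of cells sharing this rank
        rank_of[value] = next_rank + (t - 1) / 2.0
        next_rank += t

    # U for the first branch: its rank sum minus the smallest possible rank sum.
    rank_sum_i = sum(rank_of[v] for v in expr_i)
    u = rank_sum_i - n_i * (n_i + 1) / 2.0

    # Mean and tie-corrected standard deviation of U under the null hypothesis.
    m_u = n_i * n_j / 2.0
    tie_term = sum(t ** 3 - t for t in counts.values()) / (n * (n - 1))
    variance = n_i * n_j / 12.0 * ((n + 1) - tie_term)
    sigma_u = math.sqrt(max(variance, 0.0))

    if sigma_u == 0.0:
        # Assumption: all values identical, so there is no evidence either way.
        return 0.0
    return (u - m_u) / sigma_u


if __name__ == "__main__":
    # Toy data: every cell in the second branch expresses the gene more highly.
    print(u_zscore([1, 2, 3], [4, 5, 6]))  # roughly -1.96
```

Because U is computed from ranks, rescaling the expression values to [0,1] beforehand does not change the Z-score; the scaling matters only for the fold-change check.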
H Chen, L Albergante, JY Hsu, CA Lareau, GL Bosco, J Guan, S Zhou, AN Gorban, DE Bauer, MJ Aryee, DM Langenau, A Zinovyev, JD Buenrostro, GC Yuan, L Pinello. Single-cell trajectories reconstruction, exploration and mapping of omics data with STREAM. Nature Communications 10, 1903 (2019).
Nestorowa, S. et al. A single-cell resolution map of mouse hematopoietic stem and progenitor cell differentiation. Blood 128, e20-31 (2016).
Pinello Lab STREAM GitHub Repository
Example data for the STREAM workflow can be downloaded from Dropbox: STREAM Example Data
Ref: Nestorowa, S. et al. (2016), cited above.
Example data for this specific step can be found at stream_epg_result.pkl
GenePattern 3.9.11 or later (dockerized).
Inputs and Outputs
|data file*||A STREAM pkl file containing an annotated AnnData matrix of gene expression data.|
|output filename*||The output filename prefix.|
Select Variable Genes
Parameters used if variable genes are to be selected as the feature.
|root||The starting node.|
|preference||The preferred nodes. Branches containing the specified nodes are placed toward the top of the subway plot; the higher a node ranks, the closer to the top its branch is placed, e.g. S3,S4.|
|percentile expr||Between 0 and 100. The percentile of gene expression values greater than 0, used to filter out extreme expression values.|
|use precomputed||If True, the previously computed scaled gene expression will be used.|
|cutoff zscore||The z-score cutoff used for the Mann–Whitney U test.|
|cutoff logfc||The log-transformed fold change cutoff between a pair of branches.|
Plotting
Parameters controlling the output figures.
|num genes||The number of genes to plot.|
|figure height||Figure height as used in matplotlib graphs. Default=8.|
|figure width||Figure width as used in matplotlib graphs. Default=8.|
* - required