Algorithm and scientific questions: <Huidong.Chen at mgh dot harvard dot edu>
Module wrapping issues: Ted Liefeld <jliefeld at cloud dot ucsd dot edu>
STREAM (Single-cell Trajectories Reconstruction, Exploration And Mapping) is an interactive pipeline capable of disentangling and visualizing complex branching trajectories from both single-cell transcriptomic and epigenomic data. Within GenePattern, STREAM is implemented as a collection of modules that together cover the entire STREAM processing pipeline, so that individual steps can be run interactively for data exploration.
STREAM.Preprocess normalizes and filters single-cell transcriptomic data and formats it for analysis with the STREAM pipeline.
To prepare the data for the downstream STREAM modules, the raw gene expression values are typically first normalized by library size and then log-transformed, and the mitochondrial genes are removed.
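The three preprocessing steps can be sketched in plain Python on a small cells × genes count matrix. This is a minimal illustration under stated assumptions, not STREAM's implementation: the target library size (the median across cells), the log2(x + 1) transform, and the `mt-`/`MT-` prefix used to identify mitochondrial genes are all assumptions, and `preprocess` is a hypothetical helper whose flags mirror the module's normalize, log transform, and remove mitochondrial genes options.

```python
import math
from statistics import median


def preprocess(counts, gene_names, normalize=True, log_transform=True, remove_mt=True):
    """Hypothetical sketch of the module's preprocessing steps.

    counts: one row per cell, one column per gene (raw read counts).
    gene_names: gene symbols in the same order as the columns.
    Assumptions, not STREAM's documented behavior: every cell is rescaled
    to the median library size, the log transform is log2(x + 1), and
    mitochondrial genes are symbols starting with "mt-" or "MT-".
    """
    matrix = [[float(x) for x in row] for row in counts]
    if normalize:
        # Library size = total read count of each cell.
        lib_sizes = [sum(row) for row in matrix]
        target = median(lib_sizes)
        # Rescale each cell so its counts sum to the target library size.
        matrix = [[x * target / size if size > 0 else 0.0 for x in row]
                  for row, size in zip(matrix, lib_sizes)]
    if log_transform:
        # Pseudocount of 1 keeps zero counts at zero.
        matrix = [[math.log2(x + 1) for x in row] for row in matrix]
    if remove_mt:
        # Case-insensitive prefix match covers mouse "mt-" and human "MT-".
        keep = [j for j, name in enumerate(gene_names)
                if not name.lower().startswith("mt-")]
        matrix = [[row[j] for j in keep] for row in matrix]
        gene_names = [gene_names[j] for j in keep]
    return matrix, list(gene_names)
```

For example, `preprocess([[1, 3, 0], [2, 2, 4]], ["Gata1", "Kit", "mt-Co1"])` rescales both cells to a library size of 6 before the log transform and then drops `mt-Co1`.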
With this module, cells can be filtered out based on several cell-centric metrics: the minimum number of genes expressed, the minimum percentage of genes expressed, and the minimum number of read counts per cell.
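A minimal sketch of this cell filter in plain Python, operating on a cells × genes matrix. The function name `filter_cells` and its parameter names are hypothetical. Treating a gene as expressed when its value is strictly greater than the expression cutoff follows the parameter description on this page; expressing the percentage on a 0 to 100 scale, using inclusive (`>=`) thresholds, and the default cutoff of 1 are assumptions.

```python
def filter_cells(matrix, min_n_genes=0, min_pct_genes=0.0, min_count=0, expr_cutoff=1.0):
    """Return the indices of cells (rows) that pass all cell-centric thresholds.

    Hypothetical sketch: a gene is "expressed" in a cell when its value is
    strictly greater than expr_cutoff; min_pct_genes is on a 0-100 scale.
    """
    n_genes = len(matrix[0]) if matrix else 0
    kept = []
    for i, row in enumerate(matrix):
        n_expressed = sum(1 for x in row if x > expr_cutoff)
        pct_expressed = 100.0 * n_expressed / n_genes if n_genes else 0.0
        total_count = sum(row)  # read counts for this cell
        if (n_expressed >= min_n_genes
                and pct_expressed >= min_pct_genes
                and total_count >= min_count):
            kept.append(i)
    return kept
```

Returning indices rather than a new matrix lets the same selection be applied to any per-cell annotations, such as cell labels.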
The module can also filter out genes based on gene-centric metrics: the minimum number of cells expressing the gene, the minimum percentage of cells expressing the gene, and the minimum number of read counts per gene.
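The gene filter is the column-wise counterpart of the cell filter. The sketch below is hypothetical (`filter_genes` and its parameter names are illustrative, not the module's); as with cells, it assumes a gene counts as expressed in a cell when its value is strictly greater than the expression cutoff, that percentages are on a 0 to 100 scale, and that thresholds are inclusive.

```python
def filter_genes(matrix, min_n_cells=0, min_pct_cells=0.0, min_count=0, expr_cutoff=1.0):
    """Return the indices of genes (columns) that pass all gene-centric thresholds.

    Hypothetical sketch: matrix has one row per cell and one column per gene.
    """
    n_cells = len(matrix)
    n_genes = len(matrix[0]) if matrix else 0
    kept = []
    for j in range(n_genes):
        column = [row[j] for row in matrix]
        n_expressing = sum(1 for x in column if x > expr_cutoff)
        pct_expressing = 100.0 * n_expressing / n_cells if n_cells else 0.0
        total_count = sum(column)  # read counts for this gene
        if (n_expressing >= min_n_cells
                and pct_expressing >= min_pct_cells
                and total_count >= min_count):
            kept.append(j)
    return kept
```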
H Chen, L Albergante, JY Hsu, CA Lareau, GL Bosco, J Guan, S Zhou, AN Gorban, DE Bauer, MJ Aryee, DM Langenau, A Zinovyev, JD Buenrostro, GC Yuan, L Pinello. Single-cell trajectories reconstruction, exploration and mapping of omics data with STREAM. Nature Communications 10, 1903 (2019).
Nestorowa, S. et al. A single-cell resolution map of mouse hematopoietic stem and progenitor cell differentiation. Blood 128, e20-31 (2016).
Pinello Lab STREAM GitHub Repository
Example data can be downloaded from Dropbox: STREAM Example Data
Ref: Nestorowa, S. et al. A single-cell resolution map of mouse hematopoietic stem and progenitor cell differentiation. Blood 128, e20-31 (2016).
GenePattern 3.9.11 or later (dockerized).
Inputs and Outputs

Name | Description |
---|---|
data file* | A STREAM pkl file containing an annotated AnnData matrix of gene expression data. |
cell label file | A TSV file containing cell labels. |
cell label color file | A TSV file containing cell label colors. |
output filename* | The output filename prefix. |
Cell Filtering

Name | Description |
---|---|
min percent genes | The minimum percentage of genes expressed to keep a cell. |
min count genes | The minimum number of read counts required to keep a cell. |
Gene Filtering

Name | Description |
---|---|
min num cells | The minimum number of cells expressing a gene to keep the gene. |
min percent cells | The minimum percentage of cells expressing a gene to keep the gene. |
min count cells | The minimum number of read counts required to keep a gene. |
Other Preprocessing

Name | Description |
---|---|
expression cutoff | The expression cutoff used to determine whether a gene is expressed. If the expression value is greater than expr_cutoff, the gene is considered 'expressed'. |
normalize | Normalize the data by library size (True/False). |
log transform | Log-transform the data (True/False). |
remove mitochondrial genes | Remove mitochondrial genes (True/False). |
* - required
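The cell label file and cell label color file are plain tab-separated text. The sketch below assumes a layout that this page does not specify: the label file holds one label per line, in the same order as the cells in the data file, and the color file holds one `label<TAB>color` pair per line. `load_labels_and_colors` is a hypothetical helper that takes the file contents as strings so it can be checked without files on disk; with real files, pass the text read from each file instead.

```python
import csv
import io


def load_labels_and_colors(label_text, color_text):
    """Hypothetical helper: map every cell's label to its color.

    Assumed layout: label_text has one label per line (one line per cell);
    color_text has one "label<TAB>color" pair per line.
    """
    labels = [row[0]
              for row in csv.reader(io.StringIO(label_text), delimiter="\t")
              if row]
    colors = {row[0]: row[1]
              for row in csv.reader(io.StringIO(color_text), delimiter="\t")
              if len(row) >= 2}
    # Every label used by a cell must have a color entry.
    missing = sorted(set(labels) - set(colors))
    if missing:
        raise ValueError(f"no color given for labels: {missing}")
    return labels, [colors[label] for label in labels]
```

Failing early on a label without a color is a design choice of this sketch; a plotting step could instead fall back to a default color.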