Module Documentation

STREAM.DetectTransitionGenes


LSID
urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00404
Author(s)
Huidong Chen, Massachussets General Hospital, wrapped as a module by Ted Liefeld, Mesirov Lab, UCSD School of Medicine.
Contact(s)

Algorithm and scientific questions: <Huidong.Chen  at mgh dot harvard dot edu>

Module wrapping issues:  Ted Liefeld  < jliefeld at cloud dot ucsd dot edu>


Introduction

STREAM (Single-cell Trajectories Reconstruction, Exploration And Mapping) is an interactive pipeline capable of disentangling and visualizing complex branching trajectories from both single-cell transcriptomic and epigenomic data. Within GenePattern STREAM is implemented as a collection of modules that cover the entire STREAM processing pipeline to allow individual steps to be performed interactively for data exploration.

STREAM.DetectTransitionGenes is used to detect marker genes for each transition.

Algorithm

For each branch Bi and for each gene E we first scale the gene expression values to [0,1] for convenience. Then we check if the candidate gene has a reasonable dynamic range considering cells close to the start and end points. To this end, we consider the fold change in average gene expressions of the first 20% and the last 80% of the cells based on the inferred pseudotime. If the difference is greater than a specified threshold (the default log2 fold change value is 0.25), we then calculate Spearman’s rank correlation between inferred pseudotime and gene expression of all the cells along Bi. Genes with Spearman’s correlation coefficient above a specified threshold (0.4 by default) are identified and reported as transition genes.

References

H Chen, L Albergante, JY Hsu, CA Lareau, GL Bosco, J Guan, S Zhou, AN Gorban, DE Bauer, MJ Aryee, DM Langenau, A Zinovyev, JD Buenrostro, GC Yuan, L Pinello Single-cell trajectories reconstruction, exploration and mapping of omics data with STREAM. Nature Communications, volume 10, Article number: 1903 (2019)

Nestorowa, S. et al. A single-cell resolution map of mouse hematopoietic stem and progenitor cell differentiation. Blood 128, e20-31 (2016).

Pinello Lab   STREAM Github Repository

Input Files

  1. data file *
    A STREAM pkl file containing an annotated AnnData matrix of gene expression data.

Output Files

  1. <output filename>_stream_result.pkl
    Output file in STREAM AnnData extended pickle (.pkl) file format suitable for passing to the next step of the STREAM analysis.
  2. transition_genes/transition_genes_S#_S#.tsv
    The transition genes for the specific branching. Columns include gene name, stat, log fold change, pval and qval.
  3. transition_genes_S#_S#.pdf
    Bar plot showing Spearman correlation coefficient for genes between 2 branches..

Example Data

Example data for the STREAM workflow can be downloaded from dropbox: Stream Example Data
Ref: Nestorowa, S. et al. A single-cell resolution map of mouse hematopoietic stem and progenitor cell differentiation. Blood 128, e20-31 (2016).

Example data for this specific step can be found at stream_epg_result.pkl

Requirements

GenePattern 3.9.11 or later (dockerized).

Parameters

Inputs and Outputs

Name Description
data file* A STREAM pkl file containing an annotated AnnData matrix of gene expression data/td>
output filename* The output filename prefix.

Transition Gene Detection

Parameters used if variable genes are to be selected as the feature.
Name Description
root The starting node.
preference The preference of nodes. The branch with speficied nodes are preferred and put on the top part of subway plot. The higher ranks the node have, the closer to the top the branch with that node is. e.g. S3,S4.
percentile expr* Between 0 and 100. Between 0 and 100. Specify the percentile of gene expression greater than 0 to filter out some extreme gene expressions.
use precomputed* If True, the previously computed scaled gene expression will be used.
cutoff zscore* The z-score cutoff used for mean values of all leaf branches.
cutoff pvalue The p value cutoff used for Kruskal-Wallis H-test and post-hoc pairwise Conover's test.

Plotting

Parameters controlling the output figures.
Name Description
num genes The number of genes to plot
figure height/td> Figure height as used in matplotlib graphs. Default=8.
figure width Figure width as used in matplotlib plots. Default=8

* - required