tfsites.DefineTfBindingSitesFromPBM

tfsites.DefineTFBindingSites.from.PBM v1

Author(s): Joe Solvason

Contact: Joe Solvason (solvason@eng.ucsd.edu)

Adapted as a GenePattern Module by: Ted Liefeld (jliefeld@cloud.ucsd.edu)

Task Type: Transcription factor analysis

LSID: urn:lsid:genepattern.org:module.analysis:00441

Introduction

DefineTFBindingSites.from.PBM normalizes the median fluorescence intensity (MFI) values obtained from protein-binding microarray (PBM) data for a transcription factor of interest. The k-mer with the maximum MFI that conforms to the binding site definition is normalized to 1.0 and all other k-mers are normalized relative to that MFI value. For example, a normalized value of 0.1 is 10% of the maximum MFI.

Methodology

The raw PBM dataset for a transcription factor is downloaded from UniPROBE, and the user indicates which columns contain the forward k-mer and the MFI. The user also defines the minimal binding site using IUPAC nomenclature (e.g., N = A, T, G, or C; W = A or T). The tool searches for the k-mer with the largest MFI signal that conforms to the IUPAC binding site definition. The MFI signal of every other k-mer is then normalized relative to the MFI signal of that maximum k-mer, and the resulting value is called the relative affinity. The k-mer with the maximum MFI signal therefore has a relative affinity of 1.0. The normalization calculation for each k-mer is: relative affinity = (MFI signal) / (MFI signal of the maximum IUPAC k-mer).
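The normalization described above can be sketched in Python as follows. This is a minimal illustration, not the module's actual code: the function names are hypothetical, and it assumes that a k-mer "conforms" when the forward k-mer contains the binding site anywhere along its length (reverse complements are not checked here).

```python
import re

# Standard IUPAC nucleotide codes mapped to the bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def iupac_to_regex(site):
    """Turn an IUPAC binding site such as 'GGAW' into a compiled regex."""
    return re.compile("".join(f"[{IUPAC[c]}]" for c in site.upper()))

def relative_affinities(mfi_by_kmer, site):
    """Return (reference k-mer, {k-mer: relative affinity}).

    Assumption: the reference is the conforming k-mer with the largest MFI,
    and every k-mer's MFI is divided by the reference MFI.
    """
    pattern = iupac_to_regex(site)
    conforming = [k for k in mfi_by_kmer if pattern.search(k)]
    if not conforming:
        raise ValueError(f"no k-mer conforms to binding site {site!r}")
    ref = max(conforming, key=mfi_by_kmer.get)
    ref_mfi = mfi_by_kmer[ref]
    return ref, {k: mfi / ref_mfi for k, mfi in mfi_by_kmer.items()}

# Toy MFI values chosen for illustration only (not real PBM data).
toy = {"AAGGAAGT": 2000.0, "CCGGATGT": 1000.0, "AAAAAAAA": 500.0}
ref, rel = relative_affinities(toy, "GGAW")
print(ref, rel)
```

In this toy example the reference k-mer receives a relative affinity of 1.0 and the others are expressed as fractions of its MFI.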

Parameters

* indicates required parameter

Inputs and Outputs

Other Parameters

Warnings Printed:

  1. If another k-mer conforms to the binding site definition and has a higher MFI than the k-mer the user provided as the highest relative affinity sequence.
  2. If any k-mers do not conform to the binding site definition but have an MFI greater than that of the k-mer provided by the user. The affinities of these k-mers will be capped at 1.0.
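The two warning conditions and the capping rule can be sketched as below. This is an illustrative assumption about the logic, not the module's implementation: the function name and warning messages are hypothetical, and the binding site is passed as an already compiled regular expression. The text above does not say how a conforming k-mer with a higher MFI is scaled, so the sketch leaves it uncapped and only reports it.

```python
import re

def normalize_to_reference(mfi_by_kmer, site_pattern, ref_kmer):
    """Normalize MFI values to a user-provided reference k-mer.

    Returns ({k-mer: relative affinity}, [warning messages]).
    """
    ref_mfi = mfi_by_kmer[ref_kmer]
    affinities = {}
    higher_conforming = []
    higher_nonconforming = []
    for kmer, mfi in mfi_by_kmer.items():
        value = mfi / ref_mfi
        if mfi > ref_mfi:
            if site_pattern.search(kmer):
                # Warning 1: a conforming k-mer outranks the reference.
                higher_conforming.append(kmer)
            else:
                # Warning 2: a non-conforming k-mer outranks the reference;
                # its affinity is capped at 1.0.
                higher_nonconforming.append(kmer)
                value = 1.0
        affinities[kmer] = value

    warnings = []
    if higher_conforming:
        warnings.append(
            f"{len(higher_conforming)} conforming k-mer(s) have a higher MFI "
            f"than {ref_kmer}"
        )
    if higher_nonconforming:
        warnings.append(
            f"{len(higher_nonconforming)} non-conforming k-mer(s) have a higher "
            f"MFI than {ref_kmer}; affinities capped at 1.0"
        )
    return affinities, warnings

# Toy values for illustration only; 'GGA[AT]' is the regex form of GGAW.
toy = {"AAGGAAGT": 1000.0, "CCGGATGT": 1500.0, "AAAAAAAA": 2000.0, "TTTTTTTT": 250.0}
affinities, warnings = normalize_to_reference(toy, re.compile("GGA[AT]"), "AAGGAAGT")
for message in warnings:
    print("WARNING:", message)
```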

Input Files

  1. raw PBM input (.tsv)
    • Columns
      • 8-mer: every possible forward k-mer sequence with length k
      • 8-mer: the reverse complement of the forward k-mer
      • E-score: the enrichment score of the k-mer
      • Median: the median fluorescence intensity of the k-mer
      • Z-score: the z-score of the k-mer
8-mer        8-mer        E-score     Median      Z-score
AAAAAAAA     TTTTTTTT     0.29130     2871.60     3.5965
AAAAAAAC     TTTTTTTG     0.10748     2086.00     0.3958
AAAAAAAG     TTTTTTTC     0.23656     2539.91     2.3673
AAAAAAAT     TTTTTTTA     0.21760     2434.82     1.9442
AAAAAACA     TTTTTTGT     0.19839     2407.46     1.8310
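A file laid out like the excerpt above could be loaded with a short Python sketch such as the one below. It assumes a tab-delimited file with a single header row, and it selects columns by position because both k-mer columns share the header "8-mer". The function name and the 0-based `kmer_col` and `mfi_col` arguments are hypothetical and do not necessarily match the module's own parameters.

```python
import csv
import io

def read_pbm(handle, kmer_col=0, mfi_col=3):
    """Read a tab-delimited PBM table into {forward k-mer: MFI}.

    Columns are chosen by 0-based position (an assumption of this sketch).
    """
    reader = csv.reader(handle, delimiter="\t")
    next(reader)  # skip the header row
    return {row[kmer_col]: float(row[mfi_col]) for row in reader if row}

# The first two data rows of the excerpt above, joined with tabs.
sample = (
    "8-mer\t8-mer\tE-score\tMedian\tZ-score\n"
    "AAAAAAAA\tTTTTTTTT\t0.29130\t2871.60\t3.5965\n"
    "AAAAAAAC\tTTTTTTTG\t0.10748\t2086.00\t0.3958\n"
)
mfi_by_kmer = read_pbm(io.StringIO(sample))
print(mfi_by_kmer)
```

For a file on disk, the same function can be called on an open file handle, for example `open(path, newline="")`.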

Output Files

  1. normalized PBM data (.tsv)
    • Columns
      • PBM Kmer: the sequence of every possible k-mer
      • PBM Relative Affinity: the relative affinity of each k-mer normalized to the k-mer with the highest MFI
PBM Kmer     PBM Relative Affinity
AAAAAAAA     0.15
AAAAAAAC     0.11
AAAAAAAG     0.13
AAAAAAAT     0.13
AAAAAACA     0.12
  2. histograms of relative affinities (.png)
    • Histogram plots
      • All relative affinity values
      • Relative affinity values for the sequences that follow the TF binding site definition
      • Relative affinity values for the sequences that don’t follow the TF binding site definition

Example Data

Example input data is available on GitHub.

Version Comments