Author(s): Joe Solvason, Simran Jandu
Contact: Joe Solvason (solvason@ucsd.edu)
Adapted as a GenePattern Module by: Ted Liefeld (jliefeld@cloud.ucsd.edu)
Task Type: Transcription factor analysis
LSID: urn:lsid:genepattern.org:module.analysis:00441
NormalizeTfDnaAffinityData generates a relative affinity dataset that can then be used in other TFSites modules to score binding sites. This tool normalizes a raw affinity dataset relative to the sequence with the highest value that follows the core binding site definition. The resulting dataset reports the relative affinity value for every sequence in the original dataset, ranging from 0 to 1.
A wide range of experimental techniques can be used to generate affinity datasets for scoring binding sites. This module can normalize any affinity dataset that reports a value for each sequence. For example, raw PBM data for a transcription factor can be downloaded from UniPROBE. The user must indicate the columns that contain the DNA sequences and the raw affinity values, and must define the minimal binding site using IUPAC nomenclature (e.g. N = A, T, G or C; W = A or T). The tool searches for the k-mer with the largest value that follows the IUPAC binding site definition. The value of every other k-mer is normalized relative to the value of this k-mer, and the result is called the relative affinity; the k-mer with the maximum value therefore has a relative affinity of 1.0. The normalization calculation for each sequence is: relative affinity = (value) / (value of the maximum IUPAC k-mer). For example, a relative affinity of 0.1 means the sequence's value is 10% of the maximum value.
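The normalization described above can be sketched in a few lines of Python. This is a minimal illustration, not the module's actual implementation: the function names `iupac_to_regex` and `normalize_affinities` are hypothetical, the raw values are made up, and the sketch assumes that a k-mer "follows" the binding site definition when the site occurs anywhere in the forward sequence of the k-mer.

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def iupac_to_regex(site):
    """Translate an IUPAC site definition such as 'GGAW' into a compiled regex."""
    return re.compile("".join(IUPAC[base] for base in site.upper()))

def normalize_affinities(raw, core_site):
    """Divide every raw value by the largest value among k-mers containing core_site.

    Assumption of this sketch: only the forward sequence of each k-mer is checked.
    """
    pattern = iupac_to_regex(core_site)
    matching = [value for kmer, value in raw.items() if pattern.search(kmer)]
    if not matching:
        raise ValueError(f"no k-mer matches the site definition {core_site!r}")
    reference = max(matching)  # value of the maximum IUPAC k-mer
    return {kmer: value / reference for kmer, value in raw.items()}

# Made-up raw values, for illustration only.
raw = {"ACCGGAAG": 20000.0, "ACCGGATG": 10000.0, "AAAAAAAA": 2000.0}
relative = normalize_affinities(raw, "GGAW")
```

Here the two k-mers containing GGAA or GGAT satisfy the site GGAW, the larger of their values becomes the reference, and every k-mer in the dataset is divided by that reference.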
* indicates required parameter
If True, a header exists in the input file. If False, no header exists. Default = False
.svg in addition to .png. For manuscript preparation, .svg format is preferable.
8-mer: the sequence of every forward k-mer
Median: the median fluorescence intensity (raw affinity) of the k-mer
8-mer     8-mer     E-score  Median   Z-score
AAAAAAAA  TTTTTTTT  0.29130  2871.60  3.5965
AAAAAAAC  TTTTTTTG  0.10748  2086.00  0.3958
AAAAAAAG  TTTTTTTC  0.23656  2539.91  2.3673
AAAAAAAT  TTTTTTTA  0.21760  2434.82  1.9442
AAAAAACA  TTTTTTGT  0.19839  2407.46  1.8310
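An input file laid out like the example above could be read with a short helper such as the one below. This is a sketch under stated assumptions: the function `read_affinity_table` and its parameter names are hypothetical, column indices are taken to be 0-based, and the file is assumed to be tab-delimited. The module's real parameter names and conventions may differ.

```python
import csv
import io

def read_affinity_table(handle, seq_col, value_col, has_header=False, delimiter="\t"):
    """Return {sequence: raw value} from a delimited text file.

    seq_col and value_col are 0-based column indices (an assumption of this sketch).
    """
    reader = csv.reader(handle, delimiter=delimiter)
    if has_header:
        next(reader, None)  # skip the header row
    table = {}
    for row in reader:
        if not row:
            continue  # ignore blank lines
        table[row[seq_col]] = float(row[value_col])
    return table

# The first two rows of the example input, with a header line.
example = io.StringIO(
    "8-mer\t8-mer\tE-score\tMedian\tZ-score\n"
    "AAAAAAAA\tTTTTTTTT\t0.29130\t2871.60\t3.5965\n"
    "AAAAAAAC\tTTTTTTTG\t0.10748\t2086.00\t0.3958\n"
)
raw = read_affinity_table(example, seq_col=0, value_col=3, has_header=True)
```

In this usage the forward 8-mer column supplies the sequences and the Median column supplies the raw affinity values.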
Kmer: the sequence of every k-mer
Relative Affinity: the relative affinity of each k-mer normalized to the k-mer with the highest raw affinity
Kmer      Relative Affinity
AAAAAAAA  0.15
AAAAAAAC  0.11
AAAAAAAG  0.13
AAAAAAAT  0.13
AAAAAACA  0.12
histogram of relative affinities (.png)
Example input data is available here.