Author(s): Joe Solvason
Contact: Joe Solvason (solvason@ucsd.edu)
Adapted as a GenePattern Module by: Ted Liefeld (jliefeld@cloud.ucsd.edu)
Task Type: Transcription factor analysis
LSID: urn:lsid:genepattern.org:module.analysis:00443
FindTfSitesAlteredBySequenceVariation reports the effects of all possible in silico single-nucleotide variants (SNVs) in a given sequence for one transcription factor. Possible SNV effects include increasing (or optimizing) the affinity/score of a binding site, decreasing the affinity/score of a binding site, deleting a binding site, or creating a binding site.
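For every nucleotide in the sequence, all three possible SNVs (one for each alternate nucleotide) are made. For each SNV, we determine its effect, if any, on the binding sites in the sequence. The possible effects of an SNV on a binding site are:
inc: increases (or optimizes) the affinity/score of an existing binding site
dec: decreases the affinity/score of an existing binding site
denovo: creates a new binding site
del: deletes an existing binding site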
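A minimal sketch of the enumeration and labeling step. It assumes a hypothetical binding site definition, a single cutoff `site_cutoff` on the affinity/score, which is simpler than the module's actual PBM or PWM site definitions; the function names are illustrative and not part of the module. Positions are 0-indexed, like the SNV Position column of the output.

```python
from typing import Iterator, Optional, Tuple

NUCLEOTIDES = "ACGT"


def enumerate_snvs(sequence: str) -> Iterator[Tuple[int, str, str]]:
    """Yield (0-indexed position, reference, alternate) for every possible SNV."""
    for pos, ref in enumerate(sequence.upper()):
        for alt in NUCLEOTIDES:
            if alt != ref:  # three alternates per A/C/G/T position
                yield pos, ref, alt


def classify_effect(ref_value: float, alt_value: float,
                    site_cutoff: float) -> Optional[str]:
    """Label an SNV's effect on one k-mer.

    Hypothetical site definition: a k-mer counts as a binding site when its
    affinity/score is >= site_cutoff. Returns None when there is no effect.
    """
    ref_is_site = ref_value >= site_cutoff
    alt_is_site = alt_value >= site_cutoff
    if ref_is_site and not alt_is_site:
        return "del"
    if alt_is_site and not ref_is_site:
        return "denovo"
    if ref_is_site and alt_is_site and alt_value > ref_value:
        return "inc"
    if ref_is_site and alt_is_site and alt_value < ref_value:
        return "dec"
    return None


# Usage: a 4-nt sequence yields 4 positions x 3 alternates = 12 SNVs.
snvs = list(enumerate_snvs("ACGT"))
print(len(snvs), snvs[0])                         # 12 (0, 'A', 'C')
print(classify_effect(0.15, 0.05, site_cutoff=0.10))  # del
```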
If an affinity optimization threshold is provided by the user, then we report only the binding sites that have an increased affinity/score with a fold change greater than or equal to the threshold. Similarly, if an affinity reduction threshold is provided, then we report only the binding sites that have a decreased affinity/score with a fold change less than or equal to the threshold.
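The threshold filter can be sketched as follows. This assumes fold change is computed as the alternate value divided by the reference value, which is consistent with increases being compared using "greater than or equal to" and decreases using "less than or equal to"; the parameter names are hypothetical. It also assumes positive values such as PBM relative affinities, since a ratio of PWM scores that can be negative would need different handling.

```python
from typing import Optional


def fold_change(ref_value: float, alt_value: float) -> float:
    """Assumed definition: alternate / reference, so > 1 means an increase."""
    return alt_value / ref_value


def passes_thresholds(effect: str, fc: float,
                      optimization_threshold: Optional[float] = None,
                      reduction_threshold: Optional[float] = None) -> bool:
    """Keep an SNV effect only if it meets the optional fold-change thresholds."""
    if effect == "inc" and optimization_threshold is not None:
        return fc >= optimization_threshold
    if effect == "dec" and reduction_threshold is not None:
        return fc <= reduction_threshold
    return True  # no threshold given, or an effect type these filters ignore


fc = fold_change(ref_value=0.20, alt_value=0.10)  # 0.5
print(passes_thresholds("dec", fc, reduction_threshold=0.7))  # True
```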
Using the list of all identified SNV effects, an image of the sequence is generated that displays all possible alternate nucleotides. The background of each alternate nucleotide is colored according to the mutation type of the SNV. If the SNV has no effect, its background is blank. If an SNV has multiple effects, its background is split into multiple colors. The intensity of the background color is determined by (1) the magnitude of the affinity/score fold change, if the SNV effect is inc or dec, (2) the magnitude of the alternate k-mer's affinity/score, if the SNV effect is denovo, or (3) full intensity, if the SNV effect is del.
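One way to map each effect onto a color intensity between 0 and 1 is sketched below. Using |log2(fold change)|, so that a 2-fold increase and a 2-fold decrease are drawn equally intense, and the scaling constants `max_log2_fc` and `max_value` are assumptions for illustration; the module's actual color scale is not described here.

```python
import math
from typing import Optional


def background_intensity(effect: str, fc: Optional[float] = None,
                         alt_value: Optional[float] = None,
                         max_log2_fc: float = 2.0,
                         max_value: float = 1.0) -> float:
    """Return a color intensity in [0, 1] for one SNV effect.

    max_log2_fc and max_value are hypothetical scaling constants.
    """
    if effect == "del":
        return 1.0  # deletions are always drawn at full intensity
    if effect in ("inc", "dec"):
        # Symmetric in log space: fc = 2.0 and fc = 0.5 give the same value.
        return min(1.0, abs(math.log2(fc)) / max_log2_fc)
    if effect == "denovo":
        # Scale by the alternate k-mer's affinity/score, clamped to [0, 1].
        return min(1.0, max(0.0, alt_value / max_value))
    raise ValueError(f"unknown SNV effect: {effect}")


print(background_intensity("inc", fc=2.0))  # 0.5
```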
To find putative binding sites, we iterate across every k-mer in the DNA sequence. If using PBM data, we identify the k-mers that conform to the binding site definition for each transcription factor. If using PWM data, we can also use a binding site definition, but it is not required. If a site definition is not provided for PWM data, we use the PWM minimum score to define a predicted binding site. The user can also choose to plot all denovo binding sites created from SNVs, in addition to existing putative binding sites.
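The k-mer scan can be sketched as below for both data types. `min_affinity` is a hypothetical stand-in for a PBM binding site definition, and the PWM is assumed to be a list of per-position score dictionaries whose summed score is compared against the minimum score. The PBM lookup also checks the reverse complement because a PBM table may list each k-mer in only one orientation; for brevity, only the forward strand is scanned with the PWM.

```python
from typing import Dict, Iterator, List, Optional, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer: str) -> str:
    return kmer.translate(COMPLEMENT)[::-1]


def scan_pbm(sequence: str, pbm: Dict[str, float], k: int,
             min_affinity: float) -> Iterator[Tuple[int, str, float]]:
    """Yield (0-indexed start, k-mer, affinity) for k-mers >= min_affinity.

    min_affinity is a hypothetical stand-in for a real site definition.
    """
    sequence = sequence.upper()
    for start in range(len(sequence) - k + 1):
        kmer = sequence[start:start + k]
        value: Optional[float] = pbm.get(kmer)
        if value is None:  # try the other orientation
            value = pbm.get(reverse_complement(kmer))
        if value is not None and value >= min_affinity:
            yield start, kmer, value


def scan_pwm(sequence: str, pwm: List[Dict[str, float]],
             min_score: float) -> Iterator[Tuple[int, str, float]]:
    """Yield forward-strand k-mers whose summed PWM score is >= min_score."""
    sequence = sequence.upper()
    k = len(pwm)
    for start in range(len(sequence) - k + 1):
        kmer = sequence[start:start + k]
        # Raises KeyError for bases missing from the PWM (e.g. N).
        score = sum(column[base] for column, base in zip(pwm, kmer))
        if score >= min_score:
            yield start, kmer, score


pbm = {"AAAA": 0.5, "ACGT": 0.2}
print(list(scan_pbm("TTTTACGT", pbm, k=4, min_affinity=0.3)))  # [(0, 'TTTT', 0.5)]
```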
If the user wishes to analyze only a portion of the sequence, a zoom range can be specified. If the sequence is longer than 500 nucleotides, it is automatically split into 500-bp windows, and each window is output as a separate file. In addition, the individual files are appended together to create a single output file containing the entire sequence. The user can also choose to output the files in .svg format in addition to .png.
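A sketch of the zoom and windowing logic, assuming a hypothetical `zoom` argument given as a 1-indexed, inclusive (start, end) pair. Each window keeps its 0-indexed offset so positions can be mapped back to the full sequence. How the module handles binding sites that span a window boundary is not described, so this only shows the splitting.

```python
from typing import List, Optional, Tuple

WINDOW = 500  # window size stated in the text above


def windows(sequence: str, zoom: Optional[Tuple[int, int]] = None,
            size: int = WINDOW) -> List[Tuple[int, str]]:
    """Return (0-indexed offset, subsequence) chunks of at most `size` nt.

    `zoom` is a hypothetical (start, end) pair, 1-indexed and inclusive.
    """
    offset = 0
    if zoom is not None:
        start, end = zoom
        sequence = sequence[start - 1:end]
        offset = start - 1
    return [(offset + i, sequence[i:i + size])
            for i in range(0, len(sequence), size)]


chunks = windows("A" * 1200)
print([(off, len(seq)) for off, seq in chunks])  # [(0, 500), (500, 500), (1000, 200)]
```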
* indicates required parameter
Default = None
Default = None
Default = None
Default = 0.7
Default = False
.svg in addition to .png. For manuscript preparation, .svg format is preferable.
Default = all
inc, dec, denovo, and del. This option also takes the value all if the user would like to analyze all of the listed mutation types.
Default = 1
Default = 1
Default = 150
.svg files.
Default = None
Sequence Name: name of the DNA sequence
Sequence: the sequence
Sequence Name  Sequence
ZRS            AACTTTAATGCCTATGTTTGATTTGAAGTCATAGCATAAAAGGTAACATAAGCAACATCCTGACCAATTATCCAAACCATCCAGACATCCCTGAATGGC...
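A sketch of reading this input, assuming the file is tab-delimited with the header shown above (the delimiter is an assumption; check it against the example input data). The function name `read_sequences` is hypothetical.

```python
import csv
import io
from typing import Dict, TextIO


def read_sequences(handle: TextIO) -> Dict[str, str]:
    """Map each Sequence Name to its upper-cased sequence.

    Assumes a tab-delimited file with 'Sequence Name' and 'Sequence' columns.
    """
    return {
        row["Sequence Name"].strip(): row["Sequence"].strip().upper()
        for row in csv.DictReader(handle, delimiter="\t")
    }


example = io.StringIO("Sequence Name\tSequence\nZRS\tAACTTTAATG\n")
print(read_sequences(example))  # {'ZRS': 'AACTTTAATG'}
```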
ETS
PBM Kmer  PBM Relative Affinity
AAAAAAAA  0.15
AAAAAAAC  0.11
AAAAAAAG  0.13
AAAAAAAT  0.13
AAAAAACA  0.12
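A similar sketch for the PBM table, again assuming a tab-delimited file with the two header columns shown in the example; `read_pbm` is a hypothetical name. It also checks that all k-mers share one length, since a scan uses a single k.

```python
import csv
import io
from typing import Dict, TextIO


def read_pbm(handle: TextIO) -> Dict[str, float]:
    """Parse a two-column PBM table into {k-mer: relative affinity}."""
    table: Dict[str, float] = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        table[row["PBM Kmer"].strip().upper()] = float(row["PBM Relative Affinity"])
    lengths = {len(kmer) for kmer in table}
    if len(lengths) > 1:
        raise ValueError(f"k-mers have mixed lengths: {sorted(lengths)}")
    return table


example = io.StringIO(
    "PBM Kmer\tPBM Relative Affinity\n"
    "AAAAAAAA\t0.15\n"
    "AAAAAAAC\t0.11\n"
)
pbm = read_pbm(example)
print(pbm["AAAAAAAC"])  # 0.11
```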
Sequence Name: name of the sequence being analyzed
Kmer ID: unique ID given to binding site
SNV Position (0-indexed): position of the SNV
Reference Nucleotide: reference nucleotide
Alternate Nucleotide: alternate nucleotide
Start Position (1-indexed): position at which the k-mer starts, where counting begins at one
End Position (1-indexed): position at which the k-mer ends, where counting begins at one
Reference Kmer: reference k-mer
Alternate Kmer: alternate k-mer
Site Direction: direction of the binding site
Reference Value: the affinity/score of the reference binding site
Alternate Value: the affinity/score of the alternate binding site
Fold Change: the ratio between Reference Value and Alternate Value
SNV Effect: the type of SNV effect
Sequence Name  TF Name  Kmer ID  Kmer      Start Position (1-indexed)  End Position (1-indexed)  Ref Data Type  Value  Site Direction  Duplicate Kmer IDs
ZRS            ETS      ETS:1    CTATCCTG  335                         328                       Affinity       0.15   -
ZRS            ETS      ETS:2    TTTTCCCC  432                         425                       Affinity       0.14   -               ETS:1,ETS:20
ZRS            HOX      HOX:1    TTTAATAT  323                         316                       Affinity       0.75   -
ZRS            HOX      HOX:2    TTTATGAC  415                         408                       Affinity       0.84   -
ZRS            HAND     HAND:1   CAGATG    416                         421
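The example rows suggest how the coordinates relate: for the "-" sites the start position is larger than the end position (for ETS:1, positions 328 to 335 span 8 nucleotides, the length of CTATCCTG). The sketch below converts a 0-indexed k-mer offset into these 1-indexed coordinates and checks whether a 0-indexed SNV position falls inside a site. Treating HAND:1, whose direction is blank in the example, as a "+" site is an assumption, and the function names are hypothetical.

```python
from typing import Tuple


def site_coordinates(offset: int, k: int, direction: str) -> Tuple[int, int]:
    """Convert a 0-indexed k-mer offset to 1-indexed (start, end).

    For '-' sites the start is the larger coordinate, as in the example rows.
    """
    first, last = offset + 1, offset + k
    return (last, first) if direction == "-" else (first, last)


def snv_in_site(snv_position: int, start: int, end: int) -> bool:
    """Check whether a 0-indexed SNV position lies inside a 1-indexed site."""
    low, high = min(start, end), max(start, end)
    return low <= snv_position + 1 <= high


print(site_coordinates(327, 8, "-"))  # (335, 328), like ETS:1
print(site_coordinates(415, 6, "+"))  # (416, 421), like HAND:1
print(snv_in_site(327, 335, 328))     # True
```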
Example input data is available here.