Author(s): Joe Solvason
Contact: Joe Solvason (solvason@eng.ucsd.edu)
Adapted as a GenePattern Module by: Ted Liefeld (jliefeld@cloud.ucsd.edu)
Task Type: Transcription factor analysis
LSID: urn:lsid:genepattern.org:module.analysis:00443
AnnotateAndVisualizeInSilicoSnvs reports the effects of all possible in silico single-nucleotide variants (SNVs) in a given sequence. Possible SNV effects include increasing (or optimizing) the affinity/score of a binding site, decreasing (or sub-optimizing) the affinity/score of a binding site, deleting a binding site, or creating a binding site.
The in silico SNV analysis is performed on one transcription factor, but the binding sites of multiple different transcription factors can be displayed on the plot. Each binding site is labeled with the TF name and a unique binding site ID. If the relative affinity/score dataset is provided for a transcription factor, the affinity/score of this site will be labeled and the intensity of the binding site’s color will be proportional to the affinity/score.
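For every nucleotide in the sequence, all possible SNVs are made. For each SNV, we determine its effect, if any, on any binding sites that exist in the sequence. These are the possible effects of an SNV on a binding site:
inc: the SNV increases (optimizes) the affinity/score of a binding site
dec: the SNV decreases (sub-optimizes) the affinity/score of a binding site
denovo: the SNV creates a binding site
del: the SNV deletes a binding site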
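As an illustration of the enumeration step described above, here is a minimal sketch that generates every possible SNV for a sequence. The function name `all_snvs` and the tuple layout are hypothetical and are not taken from the module's code.

```python
def all_snvs(sequence):
    """Yield (position, ref, alt, mutated_sequence) for every possible SNV.

    Positions are 0-indexed. Assumes the sequence contains only A, C, G, T.
    """
    for position, ref in enumerate(sequence):
        for alt in "ACGT":
            if alt != ref:
                mutated = sequence[:position] + alt + sequence[position + 1:]
                yield position, ref, alt, mutated


snvs = list(all_snvs("AC"))
print(len(snvs))  # three alternate nucleotides per position
```

Each position contributes three alternate nucleotides, so a sequence of length n yields 3n SNVs.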
If an optimization threshold is provided by the user, then we report only the binding sites that have an increased affinity/score with a fold change greater than or equal to the threshold. Similarly, if a sub-optimization threshold is provided, then we report only the binding sites that have a decreased affinity/score with a fold change less than or equal to the threshold.
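The effect types and thresholds described above can be sketched as follows. This is an illustrative sketch, not the module's implementation: the function names are hypothetical, only the forward strand is checked, and fold change is assumed to be the alternate affinity divided by the reference affinity.

```python
# IUPAC nucleotide codes mapped to the bases they allow.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def matches(kmer, definition):
    """True if every base of kmer is allowed by the IUPAC definition."""
    return len(kmer) == len(definition) and all(
        base in IUPAC[code] for base, code in zip(kmer, definition)
    )


def classify_snv(ref_kmer, alt_kmer, definition, affinity,
                 opt_threshold=1.0, subopt_threshold=1.0):
    """Return (effect, fold_change), or None if nothing is reported.

    Assumption: fold change = alternate affinity / reference affinity.
    """
    ref_is_site = matches(ref_kmer, definition)
    alt_is_site = matches(alt_kmer, definition)
    if ref_is_site and not alt_is_site:
        return ("del", None)
    if alt_is_site and not ref_is_site:
        return ("denovo", None)
    if not ref_is_site:
        return None  # neither k-mer is a binding site
    fold_change = affinity[alt_kmer] / affinity[ref_kmer]
    if fold_change > 1 and fold_change >= opt_threshold:
        return ("inc", fold_change)
    if fold_change < 1 and fold_change <= subopt_threshold:
        return ("dec", fold_change)
    return None


# Hypothetical affinities for two k-mers that match the ETS definition.
affinity = {"AAGGAAAT": 0.25, "AAGGAAAC": 0.5}
print(classify_snv("AAGGAAAT", "AAGGAAAC", "NNGGAWNN", affinity))
```

With an optimization threshold of 3, the same SNV would not be reported, because its fold change of 2 falls below the threshold.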
Using the list of all identified SNV effects, an image of the sequence is generated that contains a table of all possible alternate nucleotides. Each cell in the table is colored according to the mutation type of the SNV. If an SNV has no effect, its background is grey. If an SNV has multiple effects, its background is white.
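The coloring rule can be sketched as a small lookup. Only the grey and white cases come from the description above. The per-effect colors in `EFFECT_COLORS` are hypothetical placeholders, and "multiple effects" is assumed here to mean more than one distinct effect type.

```python
# Hypothetical colors for single effects; the module's actual palette may differ.
EFFECT_COLORS = {"inc": "green", "dec": "orange", "denovo": "blue", "del": "red"}


def cell_color(effects):
    """Pick a background color for one alternate-nucleotide cell."""
    unique = set(effects)
    if not unique:
        return "grey"   # the SNV has no effect
    if len(unique) > 1:
        return "white"  # the SNV has multiple effects
    return EFFECT_COLORS[unique.pop()]


print(cell_color([]), cell_color(["inc", "del"]), cell_color(["dec"]))
```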
To find and plot all putative binding sites, we iterate across every k-mer in the DNA sequence and identify those that conform to the binding site definition for each transcription factor. The user can also choose to plot all denovo binding sites created from SNVs, in addition to existing putative binding sites.
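A minimal sketch of the k-mer scan described above, assuming that a site is reported with direction + when it matches the IUPAC definition and with - when its reverse complement matches. The function names are hypothetical.

```python
# IUPAC nucleotide codes mapped to the bases they allow.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def matches(kmer, definition):
    """True if every base of kmer is allowed by the IUPAC definition."""
    return len(kmer) == len(definition) and all(
        base in IUPAC[code] for base, code in zip(kmer, definition)
    )


def find_sites(sequence, definition):
    """Return (start, kmer, direction) for every putative binding site."""
    k = len(definition)
    sites = []
    for start in range(len(sequence) - k + 1):
        kmer = sequence[start:start + k]
        if matches(kmer, definition):
            sites.append((start, kmer, "+"))
        elif matches(reverse_complement(kmer), definition):
            sites.append((start, kmer, "-"))
    return sites


print(find_sites("TTCACCTGAA", "CANNTG"))
```

Running `find_sites` once per transcription factor definition gives the full set of sites to plot.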
The image can be output in one of two ways: (1) zoom into a portion of the sequence or (2) separate the entire sequence into windows. If the sequence is longer than 500 nucleotides, it will automatically be separated into windows and output as separate files. The maximum size for each window is 500 nucleotides.
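The windowing behavior can be sketched as below. The sketch assumes consecutive, non-overlapping windows with 0-indexed, end-exclusive coordinates. The function name `split_into_windows` is hypothetical.

```python
def split_into_windows(sequence, window_size=500):
    """Return (start, end) coordinates of consecutive windows over the sequence."""
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    return [
        (start, min(start + window_size, len(sequence)))
        for start in range(0, len(sequence), window_size)
    ]


print(split_into_windows("A" * 1200))
```

The last window is shorter than the others whenever the sequence length is not a multiple of the window size.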
* indicates required parameter
defineTFBindingSites.from.PBM or normalized PFM data file from defineTFBindingSites.from.PFM. This is the transcription factor for which in silico SNV analysis will be performed.
Default = all
inc, dec, denovo, and del. This option also takes the value all if the user would like to analyze all of the listed mutation types.
Default = False
If True, plot the binding sites that would be created from denovo SNVs, in addition to existing binding sites. If False, only plot existing binding sites.
Default = 1
Default = 1
Zoom indicates the region of the DNA sequence to visualize, given a start and end coordinate, which can be specified using the zoom range option below. Windows will output the entire DNA sequence into separate images. The size of the window, or the number of bases plotted per window, can be specified by the window size option below.
Default = None
output image format = Windows
Default = 500
output image format = Zoom
Default = None
Default = 200
Sequence Name: name of the DNA sequence
Sequence: the sequence
Sequence Name  Sequence
ZRS AACTTTAATGCCTATGTTTGATTTGAAGTCATAGCATAAAAGGTAACATAAGCAACATCCTGACCAATTATCCAAACCATCCAGACATCCCTGAATGGC...
Hand2 CACCACTGGGTGATCCATAGTATGGAATATTTTTATGAGAAACAGCCACATAACATGTACCTGTTAATGTAGGCTTTGTGTTTATTTGCAATAGCAGAG...
PBM Kmer: the sequence of every possible k-mer
PBM Relative Affinity: the relative affinity of each k-mer normalized to the k-mer with the highest MFI
PBM Kmer  PBM Relative Affinity
AAAAAAAA  0.15
AAAAAAAC  0.11
AAAAAAAG  0.13
AAAAAAAT  0.13
AAAAAACA  0.12
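A file in this layout could be loaded into a lookup table as sketched below. The sketch assumes the file is tab-delimited with the header row shown above. The function name `read_pbm` is hypothetical, and the in-memory example stands in for a real file.

```python
import csv
import io


def read_pbm(handle):
    """Map each k-mer to its relative affinity (assumes a tab-delimited file)."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["PBM Kmer"]: float(row["PBM Relative Affinity"]) for row in reader}


# In-memory stand-in for a PBM file, using the first rows of the example above.
example = io.StringIO(
    "PBM Kmer\tPBM Relative Affinity\n"
    "AAAAAAAA\t0.15\n"
    "AAAAAAAC\t0.11\n"
)
affinity = read_pbm(example)
print(affinity["AAAAAAAA"])
```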
TF Name: name of the transcription factor
Binding Site Definition: minimal IUPAC binding site definition for transcription factor
Color: binding site color on the output visualization
PBM Reference Data: relative affinity data obtained from DefineTfSites.from.PBM (optional)
PFM Reference Data: relative score data obtained from DefineTfSites.from.PFM (optional)
TF Name  Binding Site Definition  Color  PBM Reference Data  PFM Reference Data
ETS      NNGGAWNN                 blue   input_ets-pbm.tsv
HOX      NYNNTNAA                 gold   input_hox-pbm.tsv
HAND     CANNTG                   pink
PBM Kmer: the sequence of every possible k-mer
PBM Relative Affinity: the relative affinity of each k-mer normalized to the k-mer with the highest MFI
PBM Kmer  PBM Relative Affinity
AAAAAAAA  0.55
AAAAAAAC  0.56
AAAAAAAG  0.54
AAAAAAAT  0.54
AAAAAACA  0.56
Reference Affinity and Alternate Affinity will instead be labeled Reference Score and Alternate Score.
Sequence Name: name of the sequence being analyzed
Kmer ID: unique ID given to binding site
Start Position (0-indexed): position at which the binding site starts
Position (0-indexed): position of the SNV
Reference Nucleotide: reference nucleotide
Alternate Nucleotide: alternate nucleotide
Reference Kmer: reference binding site
Alternate Kmer: alternate binding site
Site Direction: direction of the binding site (+ if it follows the given IUPAC or - if it follows the reverse complement of the IUPAC)
Reference Affinity: the affinity of the reference binding site
Alternate Affinity: the affinity of the alternate binding site
Fold Change: the ratio between Reference Affinity and Alternate Affinity
SNV Effect: the type of SNV effect
ZRS sequence:
Hand2 sequence:
Example input data is available on GitHub.