Author(s): Joe Solvason
Contact: Joe Solvason (solvason@eng.ucsd.edu)
Adapted as a GenePattern Module by: Ted Liefeld (jliefeld@cloud.ucsd.edu)
Task Type: Transcription factor analysis
LSID: urn:lsid:genepattern.org:module.analysis:00443
AnnotateAndVisualizeInSilicoSnvs reports the effects of all possible in silico single-nucleotide variants (SNVs) in a given sequence. Possible SNV effects include increasing (or optimizing) the affinity/score of a binding site, decreasing (or sub-optimizing) the affinity/score of a binding site, deleting a binding site, or creating a binding site.
The in silico SNV analysis is performed on one transcription factor, but the binding sites of multiple different transcription factors can be displayed on the plot. Each binding site is labeled with the TF name and a unique binding site ID. If the relative affinity/score dataset is provided for a transcription factor, each of its binding sites is also labeled with its affinity/score, and the intensity of the binding site’s color is proportional to that affinity/score.
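For every nucleotide in the sequence, all possible SNVs are made. For each SNV, we determine its effect, if any, on any binding sites that exist in the sequence. These are the possible effects of an SNV on a binding site:
inc: increases (optimizes) the affinity/score of a binding site
dec: decreases (sub-optimizes) the affinity/score of a binding site
denovo: creates a binding site
del: deletes a binding site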
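The enumeration and labeling described above can be sketched as follows. This is a minimal illustration, not the module's actual code: the function names enumerate_snvs and classify_snv, the is_site predicate, and the affinity dictionary are all hypothetical, and the sketch assumes that "inc" and "dec" are decided by comparing the reference and alternate affinity/score directly.

```python
def enumerate_snvs(seq):
    """Yield (position, ref, alt) for every possible SNV (0-indexed positions)."""
    for pos, ref in enumerate(seq):
        for alt in "ACGT":
            if alt != ref:
                yield pos, ref, alt


def classify_snv(ref_kmer, alt_kmer, is_site, affinity):
    """Return 'inc', 'dec', 'denovo', 'del', or None for one k-mer pair.

    is_site: hypothetical predicate that says whether a k-mer is a binding site.
    affinity: hypothetical dict mapping k-mer -> relative affinity/score.
    """
    ref_site, alt_site = is_site(ref_kmer), is_site(alt_kmer)
    if ref_site and not alt_site:
        return "del"      # the SNV destroys an existing site
    if alt_site and not ref_site:
        return "denovo"   # the SNV creates a new site
    if ref_site and alt_site:
        ref_aff, alt_aff = affinity.get(ref_kmer), affinity.get(alt_kmer)
        if ref_aff is None or alt_aff is None:
            return None   # no affinity data, so inc/dec cannot be decided
        if alt_aff > ref_aff:
            return "inc"
        if alt_aff < ref_aff:
            return "dec"
    return None           # no effect on this k-mer
```

A sequence of length n yields 3n SNVs, and each SNV would be checked against every k-mer window that overlaps its position.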
If an optimization threshold is provided by the user, then we report only the binding sites that have an increased affinity/score with a fold change greater than or equal to the threshold. Similarly, if a sub-optimization threshold is provided, then we report only the binding sites that have a decreased affinity/score with a fold change less than or equal to the threshold.
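As a rough sketch of this filtering step, the function below keeps or drops a single SNV effect. The name passes_thresholds and its parameters are hypothetical, and the sketch assumes that fold change is computed as alternate affinity divided by reference affinity, which is consistent with increases having large fold changes and decreases having small ones.

```python
def passes_thresholds(effect, ref_affinity, alt_affinity,
                      opt_threshold=None, subopt_threshold=None):
    """Return True if an SNV effect should be reported.

    Assumption: fold change = alternate / reference affinity (or score).
    """
    fold_change = alt_affinity / ref_affinity
    if effect == "inc" and opt_threshold is not None:
        return fold_change >= opt_threshold
    if effect == "dec" and subopt_threshold is not None:
        return fold_change <= subopt_threshold
    # Other effects, or no threshold given: nothing to filter on.
    return True
```

For example, with an optimization threshold of 2, a change from 0.2 to 0.5 (fold change 2.5) would be reported, while a change from 0.4 to 0.5 (fold change 1.25) would not.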
Using the list of all identified SNV effects, an image of the sequence is generated that contains a table of all possible alternate nucleotides. Each cell in the table is colored according to the mutation type of the SNV. If an SNV has no effect, its background is grey. If an SNV has multiple effects, its background is white.
To find and plot all putative binding sites, we iterate across every k-mer in the DNA sequence and identify those that conform to the binding site definition for each transcription factor. The user can also choose to plot all denovo binding sites created from SNVs, in addition to existing putative binding sites.
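The k-mer scan can be illustrated with a short sketch. It is not the module's implementation: find_sites and the helper names are hypothetical, and the sketch assumes a site is reported as "+" when it matches the IUPAC definition and "-" when it matches the reverse complement of the definition.

```python
import re

# Standard IUPAC nucleotide codes and the bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(definition):
    """Reverse complement of a DNA string or IUPAC definition."""
    return definition.translate(COMPLEMENT)[::-1]


def iupac_to_regex(definition):
    """Compile an IUPAC definition into a regular expression."""
    return re.compile("".join(f"[{IUPAC[c]}]" for c in definition))


def find_sites(seq, definition):
    """Return (start, kmer, direction) for every k-mer matching the definition."""
    k = len(definition)
    forward = iupac_to_regex(definition)
    reverse = iupac_to_regex(reverse_complement(definition))
    sites = []
    for start in range(len(seq) - k + 1):
        kmer = seq[start:start + k]
        if forward.fullmatch(kmer):
            sites.append((start, kmer, "+"))
        elif reverse.fullmatch(kmer):
            sites.append((start, kmer, "-"))
    return sites
```

With the ETS definition NNGGAWNN, find_sites("TTGGAATT", "NNGGAWNN") reports one site on the "+" strand. A palindromic definition such as CANNTG matches both orientations, and this sketch reports it as "+".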
The image can be output in one of two ways: (1) zoom into a portion of the sequence or (2) separate the entire sequence into windows. If the sequence is longer than 500 nucleotides, it is automatically separated into windows, and each window is output as a separate file. The maximum size for each window is 500 nucleotides.
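One simple way to picture the windowing is sketched below. The function name split_into_windows is hypothetical, and the sketch assumes consecutive, non-overlapping windows given as 0-indexed, half-open (start, end) coordinates; the module may split the sequence differently.

```python
def split_into_windows(seq_length, window_size=500):
    """Return (start, end) pairs covering the sequence in windows of at most window_size."""
    return [
        (start, min(start + window_size, seq_length))
        for start in range(0, seq_length, window_size)
    ]
```

Under these assumptions, a 1200-nucleotide sequence gives three windows, (0, 500), (500, 1000), and (1000, 1200), each of which would be drawn as its own image.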
* indicates required parameter
defineTFBindingSites.from.PBM or normalized PFM data file from defineTFBindingSites.from.PFM. This is the transcription factor for which in silico SNV analysis will be performed.
Default = all
inc, dec, denovo, and del. This option also takes the value all if the user would like to analyze all of the listed mutation types.
Default = False
True, plot the binding sites that would be created from denovo SNVs, in addition to existing binding sites. If False, only plot existing binding sites.
Default = 1
Default = 1
Zoom indicates the region of the DNA sequence to visualize, given a start and end coordinate, which can be specified using the zoom range option below. Windows will output the entire DNA sequence into separate images. The size of the window, or the number of bases plotted per window, can be specified by the window size option below.
Default = None
output image format = Windows
Default = 500
output image format = Zoom
Default = None
Default = 200
Sequence Name: name of the DNA sequence
Sequence: the sequence
Sequence Name Sequence
ZRS AACTTTAATGCCTATGTTTGATTTGAAGTCATAGCATAAAAGGTAACATAAGCAACATCCTGACCAATTATCCAAACCATCCAGACATCCCTGAATGGC...
Hand2 CACCACTGGGTGATCCATAGTATGGAATATTTTTATGAGAAACAGCCACATAACATGTACCTGTTAATGTAGGCTTTGTGTTTATTTGCAATAGCAGAG...
PBM Kmer: the sequence of every possible k-mer
PBM Relative Affinity: the relative affinity of each k-mer normalized to the k-mer with the highest MFI
PBM Kmer PBM Relative Affinity
AAAAAAAA 0.15
AAAAAAAC 0.11
AAAAAAAG 0.13
AAAAAAAT 0.13
AAAAAACA 0.12
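A table like this can be read into a lookup from k-mer to relative affinity. The sketch below assumes a tab-separated file with one header row, which the .tsv file names in the examples suggest but which is not confirmed here; load_relative_affinities is a hypothetical helper, not part of the module.

```python
import csv


def load_relative_affinities(handle):
    """Read a two-column k-mer table into a {kmer: relative_affinity} dict.

    Assumption: tab-separated values with a single header line.
    """
    reader = csv.reader(handle, delimiter="\t")
    next(reader)  # skip the header row
    return {row[0]: float(row[1]) for row in reader if row}
```

It would typically be called with an open file, for example inside "with open(path, newline='') as handle:", where path points to the user's own PBM data file.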
TF Name: name of the transcription factor
Binding Site Definition: minimal IUPAC binding site definition for transcription factor
Color: binding site color on the output visualization
PBM Reference Data: relative affinity data obtained from DefineTfSites.from.PBM (optional)
PFM Reference Data: relative score data obtained from DefineTfSites.from.PFM (optional)
TF Name Binding Site Definition Color PBM Reference Data PFM Reference Data
ETS NNGGAWNN blue input_ets-pbm.tsv
HOX NYNNTNAA gold input_hox-pbm.tsv
HAND CANNTG pink
PBM Kmer: the sequence of every possible k-mer
PBM Relative Affinity: the relative affinity of each k-mer normalized to the k-mer with the highest MFI
PBM Kmer PBM Relative Affinity
AAAAAAAA 0.55
AAAAAAAC 0.56
AAAAAAAG 0.54
AAAAAAAT 0.54
AAAAAACA 0.56
Reference Affinity and Alternate Affinity will instead be labeled Reference Score and Alternate Score
Sequence Name: name of the sequence being analyzed
Kmer ID: unique ID given to binding site
Start Position (0-indexed): position at which the binding site starts
Position (0-indexed): position of the SNV
Reference Nucleotide: reference nucleotide
Alternate Nucleotide: alternate nucleotide
Reference Kmer: reference binding site
Alternate Kmer: alternate binding site
Site Direction: direction of the binding site (+ if it follows the given IUPAC or - if it follows the reverse complement of the IUPAC)
Reference Affinity: the affinity of the reference binding site
Alternate Affinity: the affinity of the alternate binding site
Fold Change: the ratio between Reference Affinity and Alternate Affinity
SNV Effect: the type of SNV effect
ZRS sequence:


Hand2 sequence:


Example input data is available on GitHub.