ESPPredictor GenePattern Module

LSID

urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00158

Author(s)

Vincent Fusaro, Broad Institute, wrapped as a module by Ted Liefeld, Mesirov Lab, UCSD School of Medicine.

Contact(s)

Algorithm and scientific questions: GenePattern Forum

Module wrapping issues: Ted Liefeld <jliefeld at cloud dot ucsd dot edu>

Note that this module is provided as-is by the GenePattern team. It is no longer actively supported by Vincent Fusaro, who has left the Broad Institute.

Introduction

The Enhanced Signature Peptide (ESP) predictor is a computational model that predicts high-responding peptides (i.e., peptides with a high signal intensity) from a given protein in ESI-MS. A feature set of 550 physicochemical properties is calculated for each peptide. The feature set is then analyzed with a Random Forest (RF) model to calculate the probability of high response for each peptide. Note that the probability of high response is computed on a per-protein basis and is relative to the other peptides within the same protein. The probability can therefore be used to rank the peptides of a protein by expected response and to select the highest-responding ones.

References

Vincent A. Fusaro, D.R. Mani, Jill P. Mesirov, Steven A. Carr. Computational Prediction of High Responding Peptides for Development of Targeted Protein Assays by Mass Spectrometry. Nature Biotechnology (2009).

Tool Description

The ESPPredictor module requires a list of peptide sequences. If you start from protein sequences, first digest them in silico using Peptide Selector. We tested the ESP predictor using the following Peptide Selector settings:

Digest: trypsin (Note: not tested with any other enzyme)
Maximum # basic residues: 4
Minimum peptide MH+: 600
Maximum peptide MH+: 2800
Clear all “Peptide exclusion criteria” checkboxes
Delete amino acids from “AA Composition Filtering”

Save the Peptide Selector output (usually by copying and pasting it into Excel), then save the peptide sequences alone as a separate plain-text file. This text file is the input to the ESPPredictor module.
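
As a rough illustration of what the digestion and filtering step does, the sketch below digests a protein with a simple trypsin rule and applies the mass and basic-residue limits listed above. It is not Peptide Selector's implementation: the cleavage rule (after K or R, but not before P, with no missed cleavages), the use of monoisotopic masses, the choice of K, R, and H as basic residues, and all function names are assumptions made for this example.

```python
import re

# Standard monoisotopic residue masses in Da.
# Assumption: Peptide Selector may use a different mass type.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # one water per peptide (terminal H and OH)
PROTON = 1.00728   # charge carrier for the singly protonated ion


def tryptic_digest(protein):
    """Split after K or R unless the next residue is P (assumed rule)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def mh_plus(peptide):
    """Monoisotopic MH+ mass; raises KeyError on non-standard residues."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def keep_peptide(peptide, min_mh=600.0, max_mh=2800.0, max_basic=4):
    """Apply the tested limits; K, R, and H count as basic (assumption)."""
    n_basic = sum(peptide.count(aa) for aa in "KRH")
    return n_basic <= max_basic and min_mh <= mh_plus(peptide) <= max_mh


def digest_and_filter(protein):
    """Return the tryptic peptides of one protein that pass the limits."""
    return [p for p in tryptic_digest(protein) if keep_peptide(p)]
```

Writing the returned peptides to a text file, one per line, gives a file in the format the module expects.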

Requirements

GenePattern 3.9.11 or later (dockerized).

Languages (included in the Docker image): MATLAB (Bioinformatics Toolbox), R (Random Forest library)

Parameters

Inputs

Name	Description
input.file	A list of tryptic peptide sequences, one sequence per line. Sequences must not contain the following non-standard amino acid codes: J, U, Z, B, O, X.
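
A minimal sketch of a pre-submission check for the input file is shown here. It keeps only sequences made of the 20 standard amino acid letters, which rejects the six codes listed above along with any other stray characters. The function name, the upper-casing of sequences, and the decision to skip blank lines are assumptions for this example, not behavior of the module itself.

```python
# The 20 standard amino acid one-letter codes. The remaining letters of
# the alphabet are exactly the excluded codes B, J, O, U, X, and Z.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def clean_peptide_list(lines):
    """Split input lines into (kept, rejected) peptide sequences.

    Blank lines are skipped and sequences are upper-cased (both are
    assumptions made for this sketch).
    """
    kept, rejected = [], []
    for raw in lines:
        seq = raw.strip().upper()
        if not seq:
            continue
        if set(seq) <= STANDARD_AMINO_ACIDS:
            kept.append(seq)
        else:
            rejected.append(seq)
    return kept, rejected
```

Because a text file opened for reading yields its lines when iterated, `clean_peptide_list(open(path))` would work on a saved peptide file, where `path` is whatever file name you chose.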

Outputs

Name	Description
Predictions.txt	A list of peptide sequences with their associated predicted probability of high response.
PeptideFeatureSet.csv	A peptide feature file that contains 550 physicochemical properties for each peptide. The ESPPredictor module uses this file as input to the Random Forest model.

Note: Depending on the number of peptide sequences, the module may finish in a few seconds (<20 peptides) or take many hours (>1,000 peptides).

Example Input

Click to download or copy below

AYLETEIK
ANFQGAITNR
LAFTGSTEVGK
TVGAALTNDPR
NAGQICSSGSR
LHFDTAEPVK

Example Output (Predictions.txt)

Sequence	ESP_Prediction
AYLETEIK	0.44658
ANFQGAITNR	0.77478
LAFTGSTEVGK	0.79398
TVGAALTNDPR	0.9486
LHFDTAEPVK	0.63772
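
Since the predicted probability is meant for ranking peptides within a protein, a short sketch of that ranking step may help. It assumes the predictions file is tab-delimited with the header columns `Sequence` and `ESP_Prediction`, as in the example above, and that all rows come from one protein. The function name is hypothetical.

```python
import csv
import io


def rank_predictions(text):
    """Parse tab-delimited predictions and sort by probability, highest first."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    rows = [(row["Sequence"], float(row["ESP_Prediction"])) for row in reader]
    return sorted(rows, key=lambda item: item[1], reverse=True)


example = (
    "Sequence\tESP_Prediction\n"
    "AYLETEIK\t0.44658\n"
    "ANFQGAITNR\t0.77478\n"
    "LAFTGSTEVGK\t0.79398\n"
    "TVGAALTNDPR\t0.9486\n"
    "LHFDTAEPVK\t0.63772\n"
)
ranked = rank_predictions(example)
# ranked[0] is ("TVGAALTNDPR", 0.9486), the highest-scoring peptide here.
```

Taking the first few entries of `ranked` selects the peptides with the highest predicted response for that protein.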

Version Comments

Version	Release Date	Description
4	2020-05-01	Dockerized release
3	2010-10-29	Updated to use MATLAB version 2010a
