SFSU Bioinformatics Logo

Disulfide (S-S) bonds are among the most important cross-links in proteins and have a significant influence on protein structure and function. Various methodological frameworks have been proposed for identifying disulfide bonds. These include mass spectrometry-based methods, sequence-based predictive approaches, and techniques such as crystallography and NMR. Each framework has its own advantages and disadvantages in terms of applicability, throughput, and accuracy.

For instance, NMR and crystallography require relatively large amounts (10 to 100 mg) of pure protein in a particular solution or crystalline state, can be limited by protein size, and are fundamentally low-throughput. Sequence-based predictive models, once developed, do not require significant data preparation and can be run in high-throughput settings. Their disadvantage, however, is that it is not always possible to obtain an accurate mapping between local or global features and the presence of specific disulfide bonds. Sequence-based methods can also run into difficulties when the test samples share high sequence homology with the training set but only weaker structural homology.


Finally, mass spectrometry (MS)-based methods can detect S-S bonds under conditions of either partial reduction or non-reduction of the protein. While MS-based methods are highly accurate and increasingly used, they too have limitations. Under partial reduction, for instance, results can be ambiguous if the S-S bonds have similar reduction rates. Under non-reducing conditions, on the other hand, S-S bonds can be missed in molecules that have multiple S-S bonds or a large number of cysteines. In addition, the fragmentation model used by the algorithms that interpret MS data can generate or account for too few product ions, which can lead to errors in bond determination. Thus, no single method is currently guaranteed to work under all conditions, and the results of different methods may partly agree or partly conflict. MS2DB++ is designed to address these challenges.
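Under non-reducing conditions, one basic signal an MS-based method can look for is a precursor ion whose mass equals that of two cysteine-containing peptides joined by an S-S bond: the sum of the two peptide masses minus the two hydrogen atoms (about 2.016 Da) lost when the bond forms. The minimal sketch below computes that theoretical m/z. It assumes unmodified peptides, monoisotopic masses, a single disulfide bond and an illustrative 10 ppm tolerance; the function names are hypothetical and are not taken from MS2DB+ or MS2DB++.

```python
# Minimal sketch: theoretical m/z of two peptides joined by one disulfide bond.
# Assumptions (not from MS2DB++): unmodified peptides, monoisotopic masses,
# a single S-S bond, and a ppm tolerance chosen for illustration only.

# Monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565    # added once per linear peptide (termini)
HYDROGEN = 1.007825  # one H atom; forming an S-S bond removes two
PROTON = 1.007276    # mass added per positive charge


def peptide_mass(seq: str) -> float:
    """Neutral monoisotopic mass of a linear, unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def disulfide_linked_mass(pep_a: str, pep_b: str) -> float:
    """Neutral mass of two peptides connected by a single disulfide bond."""
    if "C" not in pep_a or "C" not in pep_b:
        raise ValueError("both peptides must contain a cysteine")
    return peptide_mass(pep_a) + peptide_mass(pep_b) - 2 * HYDROGEN


def to_mz(neutral_mass: float, charge: int) -> float:
    """m/z of the [M + zH]^z+ ion."""
    return (neutral_mass + charge * PROTON) / charge


def matches_precursor(observed_mz, charge, pep_a, pep_b, tol_ppm=10.0):
    """True if observed_mz is within tol_ppm of the linked pair's theoretical m/z."""
    theoretical = to_mz(disulfide_linked_mass(pep_a, pep_b), charge)
    return abs(observed_mz - theoretical) / theoretical * 1e6 <= tol_ppm


if __name__ == "__main__":
    mass = disulfide_linked_mass("ACDK", "GCR")
    print(f"neutral mass {mass:.4f} Da, [M+2H]2+ m/z {to_mz(mass, 2):.4f}")
```

A real search must also handle missed cleavages, modifications, several bonds per precursor and the fragment ions in the MS/MS spectrum, which is where the ion-type modelling mentioned above comes in.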

MS2DB++ is a web application for determining the disulfide connectivity of proteins using an information-fusion approach based on Dempster-Shafer theory (DST). The software provides several methodological frameworks for determining S-S bonds and combines their results automatically and rigorously using DST. This fundamentally novel approach allows MS2DB++ to achieve high sensitivity, specificity, and accuracy: it outperforms not only its constituent methods but also other software in the area, such as MassMatrix, MS2Assign, DISULFIND, and PreCys. MS2DB++ also provides an easy-to-use interface that breaks the disulfide bond determination process down into clear, simple steps.
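The core operation behind DST-based fusion is a combination rule; the classic one is Dempster's rule, which multiplies the masses two sources assign to each pair of hypothesis sets, keeps the products whose sets intersect, and renormalizes by 1 - K, where K is the mass that fell on contradictory (empty) intersections. The sketch below is a minimal, illustrative implementation. It assumes, purely for illustration, that each candidate cysteine pair is judged on the frame {"bond", "no_bond"} and that each method's output has already been turned into a mass function; how MS2DB++ actually defines its frame and builds its masses is not described here, and the evidence values are hypothetical.

```python
from itertools import product

# Minimal sketch of Dempster's rule of combination.
# Assumption (illustration only): one cysteine pair, frame {"bond", "no_bond"},
# and mass functions already derived from each method's output.

BOND = frozenset({"bond"})
NO_BOND = frozenset({"no_bond"})
THETA = BOND | NO_BOND  # the whole frame, i.e. "don't know"


def dempster_combine(m1, m2):
    """Combine two mass functions (dict: frozenset -> mass) with Dempster's rule."""
    combined, conflict = {}, 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            conflict += mass_a * mass_b  # mass on contradictory hypotheses
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalize by 1 - K so the remaining masses sum to 1.
    return {s: v / (1.0 - conflict) for s, v in combined.items()}


def belief(m, hypothesis):
    """Mass committed to subsets of the hypothesis."""
    return sum(v for s, v in m.items() if s <= hypothesis)


def plausibility(m, hypothesis):
    """Mass that does not contradict the hypothesis."""
    return sum(v for s, v in m.items() if s & hypothesis)


# Hypothetical evidence about one cysteine pair from two methods.
ms_evidence = {BOND: 0.7, NO_BOND: 0.1, THETA: 0.2}
svm_evidence = {BOND: 0.5, NO_BOND: 0.3, THETA: 0.2}

fused = dempster_combine(ms_evidence, svm_evidence)
print(round(belief(fused, BOND), 3), round(plausibility(fused, BOND), 3))  # 0.797 0.851
```

Both sources lean towards a bond, so the fused belief in the bond (about 0.80) is higher than either source's own mass on it, while the interval between belief and plausibility shows how much uncertainty remains.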

The S-S bond determination frameworks built into MS2DB++ include:

  • MS2DB+, an MS-based method we developed earlier. MS2DB+ identifies the disulfide linkages in proteins from tandem mass spectrometry data in polynomial time. It uses an efficient approximation algorithm that allows multiple ion types (up to 12 different ion types) to be considered in the analysis of MS/MS data.
  • A sequence-based predictor using a Support Vector Machine (SVM) classifier.
  • A cysteine separation profile (CSP)-based search method. This pattern-wise approach matches the CSP of a protein against a database of CSPs in order to find the most similar pattern. Once the best match (the lowest divergence between the two CSPs) is found, the S-S connectivity of the protein is inferred from the disulfide bonds previously annotated for the matched CSP.
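The CSP search in the last item can be sketched as a nearest-template lookup. This minimal version rests on stated assumptions: the CSP is taken to be the list of sequence separations between consecutive cysteines, divergence is the sum of absolute differences between corresponding separations, and only templates with the same number of cysteines are compared. The template database and all function names are hypothetical; MS2DB++'s actual CSP database and divergence measure may differ.

```python
# Minimal sketch of CSP-based connectivity inference.
# Assumptions: CSP = separations between consecutive cysteines; divergence =
# sum of absolute differences; only equal-length CSPs are compared.
# The template list below is hypothetical.


def cysteine_positions(sequence):
    """0-based positions of cysteine residues."""
    return [i for i, aa in enumerate(sequence.upper()) if aa == "C"]


def csp(sequence):
    """Separations between consecutive cysteines."""
    pos = cysteine_positions(sequence)
    return [b - a for a, b in zip(pos, pos[1:])]


def divergence(p, q):
    """Sum of absolute differences between two equal-length CSPs."""
    return sum(abs(x - y) for x, y in zip(p, q))


def infer_connectivity(sequence, templates):
    """Return (divergence, bonds) for the closest template, or None.

    templates: list of (template_csp, bonds); bonds are pairs of cysteine
    ordinals (0 = first cysteine) so they transfer between proteins.
    Bonds are returned as 0-based residue positions in `sequence`.
    """
    query = csp(sequence)
    candidates = [(divergence(query, t_csp), bonds)
                  for t_csp, bonds in templates if len(t_csp) == len(query)]
    if not candidates:
        return None
    best_div, best_bonds = min(candidates, key=lambda c: c[0])
    pos = cysteine_positions(sequence)
    return best_div, [(pos[i], pos[j]) for i, j in best_bonds]


templates = [
    ([3, 5, 2], [(0, 3), (1, 2)]),
    ([10, 1, 7], [(0, 1), (2, 3)]),
    ([3, 4], [(0, 1)]),  # different cysteine count: never compared
]
print(infer_connectivity("ACGGCKKKCAC", templates))  # (1, [(1, 10), (4, 8)])
```

Storing bonds as cysteine ordinals rather than residue numbers is what lets an annotated template be transferred to a query protein of different length.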

MS2DB++ also allows users to integrate results from up to two other (arbitrary) external methods. Tandem mass spectrometry data can be entered in several formats (mzXML, mzML, mzData, or Sequest DTA files), and protein sequence information is entered in the well-established FASTA format. MS2DB++ supports multiple DST-based methods of information fusion. The S-S connectivity determined with each of these methods is presented to the user in an easy-to-read graphical format, and the results are also available in TXT and XML formats. A thorough description of each combination rule is presented in the help section. The most up-to-date source code is available here.
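The combination rules MS2DB++ actually offers are documented in its help section and are not listed here. To illustrate why more than one DST rule can be useful, the sketch below contrasts Dempster's rule with Yager's rule, a well-known alternative that assigns conflicting mass to the whole frame (ignorance) instead of renormalizing it away. Whether MS2DB++ implements Yager's rule is an assumption not confirmed by this text; the frame (the three possible complete pairings of four cysteines) and the method outputs are hypothetical.

```python
from itertools import product

# Sketch comparing Dempster's rule with Yager's rule on conflicting evidence.
# Assumed frame: the three possible complete pairings of four cysteines.
P1 = frozenset({"C1-C2,C3-C4"})
P2 = frozenset({"C1-C3,C2-C4"})
P3 = frozenset({"C1-C4,C2-C3"})
FRAME = P1 | P2 | P3


def conjunctive(m1, m2):
    """Unnormalized combination: (masses on non-empty sets, conflict K)."""
    out, conflict = {}, 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            out[inter] = out.get(inter, 0.0) + ma * mb
        else:
            conflict += ma * mb
    return out, conflict


def dempster(m1, m2):
    """Dempster's rule: renormalize away the conflict K."""
    out, k = conjunctive(m1, m2)
    if k >= 1.0:
        raise ValueError("sources are in total conflict")
    return {s: v / (1.0 - k) for s, v in out.items()}


def yager(m1, m2, frame):
    """Yager's rule: move the conflict K onto the whole frame."""
    out, k = conjunctive(m1, m2)
    out[frame] = out.get(frame, 0.0) + k
    return out


# Hypothetical, strongly conflicting outputs from two methods.
method_a = {P1: 0.9, P2: 0.1}
method_b = {P3: 0.9, P2: 0.1}

d = dempster(method_a, method_b)
y = yager(method_a, method_b, FRAME)
print(round(d[P2], 3))                       # 1.0
print(round(y[P2], 3), round(y[FRAME], 3))   # 0.01 0.99
```

With strongly conflicting sources, Dempster's rule commits all belief to a pattern that both methods considered unlikely, whereas Yager's rule reports most of the mass as ignorance; this kind of behavioural difference is why offering a choice of combination rule can matter.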