Efficient Disulfide Bond Determination Using Mass Spectrometry

  1. What are the unique features of MS2DB+?
  2. Does it provide support to multiple file types?
  3. Is MS2DB+ an open source application?
  4. Why is MS2DB+ so effective?
  5. How does MS2DB+ determine a global disulfide topology?
  6. Are MS2DB+ results affected by the specifics of the different mass spectrometers?
  7. Does MS2DB+ support different fragmentation methods?
  8. Is the dataset large enough to validate the algorithm?
  9. How well does MS2DB+ perform compared to other gold standard methods in the area?
  10. Does MS2DB+ provide access to intermediary results?

1. What are the unique features of MS2DB+?

MS2DB+ uses a rich fragmentation model. To the best of our knowledge, MS2DB+ is the first publicly available software (and algorithmic work) that analyzes multiple (twelve) ion types (a, a*, ao, b*, bo, b, c, x, y*, yo, y, z) in MS/MS data for determining disulfide (S-S) bonds.

From an algorithmic perspective, MS2DB+ is also the first work that solves the problem of generating theoretical spectra and matching them with experimental spectra in polynomial time. This is possible because the method generates only the few theoretical disulfide-bonded peptide configurations whose mass is close to the mass observed in the given experimental spectra. This is distinct from the exhaustive approach of generating all possible peptide combinations and then testing and discarding most of them (as is done in other applications such as MassMatrix and MS2Links).

MS2DB+ uses a global optimization strategy based on the match scores of the individual bonds to determine the optimal consistent disulfide topology of the molecule.

2. Does it provide support to multiple file types?

MS2DB+ supports most of the common XML-based MS/MS file formats, including mzXML, mzData, and mzML, as well as Sequest DTA.

3. Is MS2DB+ an open source application?

Yes, MS2DB+ is an open source application. Its source code is available here

4. Why is MS2DB+ so effective?

MS2DB+ uses a two-stage trimming process. First, any theoretical S-S bonded combination whose mass exceeds the precursor ion mass being matched is automatically discarded. Second, the method trims the search space using the subset sum approximation algorithm. The contribution of the second step becomes substantial when the fragmentation possibilities (in the confirmatory matching) grow rapidly due to:
  1. The consideration of multiple ion types in the fragmentation of the precursor ions.
  2. The possibility of a disulfide-bonded structure being formed by more than two cysteine-containing peptides (more than one disulfide bond).
  3. Cases in which the experimental fragmentation of precursor ions generates spectra with a large number of peaks (fragment ions) with significant intensity values.
On average, for the data used in this research, almost 98% of the search space (with up to 4.6 million theoretical fragments for the protein FT III) was trimmed while still obtaining the correct disulfide connectivity for the proteins studied.
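
The two trimming stages described above can be illustrated with the textbook approximation scheme for subset sum. This is a minimal sketch, not the MS2DB+ implementation: the function names, the integer masses, and the `epsilon` parameter are assumptions made for illustration. The `s <= target` filter plays the role of the first stage (discarding combinations heavier than the precursor ion), and `trim` plays the role of the second stage (dropping sums that are nearly identical to one already kept).

```python
def trim(sorted_sums, delta):
    """Keep a sum only if it exceeds the last kept sum by more than a factor (1 + delta)."""
    kept = [sorted_sums[0]]
    for s in sorted_sums[1:]:
        if s > kept[-1] * (1 + delta):
            kept.append(s)
    return kept


def approx_subset_sum(masses, target, epsilon):
    """Return a subset sum <= target that is within a factor (1 + epsilon) of the best one.

    masses  -- hypothetical fragment masses (positive numbers)
    target  -- hypothetical precursor ion mass acting as the upper bound
    epsilon -- relative error tolerated, 0 < epsilon < 1
    """
    n = len(masses)
    sums = [0]
    for m in masses:
        # Extend every kept partial sum with the new mass and merge the two lists.
        merged = sorted(set(sums + [s + m for s in sums]))
        # Stage two: drop sums that are almost equal to one already kept.
        sums = trim(merged, epsilon / (2 * n))
        # Stage one: drop combinations that exceed the target mass.
        sums = [s for s in sums if s <= target]
    return max(sums)


best = approx_subset_sum([104, 102, 201, 101], target=308, epsilon=0.40)
```

Because each trimmed list stays short, the number of partial sums carried from one step to the next grows polynomially rather than doubling at every step, which is the property the trimming relies on.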

5. How does MS2DB+ determine a global disulfide topology?

A "local" (putative bond-level) view of the disulfide connectivity is formed once the protein data have been analyzed by the two-level initial and confirmatory matching steps. The putative bonds, however, may not form a globally consistent connectivity pattern. Consequently, the disulfide bonds determined after the two-step matching process need to be coalesced to obtain a globally consistent topology.

MS2DB+ models this problem as that of obtaining a maximum-weight matching in a graph G(V, E), where the cysteines constitute the set of vertices V and the putative disulfide bonds constitute the set of edges E. The match scores are used as weights for the edges.
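
To make the graph formulation concrete, the sketch below finds a maximum-weight matching by brute force over a handful of putative bonds. It is an illustration only: MS2DB+ is described elsewhere in this FAQ as using an implementation of the Gabow algorithm, which this exhaustive search is not, and the function name and the match scores in the example are hypothetical.

```python
def max_weight_matching(edges):
    """Exhaustively search for the highest-scoring set of bonds that share no cysteine.

    edges -- dict mapping a (cysteine_a, cysteine_b) pair to its match score
    Returns (total_score, list_of_chosen_bonds).
    """
    edge_list = sorted(edges.items())

    def search(i, used):
        if i == len(edge_list):
            return 0.0, []
        # Option 1: leave bond i out of the topology.
        best_score, best_bonds = search(i + 1, used)
        (a, b), weight = edge_list[i]
        # Option 2: include bond i if neither cysteine is already bonded.
        if a not in used and b not in used:
            score, bonds = search(i + 1, used | {a, b})
            if score + weight > best_score:
                best_score, best_bonds = score + weight, [(a, b)] + bonds
        return best_score, best_bonds

    return search(0, frozenset())


# Hypothetical putative bonds and scores; two of them conflict with the others.
putative = {
    ("C142", "C292"): 0.9,
    ("C156", "C356"): 0.8,
    ("C142", "C156"): 0.5,
    ("C292", "C356"): 0.4,
}
total, topology = max_weight_matching(putative)
```

The exhaustive search is exponential in the number of putative bonds, so it only suits small examples; a polynomial-time matching algorithm is needed for proteins with many cysteines.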

6. Are MS2DB+ results affected by the specifics of the different mass spectrometers?

The design of MS2DB+ in no way assumes the use of any specific technology. As long as the input data is in one of the supported formats (Sequest DTA, mzXML, mzData, or mzML), the software is usable. The data analysis in MS2DB+ is designed to minimize the influence of variability in the data (which can be caused, among other things, by the use of different spectrometers). This includes:
  1. "Matching windows" surrounding each experimental precursor and product ion being matched, based on the charge state of the ions. These windows avoid missed matches due to small differences between experimental and theoretical ion masses caused by systemic noise, which can vary between spectrometers. Although the windows are calculated automatically, users can tune them manually.
  2. Experimental MS/MS product ions are selected based on their relative intensity. Our framework does not assume a fixed threshold for filtering the product ions with significant abundance. Instead, the filtering is based on the abundance relative to the maximum abundance found in the set of product ions being analyzed. Thus, the formation of the search space is independent of the resolution and noise ratio of the mass spectrometer being used.
  3. The fragmentation model in MS2DB+ allows for different ion types which can arise from spectrometers using different dissociation modes such as ETD, ECD, or EDD.
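
The first two points in the list above can be sketched as follows. The function names, the `min_fraction` parameter, and the `half_width` parameter are hypothetical; in particular, how MS2DB+ derives a window width from the charge state is not described here, so the width is simply passed in.

```python
def filter_by_relative_intensity(peaks, min_fraction):
    """Keep peaks whose intensity is at least min_fraction of the strongest peak.

    peaks        -- list of (mz, intensity) pairs for one MS/MS spectrum
    min_fraction -- hypothetical cut-off relative to the maximum intensity
    """
    if not peaks:
        return []
    max_intensity = max(intensity for _, intensity in peaks)
    threshold = min_fraction * max_intensity
    return [(mz, intensity) for mz, intensity in peaks if intensity >= threshold]


def within_window(experimental_mz, theoretical_mz, half_width):
    """Report whether a theoretical ion falls inside the window around an experimental ion."""
    return abs(experimental_mz - theoretical_mz) <= half_width


spectrum = [(100.0, 10.0), (200.0, 100.0), (300.0, 4.0)]
significant = filter_by_relative_intensity(spectrum, min_fraction=0.05)
is_match = within_window(500.3, 500.0, half_width=0.5)
```

Because the cut-off scales with the strongest peak in each spectrum, the same `min_fraction` can be applied to spectra whose absolute intensities differ by orders of magnitude.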

7. Does MS2DB+ support different fragmentation methods?

MS2DB+ supports many of the available fragmentation methods, including collision-induced dissociation (CID), electron-transfer dissociation (ETD), electron-capture dissociation (ECD), and electron-detachment dissociation (EDD). The variability of the ions formed by these methods was the main motivation for considering multiple ion types in the analysis of the data.

In MS2DB+, the user can independently select different ion types while determining disulfide topologies. This option allows users to plug in MS/MS data generated using any of the dissociation methods mentioned (CID, ECD, ETD, EDD). The figure below lists the different dissociation methods and the main ions generated by each method. As can be seen, all these ions are accounted for in the fragmentation model of MS2DB+.

Dissociation methods
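
As a rough illustration of selecting ion types per dissociation method, the sketch below maps each method to the ion series most commonly associated with it in the general mass spectrometry literature. The mapping is an assumption made for illustration and does not reproduce the figure above; the function and variable names are hypothetical and are not part of the MS2DB+ interface.

```python
# The twelve ion types listed in this FAQ for the MS2DB+ fragmentation model.
ALL_ION_TYPES = ["a", "a*", "ao", "b", "b*", "bo", "c", "x", "y", "y*", "yo", "z"]

# Assumed main ion series per dissociation method (general knowledge, not the figure).
MAIN_IONS = {
    "CID": ["b", "y"],
    "ETD": ["c", "z"],
    "ECD": ["c", "z"],
    "EDD": ["a", "x"],
}


def select_ion_types(methods):
    """Return the ion types to enable for the given dissociation methods."""
    selected = set()
    for method in methods:
        selected.update(MAIN_IONS[method])
    # Preserve the order of the full ion-type list.
    return [ion for ion in ALL_ION_TYPES if ion in selected]


ions_for_mixed_data = select_ion_types(["CID", "ETD"])
```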

8. Is the dataset large enough to validate the algorithm?

A better evaluation of the method would be possible with a larger dataset. We note, however, that for disulfide bond analysis, the use of mass spectra for ten proteins (as in our study) represents one of the largest data sets analyzed with MS-based methods. For comparison, the figure below summarizes the size of the data sets used in some of the well-known papers that have addressed this problem in the recent past. As can be seen, the data set in our validation studies is by far the largest (MS2Links, which used 5 molecules, comes next).

Dataset sizes

Unfortunately, there is no publicly available repository (along the lines of PDB) for MS/MS spectra that could provide access to more data. We believe that creating such a standardized, publicly available resource would be very helpful for, among other things, detailed testing and comparison of solutions. We hope that the MS community will look into the creation of such a resource.

9. How well does MS2DB+ perform compared to other gold standard methods in the area?

MS2DB+ was compared, using the ten datasets available, with MassMatrix as well as with the gold standard predictive applications (DiANNA, DISULFIND, and PreCys), which all use sequence information. To the best of our knowledge, this is the first time such a comparative study has been conducted in this area. The figure below summarizes the results, which show MS2DB+ outperforming all of these systems.

Methods comparison

10. Does MS2DB+ provide access to intermediary results?

MS2DB+ provides the initial matches (IMs), all disulfide bonds found prior to the global optimization (CMs), and the confirmatory match scores in XML format at the bottom of the page on which the disulfide connectivity results are presented. The XML file labelled S-S connectivity lists the globally consistent disulfide bonds found and their respective match scores, pp scores, and pp2 scores. The XML file labelled IM Details lists all Initial Matches found, including the MS/MS file involved in the match, the precursor ion mass and charge state, the peptide sequences, and the cysteines present. Finally, the file labelled CM Details lists all disulfide bonds confirmed prior to the global connectivity consistency check, which is done using an implementation of the Gabow algorithm. These files allow users to carry out a thorough analysis of the intermediary steps and results of MS2DB+.

As an example, we analyze part of the data presented in the XML files generated for the glycosyltransferase ST8SiaIV. Specifically, we review the intermediary results for the disulfide bond found between cysteines C142-C292. The global disulfide connectivity for this protein consists of two disulfide bonds: C142-C292 and C156-C356.

In the image below, all three Initial Matches determined (corresponding to S-S bond C142-C292) are shown.

IMs

In the next image, the match score, the pp-value, and the pp2-value determined for each Initial Match above are presented.

CMs