Saturday, 4 October 2014

Analysis of histone modifications with PEAKS 7: A response to the Search Engines comparison from the PEAKS Team

Recently we posted a comparison of different search engines for PTM studies (Evaluation of Proteomic Search Engines for PTMs Identification). After some discussion of the results mentioned in our post, the PEAKS Team published a blog post with a reanalysis of the dataset. Here are the results:

Originally Posted in Peaks Blog:
The complex nature of histone modification patterns has posed a challenge for bioinformatics analysis over the years. Yuan et al. [1] conducted a study using two datasets from human HeLa histone samples to benchmark the performance of current proteomic search engines. This article was published in J Proteome Res. 2014 Aug 28 (PubMed), and the data from the two datasets, HCD_Histone and CID_Histone (PXD001118), was made publicly available through ProteomeXchange. With this data, the article compares and evaluates the performance and capability of eight different proteomic search engines: pFind, Mascot, SEQUEST, ProteinPilot, PEAKS 6, OMSSA, TPP and MaxQuant.
In this study, PEAKS 6 was the version evaluated. However, PEAKS 7, which was released in November 2013, is the latest available version of the PEAKS Studio software. PEAKS 7 not only performs better than PEAKS 6 but also includes many additional and improved features. Our team has reanalyzed the two datasets, HCD_Histone and CID_Histone, with PEAKS 7 to update the ID results presented in the publication by Yuan et al. The updated results show that PEAKS, pFind and Mascot identify the most confident results.

Proportion of Confident IDs
As indicated in the article, the two HeLa histone datasets were examined by each search engine using the same database search parameters. Seven sets of variable histone modifications were used in the study; they are listed in Table 1 below.
 
Table 1. Modification parameters for database search
Fixed modification: Propionyl[Peptide N-term]/+56.02
Variable modifications:
First (un): Propionyl[K]/+56.026
Second (ac): Propionyl[K]/+56.026; Acetyl[K]/+42.011
Third (me): Propionyl[K]/+56.026; Methyl_Propionyl[K]/+70.042
Fourth (di): Propionyl[K]/+56.026; Dimethyl[K]/+28.031
Fifth (tr): Propionyl[K]/+56.026; Trimethyl[K]/+42.047
Sixth (ph): Propionyl[K]/+56.026; Phospho[ST]/+79.966
Seventh (co): Propionyl[K]/+56.026; Acetyl[K]/+42.011; Methyl_Propionyl[K]/+70.042; Dimethyl[K]/+28.031; Trimethyl[K]/+42.047; Phospho[ST]/+79.966
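
For readers who want to script the seven searches, the parameter sets in Table 1 can be written down as plain data. The following is only a sketch: the mass shifts are copied from the table, but the dictionary layout and the name VARIABLE_MODS are assumptions made for illustration and do not correspond to the configuration format of PEAKS or any other search engine.

```python
# Mass shifts (Da) copied from Table 1; the data layout is illustrative only.
PROPIONYL_K = ("Propionyl[K]", 56.026)

VARIABLE_MODS = {
    "un": [PROPIONYL_K],
    "ac": [PROPIONYL_K, ("Acetyl[K]", 42.011)],
    "me": [PROPIONYL_K, ("Methyl_Propionyl[K]", 70.042)],
    "di": [PROPIONYL_K, ("Dimethyl[K]", 28.031)],
    "tr": [PROPIONYL_K, ("Trimethyl[K]", 42.047)],
    "ph": [PROPIONYL_K, ("Phospho[ST]", 79.966)],
}

# The seventh search (co) combines every modification used in the first six,
# keeping each modification once and preserving the order of first appearance.
combined = []
for mods in VARIABLE_MODS.values():
    for mod in mods:
        if mod not in combined:
            combined.append(mod)
VARIABLE_MODS["co"] = combined

for label, mods in VARIABLE_MODS.items():
    print(label, "; ".join(f"{name}/+{shift}" for name, shift in mods))
```

Building the co set from the other six makes it explicit that the seventh search is the union of the single-modification searches rather than a separate list to maintain.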
 
The data was then searched with PEAKS 7 using these same parameters, and the comparison of IDs and confident IDs from the article by Yuan et al. was updated accordingly, as shown in Figure 1. The comparison includes the results produced by the eight different search engines. IDs (shown as solid bars) are the identifications from each search engine with an FDR < 1%, whereas confident IDs (shown as striped bars) are the IDs from each search engine that are also present in the ‘all_Confident’ group. The term ‘all_Confident’ indicates IDs that were found by at least two of the eight search engines.
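
The 'all_Confident' definition can be illustrated with a few lines of Python. This is a minimal sketch under stated assumptions: each engine's output has already been filtered to FDR < 1%, an ID is represented by any hashable key (for example a spectrum and peptide pair), and the function name confident_ids, the engine names and the toy peptides are all hypothetical rather than taken from the study.

```python
from collections import Counter

def confident_ids(ids_by_engine, min_engines=2):
    """Return the all_Confident set and the confident-ID count per engine.

    ids_by_engine maps an engine name to the set of IDs it reported
    (assumed to be filtered to FDR < 1% beforehand).
    """
    # Count how many engines reported each ID.
    counts = Counter()
    for ids in ids_by_engine.values():
        counts.update(set(ids))

    # An ID is confident when at least min_engines engines found it.
    all_confident = {i for i, n in counts.items() if n >= min_engines}

    # Confident IDs of an engine: its own IDs that are in all_confident.
    per_engine = {
        engine: len(set(ids) & all_confident)
        for engine, ids in ids_by_engine.items()
    }
    return all_confident, per_engine

# Made-up toy data, not results from the study.
toy = {
    "EngineA": {"pep1", "pep2", "pep3"},
    "EngineB": {"pep2", "pep3", "pep4"},
    "EngineC": {"pep3", "pep5"},
}
all_conf, per_engine = confident_ids(toy)
print(sorted(all_conf))  # ['pep2', 'pep3']
print(per_engine)        # {'EngineA': 2, 'EngineB': 2, 'EngineC': 1}
```

Note that under this definition an ID reported by only one engine never counts as confident, however well it scores, so the striped bars can never exceed the solid bars.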


 
Figure 1 (a-g). Comparison of the number of IDs and confident IDs for the seven variable modification searches produced by the different search engines using HeLa histone HCD and CID data
(a) number of first (un) modified IDs; (b) number of second (ac) modified IDs; (c) number of third (me) modified IDs; (d) number of fourth (di) modified IDs; (e) number of fifth (tr) modified IDs; (f) number of sixth (ph) modified IDs; and (g) number of seventh (co) modified IDs.


The graphs in Figure 1 show that PEAKS 7, along with pFind and Mascot, produces the most confident results of the search engines evaluated in the study. PEAKS 7 ranks first in six of the seven searches (un, ac, di, tr, ph, and co; for ph it ties with pFind and Mascot, and for co it ties for first with Mascot). The exception is the third search (me), where pFind and Mascot found the most confident results.

Running Time
 
For this analysis, PEAKS 7 was run on a typical desktop computer with an i7 CPU and 16 GB of RAM. PEAKS 7 finished each of the first six searches (un, ac, me, di, tr, and ph) in around 22 minutes for HCD_Histone and 14 minutes for CID_Histone. Compared to the 2h-7h reported in [1] for PEAKS 6, PEAKS 7 is much faster. For the seventh search, which involved multiple PTMs (co), PEAKS 7 took 30 minutes for HCD_Histone and 14 minutes for CID_Histone.
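
To put these timings in perspective, the implied speedup can be bounded with simple arithmetic. The sketch below assumes that the 2h-7h range reported for PEAKS 6 applies to the same searches; the original runs were also done on different hardware, so the numbers are rough bounds rather than a benchmark. The function name speedup_range is hypothetical.

```python
def speedup_range(old_hours_range, new_minutes):
    """Lower and upper bound of the speedup, given an old runtime range in
    hours and a new runtime in minutes."""
    low_hours, high_hours = old_hours_range
    return (low_hours * 60 / new_minutes, high_hours * 60 / new_minutes)

# Single-modification searches: about 22 min (HCD_Histone), 14 min (CID_Histone).
hcd_low, hcd_high = speedup_range((2, 7), 22)
cid_low, cid_high = speedup_range((2, 7), 14)

print(f"HCD_Histone: {hcd_low:.1f}x to {hcd_high:.1f}x")
print(f"CID_Histone: {cid_low:.1f}x to {cid_high:.1f}x")
```

Under these assumptions the speedup is roughly 5.5x to 19x for HCD_Histone and 8.6x to 30x for CID_Histone.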

Therefore, the running time of PEAKS 7 is comparable to that of the other search engines reported in [1], and it is consistent with the performance presented in (http://peaksblog.bioinfor.com/2013/12/boost-your-analysis-speed-with-peaks-7.html).

References

1. Yuan ZF, Lin S, Molden RC, Garcia BA. Evaluation of proteomic search engines for the analysis of histone modifications. J Proteome Res. 2014 Aug 28. [Epub ahead of print]