Monday, 20 January 2014

Some of the most cited manuscripts in Proteomics and Computational Proteomics (2013)

Some of the most cited manuscripts in 2013 in the field of Proteomics and Computational Proteomics (in no particular order):

     The PRoteomics IDEntifications (PRIDE) database 
     at the European Bioinformatics Institute is one of the most prominent data 
     repositories of mass spectrometry (MS)-based proteomics data. Here, we 
     summarize recent developments in the PRIDE database and related tools. 
     First, we provide up-to-date statistics in data content, splitting the figures by 
     groups of organisms and species, including peptide and protein 
     identifications, and post-translational modifications. We then describe the 
     tools that are part of the PRIDE submission pipeline, especially the recently 
     developed PRIDE Converter 2 (new submission tool) and PRIDE Inspector 
     (visualization and analysis tool). We also give an update about the integration 
     of PRIDE with other MS proteomics resources in the context of the 
     ProteomeXchange consortium. Finally, we briefly review the quality control 
     efforts that are ongoing at present and outline our future plans.

     Next-generation sequencing allows the analysis of genomes, including those
     representing disease states. However, the causes of most disorders are 
     multifactorial, and systems-level approaches, including the analysis of 
     proteomes, are required for a more comprehensive understanding. The 
     proteome is extremely multifaceted owing to splicing and protein 
     modifications, and this is further amplified by the interconnectivity of proteins 
     into complexes and signalling networks that are highly divergent in time and 
     space. Proteome analysis heavily relies on mass spectrometry (MS). 
     MS-based proteomics is starting to mature and to deliver through a 
     combination of developments in instrumentation, sample preparation and 
     computational analysis. Here we describe this emerging next generation of 
     proteomics and highlight recent applications.

     SWATH-MS is a data-independent acquisition method that generates, in a 
     single measurement, a complete recording of the fragment ion spectra of all 
     the analytes in a biological sample for which the precursor ions are within a 
     predetermined m/z versus retention time window. To assess the performance 
     and suitability of SWATH-MS-based protein quantification for clinical use, we 
     compared SWATH-MS and SRM-MS-based quantification of N-linked 
     glycoproteins in human plasma, a commonly used sample for biomarker 
     discovery. Using dilution series of isotopically labeled heavy peptides 
     representing biomarker candidates, the LOQ of SWATH-MS was determined 
     to reach 0.0456 fmol at peptide level by targeted data analysis, which 
     corresponds to a concentration of 5–10 ng protein/mL in plasma, while SRM 
     reached a peptide LOQ of 0.0152 fmol. Moreover, the quantification of 
     endogenous glycoproteins using SWATH-MS showed a high degree of 
     reproducibility, with the mean CV of 14.90%, correlating well with SRM results 
     (R² = 0.9784). Overall, SWATH-MS measurements showed a slightly lower 
     sensitivity and a comparable reproducibility to state-of-the-art SRM 
     measurements for targeted quantification of the N-glycosites in human 
     blood. However, a significantly larger number of peptides can be quantified 
     per analysis. We suggest that SWATH-MS analysis combined with 
     N-glycoproteome enrichment in plasma samples is a promising integrative 
     proteomic approach for biomarker discovery and verification.
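
The abstract above reports reproducibility as a mean CV and agreement with SRM as an R² value. As a rough illustration of what those two numbers measure, here is a minimal Python sketch. It assumes the CV is the sample standard deviation divided by the mean, and that R² is the squared Pearson correlation between paired measurements; the paper may define either differently. All input values are made up for the example.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return the CV in percent: sample standard deviation / mean * 100."""
    return stdev(values) / mean(values) * 100.0


def r_squared(x, y):
    """Return the squared Pearson correlation of two equal-length series."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


# Hypothetical replicate intensities for one peptide (arbitrary units).
replicates = [98.0, 102.0, 100.0]

# Hypothetical paired quantities of the same peptides from two methods.
swath = [1.0, 2.0, 3.0, 4.0]
srm = [2.0, 4.0, 6.0, 8.0]

print(round(coefficient_of_variation(replicates), 2))
print(round(r_squared(swath, srm), 4))
```

A low CV across replicates and an R² close to 1 between the two methods is the pattern the abstract describes for SWATH-MS versus SRM.
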

       Phosphorylation, the reversible addition of a phosphate group to amino acid 
       side chains of proteins, is a fundamental regulator of protein activity, 
       stability, and molecular interactions. Most cellular processes, such as inter- 
       and intracellular signaling, protein synthesis, degradation, and apoptosis, 
       rely on phosphorylation. This PTM is thus involved in many diseases, 
       rendering localization and assessment of extent of phosphorylation of major 
       scientific interest. MS-based phosphoproteomics, which aims at describing 
       all phosphorylation sites in a specific type of cell, tissue, or organism, has 
       become the main technique for discovery and characterization of 
       phosphoproteins in a nonhypothesis driven fashion. In this review, we 
       describe methods for state-of-the-art MS-based analysis of protein 
       phosphorylation as well as the strategies employed in large-scale 
       phosphoproteomic experiments with focus on the various challenges and 
       limitations this field currently faces.

      Experience from different fields of life sciences suggests that accessible, 
      complete reference maps of the components of the system under study are 
      highly beneficial research tools. Examples of such maps include libraries of 
      the spectroscopic properties of molecules, or databases of drug structures in 
      analytical or forensic chemistry. Such maps, and methods to navigate them, 
      constitute reliable assays to probe any sample for the presence and amount 
      of molecules contained in the map. So far, attempts to generate such maps 
      for any proteome have failed to reach complete proteome coverage [1, 2, 3]. 
      Here we use a strategy based on high-throughput peptide synthesis and 
      mass spectrometry to generate an almost complete reference map (97% of 
      the genome-predicted proteins) of the Saccharomyces cerevisiae proteome. 
      We generated two versions of this mass-spectrometric map, one supporting 
      discovery-driven (shotgun) [3, 4] and the other supporting hypothesis-driven 
      (targeted) [5, 6] proteomic measurements. Together, the two versions of the 
      map constitute a complete set of proteomic assays to support most studies 
      performed with contemporary proteomic technologies. To show the utility of 
      the maps, we applied them to a protein quantitative trait locus (QTL) 
      analysis [7], which requires precise measurement of the same set of peptides 
      over a large number of samples. Protein measurements over 78 S. 
      cerevisiae strains revealed a complex relationship between independent 
      genetic loci, influencing the levels of related proteins. Our results suggest 
      that selective pressure favours the acquisition of sets of polymorphisms that 
      adapt protein levels but also maintain the stoichiometry of functionally 
      related pathway members. 

     High-resolution mass spectrometry (MS)-based proteomics has progressed 
     tremendously over the years. For model organisms like yeast, we can now 
     quantify complete proteomes in just a few hours. Developments discussed in 
     this Perspective will soon enable complete proteome analysis of mammalian 
     cells, as well, with profound impact on biology and biomedicine.

       The complete extent to which the human genome is translated into 
       polypeptides is of fundamental importance. We report a peptidomic 
       strategy to detect short open reading frame (sORF)-encoded polypeptides 
       (SEPs) in human cells. We identify 90 SEPs, 86 of which are previously 
       uncharacterized, which is the largest number of human SEPs ever reported. 
       SEP abundances range from 10–1,000 molecules per cell, identical to 
       abundances of known proteins. SEPs arise from sORFs in noncoding RNAs as 
       well as multicistronic mRNAs, and many SEPs initiate with non-AUG start 
       codons, indicating that noncanonical translation may be more widespread in 
       mammals than previously thought. In addition, coding sORFs are present in 
       a small fraction (8 out of 1,866) of long intergenic noncoding RNAs.   
       Together, these results provide strong evidence that the human proteome is 
       more complex than previously appreciated.
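
The study above is a peptidomics experiment, but the idea of a short ORF that may begin with a non-AUG codon is easy to picture in code. The sketch below is only an illustration and not the authors' pipeline: it scans the three forward frames of a nucleotide sequence for ORFs that start at ATG or one of a few near-cognate codons and end at the first in-frame stop. The set of alternative start codons and the 150-codon upper limit are assumptions chosen for the example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
# ATG plus a few near-cognate start codons; this particular set is an assumption.
START_CODONS = {"ATG", "CTG", "GTG", "TTG"}


def find_sorfs(seq, min_codons=2, max_codons=150):
    """Return (start, end, orf) tuples for short forward-strand ORFs.

    start and end are 0-based and end-exclusive, and the ORF includes its
    stop codon. Codon counts exclude the stop codon. Every in-frame start
    is reported, so nested ORFs sharing one stop codon all appear.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA input
    results = []
    for frame in range(3):
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] not in START_CODONS:
                continue
            # Walk downstream codon by codon until the first stop codon.
            for j in range(i + 3, len(seq) - 2, 3):
                if seq[j:j + 3] in STOP_CODONS:
                    n_codons = (j - i) // 3
                    if min_codons <= n_codons <= max_codons:
                        results.append((i, j + 3, seq[i:j + 3]))
                    break
    return results


# Hypothetical toy sequences: one AUG-initiated and one CUG-initiated ORF.
print(find_sorfs("AUGGCUUAA"))
print(find_sorfs("AACTGGCATAAGG"))
```

A real analysis would also scan the reverse strand and, as in the paper, rely on mass spectrometry evidence to confirm that a predicted sORF is actually translated.
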

      The landscape of human phosphorylation networks has not been 
      systematically explored, representing vast, uncharted territories within 
      cellular signaling networks. Although a large number of in vivo 
      phosphorylated residues have been identified by mass spectrometry 
     (MS)‐based approaches, assigning the upstream kinases to these residues 
      requires biochemical analysis of kinase‐substrate relationships (KSRs). Here, 
      we developed a new strategy, called CEASAR, based on functional protein 
      microarrays and bioinformatics to experimentally identify substrates for 289 
      unique kinases, resulting in 3656 high‐quality KSRs. We then generated 
      consensus phosphorylation motifs for each of the kinases and integrated this 
      information, along with information about in vivo phosphorylation sites 
      determined by MS, to construct a high‐resolution map of phosphorylation 
      networks that connects 230 kinases to 2591 in vivo phosphorylation sites in 
      652 substrates. The value of this data set is demonstrated through the 
      discovery of a new role for PKA downstream of Btk (Bruton's tyrosine kinase) 
      during B‐cell receptor signaling. Overall, these studies provide global insights 
      into kinase‐mediated signaling pathways and promise to advance our 
      understanding of cellular signaling processes in humans.

      Affinity purification coupled with mass spectrometry (AP-MS) is a widely used 
      approach for the identification of protein-protein interactions. However, for 
      any given protein of interest, determining which of the identified 
      polypeptides represent bona fide interactors versus those that are 
      background contaminants (for example, proteins that interact with the 
      solid-phase support, affinity reagent or epitope tag) is a challenging task. 
      The standard approach is to identify nonspecific interactions using one or 
      more negative-control purifications, but many small-scale AP-MS studies do 
      not capture a complete, accurate background protein set when available 
      controls are limited. Fortunately, negative controls are largely bait 
      independent. Hence, aggregating negative controls from multiple AP-MS 
      studies can increase coverage and improve the characterization of 
      background associated with a given experimental protocol. Here we present 
      the contaminant repository for affinity purification (the CRAPome) and 
      describe its use for scoring protein-protein interactions. The repository 
      (currently available for Homo sapiens and Saccharomyces cerevisiae)
      and computational tools are freely accessible online.
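
The core idea in the abstract above is that negative controls pooled from many AP-MS studies describe the background better than a few local controls. The sketch below illustrates that idea with a simple frequency filter. It is not the repository's own scoring method: the function names, the 0.5 threshold, and the protein lists are all hypothetical.

```python
from collections import Counter


def background_frequency(control_runs):
    """Map each protein to the fraction of negative-control runs containing it."""
    counts = Counter()
    for run in control_runs:
        counts.update(set(run))  # count each protein once per run
    n = len(control_runs)
    return {protein: c / n for protein, c in counts.items()}


def flag_preys(preys, freq, threshold=0.5):
    """Split preys into (candidates, likely_background) by control frequency."""
    candidates = [p for p in preys if freq.get(p, 0.0) < threshold]
    background = [p for p in preys if freq.get(p, 0.0) >= threshold]
    return candidates, background


# Hypothetical negative-control runs aggregated from several studies.
controls = [
    {"HSPA8", "TUBB", "ACTB"},
    {"HSPA8", "TUBB"},
    {"HSPA8", "RPL4"},
    {"HSPA8"},
]

freq = background_frequency(controls)
candidates, background = flag_preys(["HSPA8", "TUBB", "BRCA2"], freq)
print(candidates)
print(background)
```

Adding more control runs to `controls` sharpens the frequency estimates, which is the benefit of aggregation that the abstract argues for.
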

       Several quantitative mass spectrometry based technologies have recently 
       evolved to interrogate the complexity, interconnectivity and dynamic nature 
       of proteomes. Currently, the most popular methods use either metabolic or 
       chemical isotope labeling with MS based quantification or chemical labeling 
       using isobaric tags with MS/MS based quantification. Here, we assess the 
       performance of three of the most popular approaches through systematic 
       independent large scale quantitative proteomics experiments, comparing 
       SILAC, dimethyl and TMT labeling strategies. Although all three methods 
       have their strengths and weaknesses, our data indicate that all three can 
       reach a similar depth in number of identified proteins using a classical (MS2 
       based) shotgun approach. TMT quantification using only MS2 is heavily 
       affected by co-isolation leading to compromised precision and accuracy. 
       This issue may be partly resolved by using an MS3 based acquisition; 
       however, at the cost of a significant reduction in number of proteins 
       quantified. Interestingly, SILAC and chemical labeling with MS based 
       quantification produce almost indistinguishable results, independent of   
       which database search algorithm is used.

       Protein interaction topologies are critical determinants of biological function. 
       Large-scale or proteome-wide measurements of protein interaction 
       topologies in cells currently pose an unmet challenge that could
       dramatically improve understanding of complex biological systems. A 
       primary impediment includes direct protein topology and interaction 
       measurements from living systems since interactions that lack biological 
       significance may be introduced during cell lysis. Furthermore, many 
       biologically relevant protein interactions will likely not survive the 
       lysis/sample preparation and may only be measured with in vivo methods. 
       As a step toward meeting this challenge, a new mass spectrometry method 
       called Real-time Analysis for Cross-linked peptide Technology (ReACT) has 
       been developed that enables assignment of cross-linked peptides 
      “on-the-fly”. Using ReACT, 708 unique cross-linked (<5% FDR) peptide pairs 
       were identified from cross-linked E. coli cells. These data allow assembly of 
       the first protein interaction network that also contains topological features 
       of every interaction, as it existed in cells during cross-linker application. Of 
       the identified interprotein cross-linked peptide pairs, 40% are derived from 
       known interactions and provide new topological data that can help visualize 
       how these interactions exist in cells. Other identified cross-linked peptide 
       pairs are from proteins known to be involved within the same complex, but 
       yield newly discovered direct physical interactors. ReACT enables the first 
       view of these interactions inside cells, and the results acquired with this 
       method suggest cross-linking can play a major role in future efforts to map 
       the interactome in cells.

       To enhance therapeutic efficacy and reduce adverse effects of traditional 
       Chinese medicine (TCM), practitioners often prescribe a combination of 
       plant species and/or minerals called formulae. Unfortunately, the working 
       mechanisms of most of these compounds are difficult to determine and 
       thus remain unknown. In an attempt to address the benefits of formulae 
       based on current biomedical approaches, we analyzed the components of
       Yinchenhao Tang (YCHT), a classical formula that has been shown to be
       clinically effective for treating hepatic injury (HI) syndrome. The three
       principal components of YCHT are Artemisia annua L., Gardenia jasminoides
       Ellis, and Rheum palmatum L., whose major active ingredients are
       6,7-dimethylesculetin (D), geniposide (G) and rhein (R), respectively. To 
       determine the mechanisms that underlie this formula, we conducted a
       systematic analysis of the therapeutic effects of the DGR compound using 
       immunohistochemistry, biochemistry, metabolomics and proteomics. Here, 
       we report that the DGR combination exerts a more robust therapeutic effect
       than any one or two of the three individual compounds by hitting multiple
       targets in a rat model of HI. Thus, DGR synergistically causes intensified
       dynamic changes in metabolic biomarkers, regulates molecular networks
       through target proteins, has a synergistic/additive effect and activates both
       intrinsic and extrinsic pathways.