at the European Bioinformatics Institute is one of the most prominent data
repositories of mass spectrometry (MS)-based proteomics data. Here, we
summarize recent developments in the PRIDE database and related tools.
First, we provide up-to-date statistics on data content, splitting the figures by
groups of organisms and species, including peptide and protein
identifications, and post-translational modifications. We then describe the
tools that are part of the PRIDE submission pipeline, especially the recently
developed PRIDE Converter 2 (new submission tool) and PRIDE Inspector
(visualization and analysis tool). We also give an update about the integration
of PRIDE with other MS proteomics resources in the context of the
ProteomeXchange consortium. Finally, we briefly review the quality control
efforts that are ongoing at present and outline our future plans.
representing disease states. However, the causes of most disorders are
multifactorial, and systems-level approaches, including the analysis of
proteomes, are required for a more comprehensive understanding. The
proteome is extremely multifaceted owing to splicing and protein
modifications, and this is further amplified by the interconnectivity of proteins
into complexes and signalling networks that are highly divergent in time and
space. Proteome analysis heavily relies on mass spectrometry (MS).
MS-based proteomics is starting to mature and to deliver through a
combination of developments in instrumentation, sample preparation and
computational analysis. Here we describe this emerging next generation of
proteomics and highlight recent applications.
single measurement, a complete recording of the fragment ion spectra of all
the analytes in a biological sample for which the precursor ions are within a
predetermined m/z versus retention time window. To assess the performance
and suitability of SWATH-MS-based protein quantification for clinical use, we
compared SWATH-MS and SRM-MS-based quantification of N-linked
glycoproteins in human plasma, a commonly used sample for biomarker
discovery. Using dilution series of isotopically labeled heavy peptides
representing biomarker candidates, the LOQ of SWATH-MS was determined
to reach 0.0456 fmol at peptide level by targeted data analysis, which
corresponds to a concentration of 5–10 ng protein/mL in plasma, while SRM
reached a peptide LOQ of 0.0152 fmol. Moreover, the quantification of
endogenous glycoproteins using SWATH-MS showed a high degree of
reproducibility, with a mean CV of 14.90%, correlating well with SRM results
(R² = 0.9784). Overall, SWATH-MS measurements showed a slightly lower
sensitivity and a comparable reproducibility to state-of-the-art SRM
measurements for targeted quantification of the N-glycosites in human
blood. However, a significantly larger number of peptides can be quantified
per analysis. We suggest that SWATH-MS analysis combined with
N-glycoproteome enrichment in plasma samples is a promising integrative
proteomic approach for biomarker discovery and verification.
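
The two summary statistics quoted above, the mean coefficient of variation (CV) across peptides and the R² between SWATH-MS and SRM measurements, can be illustrated with a minimal sketch. The peptide names and intensities below are hypothetical placeholders, not data from the study, and the sketch assumes that CV is the sample standard deviation divided by the mean and that R² is the squared Pearson correlation; the study's exact procedure is not stated here.

```python
import statistics


def coefficient_of_variation(values):
    """Return the CV in percent: sample standard deviation divided by the mean."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


def r_squared(x, y):
    """Return the squared Pearson correlation of two equal-length series."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


# Hypothetical replicate intensities per peptide (illustrative numbers only).
swath_replicates = {
    "PEPTIDE_A": [100.0, 110.0, 90.0],
    "PEPTIDE_B": [200.0, 220.0, 180.0],
    "PEPTIDE_C": [50.0, 55.0, 45.0],
}
# Hypothetical SRM mean intensities for the same peptides.
srm_means = {"PEPTIDE_A": 95.0, "PEPTIDE_B": 210.0, "PEPTIDE_C": 48.0}

# Reproducibility: one CV per peptide, then the mean over all peptides.
cvs = {p: coefficient_of_variation(v) for p, v in swath_replicates.items()}
mean_cv = statistics.mean(list(cvs.values()))

# Agreement between methods: R² of per-peptide mean intensities.
peptides = sorted(swath_replicates)
swath_means = [statistics.mean(swath_replicates[p]) for p in peptides]
srm_values = [srm_means[p] for p in peptides]
r2 = r_squared(swath_means, srm_values)

print(f"mean CV = {mean_cv:.2f}%, R² = {r2:.4f}")
```
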
side chains of proteins, is a fundamental regulator of protein activity,
stability, and molecular interactions. Most cellular processes, such as inter-
and intracellular signaling, protein synthesis, degradation, and apoptosis,
rely on phosphorylation. This PTM is thus involved in many diseases,
rendering localization and assessment of extent of phosphorylation of major
scientific interest. MS-based phosphoproteomics, which aims at describing
all phosphorylation sites in a specific type of cell, tissue, or organism, has
become the main technique for discovery and characterization of
phosphoproteins in a non-hypothesis-driven fashion. In this review, we
describe methods for state-of-the-art MS-based analysis of protein
phosphorylation as well as the strategies employed in large-scale
phosphoproteomic experiments, with a focus on the various challenges and
limitations this field currently faces.
complete reference maps of the components of the system under study are
highly beneficial research tools. Examples of such maps include libraries of
the spectroscopic properties of molecules, or databases of drug structures in
analytical or forensic chemistry. Such maps, and methods to navigate them,
constitute reliable assays to probe any sample for the presence and amount
of molecules contained in the map. So far, attempts to generate such maps
for any proteome have failed to reach complete proteome coverage [1, 2, 3].
Here we use a strategy based on high-throughput peptide synthesis and
mass spectrometry to generate an almost complete reference map (97% of
the genome-predicted proteins) of the Saccharomyces cerevisiae proteome.
We generated two versions of this mass-spectrometric map, one supporting
discovery-driven (shotgun) [3, 4] and the other supporting hypothesis-driven
(targeted) [5, 6] proteomic measurements. Together, the two versions of the
map constitute a complete set of proteomic assays to support most studies
performed with contemporary proteomic technologies. To show the utility of
the maps, we applied them to a protein quantitative trait locus (QTL)
analysis [7], which requires precise measurement of the same set of peptides
over a large number of samples. Protein measurements over 78 S.
cerevisiae strains revealed a complex relationship between independent
genetic loci, influencing the levels of related proteins. Our results suggest
that selective pressure favours the acquisition of sets of polymorphisms that
adapt protein levels but also maintain the stoichiometry of functionally
related pathway members.
tremendously over the years. For model organisms like yeast, we can now
quantify complete proteomes in just a few hours. Developments discussed in
this Perspective will soon enable complete proteome analysis of mammalian
cells, as well, with profound impact on biology and biomedicine.
polypeptides is of fundamental importance. We report a peptidomic
strategy to detect short open reading frame (sORF)-encoded polypeptides
(SEPs) in human cells. We identify 90 SEPs, 86 of which are previously
uncharacterized, which is the largest number of human SEPs ever reported.
SEP abundances range from 10 to 1,000 molecules per cell, identical to
abundances of known proteins. SEPs arise from sORFs in noncoding RNAs as
well as multicistronic mRNAs, and many SEPs initiate with non-AUG start
codons, indicating that noncanonical translation may be more widespread in
mammals than previously thought. In addition, coding sORFs are present in
a small fraction (8 out of 1,866) of long intergenic noncoding RNAs.
Together, these results provide strong evidence that the human proteome is
more complex than previously appreciated.
systematically explored, representing vast, uncharted territories within
cellular signaling networks. Although a large number of in vivo
phosphorylated residues have been identified by mass spectrometry
(MS)‐based approaches, assigning the upstream kinases to these residues
requires biochemical analysis of kinase‐substrate relationships (KSRs). Here,
we developed a new strategy, called CEASAR, based on functional protein
microarrays and bioinformatics to experimentally identify substrates for 289
unique kinases, resulting in 3656 high‐quality KSRs. We then generated
consensus phosphorylation motifs for each of the kinases and integrated this
information, along with information about in vivo phosphorylation sites
determined by MS, to construct a high‐resolution map of phosphorylation
networks that connects 230 kinases to 2591 in vivo phosphorylation sites in
652 substrates. The value of this data set is demonstrated through the
discovery of a new role for PKA downstream of Btk (Bruton's tyrosine kinase)
during B‐cell receptor signaling. Overall, these studies provide global insights
into kinase‐mediated signaling pathways and promise to advance our
understanding of cellular signaling processes in humans.
approach for the identification of protein-protein interactions. However, for
any given protein of interest, determining which of the identified
polypeptides represent bona fide interactors versus those that are
background contaminants (for example, proteins that interact with the
solid-phase support, affinity reagent or epitope tag) is a challenging task.
The standard approach is to identify nonspecific interactions using one or
more negative-control purifications, but many small-scale AP-MS studies do
not capture a complete, accurate background protein set when available
controls are limited. Fortunately, negative controls are largely bait
independent. Hence, aggregating negative controls from multiple AP-MS
studies can increase coverage and improve the characterization of
background associated with a given experimental protocol. Here we present
the contaminant repository for affinity purification (the CRAPome) and
describe its use for scoring protein-protein interactions. The repository
(currently available for Homo sapiens and Saccharomyces cerevisiae)
and computational tools are freely accessible at http://www.crapome.org/.
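
The idea of aggregating negative controls described above can be sketched as follows. This is a minimal illustration and not the repository's actual scoring method: the function names, the protein identifiers, and the 0.25 frequency cutoff are all hypothetical choices made for the example. The sketch counts how often each protein appears across pooled negative-control runs and uses that frequency to separate likely background from candidate interactors.

```python
from collections import Counter


def background_frequency(negative_controls):
    """Fraction of negative-control runs in which each protein was detected."""
    counts = Counter()
    for run in negative_controls:
        counts.update(set(run))  # count each protein at most once per run
    total = len(negative_controls)
    return {protein: n / total for protein, n in counts.items()}


def split_preys(preys, frequencies, max_frequency=0.25):
    """Split preys into candidates and likely background.

    max_frequency is a hypothetical cutoff chosen for illustration only.
    """
    candidates, background = [], []
    for prey in preys:
        if frequencies.get(prey, 0.0) <= max_frequency:
            candidates.append(prey)
        else:
            background.append(prey)
    return candidates, background


# Hypothetical negative-control runs pooled from several studies.
controls = [
    {"HSPA8", "TUBB", "KRT1"},
    {"HSPA8", "TUBB"},
    {"HSPA8", "KRT1"},
    {"HSPA8"},
]
frequencies = background_frequency(controls)

# Hypothetical prey list from one bait purification.
candidates, background = split_preys(["HSPA8", "TUBB", "PREY_X"], frequencies)
print("candidates:", candidates)
print("background:", background)
```

Because a protein absent from every control gets a frequency of zero, adding more control runs can only increase the chance that a recurring contaminant is recognized, which is the coverage benefit the text attributes to pooling.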
evolved to interrogate the complexity, interconnectivity and dynamic nature
of proteomes. Currently, the most popular methods use either metabolic or
chemical isotope labeling with MS-based quantification or chemical labeling
using isobaric tags with MS/MS-based quantification. Here, we assess the
performance of three of the most popular approaches through systematic,
independent, large-scale quantitative proteomics experiments, comparing
SILAC, dimethyl and TMT labeling strategies. Although all three methods
have their strengths and weaknesses, our data indicate that all three can
reach a similar depth in number of identified proteins using a classical
(MS2-based) shotgun approach. TMT quantification using only MS2 is heavily
affected by co-isolation, leading to compromised precision and accuracy.
This issue may be partly resolved by using an MS3-based acquisition,
although at the cost of a significant reduction in the number of proteins
quantified. Interestingly, SILAC and chemical labeling with MS-based
quantification produce almost indistinguishable results, independent of
which database search algorithm is used.
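
The effect of co-isolation on MS2-based reporter-ion ratios can be made concrete with a small worked example. The model below is a deliberate simplification assumed for illustration, not the analysis used in the study: it treats the observed reporter signal in each channel as a mixture of the target peptide and co-isolated peptides, with `interference` (a hypothetical parameter) giving the fraction of the reference-channel signal contributed by the interferents.

```python
def observed_reporter_ratio(true_ratio, interference, interferent_ratio=1.0):
    """Reporter-ion ratio observed when co-isolated peptides add signal.

    Simplified mixture model (an assumption for illustration):
    - the reference channel is normalized to 1, so the target contributes
      (1 - interference) there and (1 - interference) * true_ratio in the
      other channel;
    - co-isolated peptides contribute `interference` in the reference
      channel and `interference * interferent_ratio` in the other.
    """
    if not 0.0 <= interference < 1.0:
        raise ValueError("interference must be in [0, 1)")
    numerator = (1.0 - interference) * true_ratio + interference * interferent_ratio
    denominator = (1.0 - interference) + interference
    return numerator / denominator


# A true 10-fold change with 30% interference from unchanged (1:1) peptides.
compressed = observed_reporter_ratio(10.0, 0.3)
print(f"observed ratio: {compressed:.2f}")
```

Under these assumptions a true 10:1 ratio is reported as roughly 7.3:1, which shows how interferents that do not change between samples pull measured ratios toward 1.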
- In Vivo Protein Interaction Network Identified with a Novel Real-Time Cross-Linked Peptide Identification Strategy
Large-scale or proteome-wide measurements of protein interaction
topologies in cells currently pose an unmet challenge; meeting it could
dramatically improve understanding of complex biological systems. A
primary impediment is obtaining direct protein topology and interaction
measurements from living systems, since interactions that lack biological
significance may be introduced during cell lysis. Furthermore, many
biologically relevant protein interactions will likely not survive the
lysis/sample preparation and may only be measured with in vivo methods.
As a step toward meeting this challenge, a new mass spectrometry method
called Real-time Analysis for Cross-linked peptide Technology (ReACT) has
been developed that enables assignment of cross-linked peptides
“on-the-fly”. Using ReACT, 708 unique cross-linked (<5% FDR) peptide pairs
were identified from cross-linked E. coli cells. These data allow assembly of
the first protein interaction network that also contains topological features
of every interaction, as it existed in cells during cross-linker application. Of
the identified interprotein cross-linked peptide pairs, 40% are derived from
known interactions and provide new topological data that can help visualize
how these interactions exist in cells. Other identified cross-linked peptide
pairs are from proteins known to be involved within the same complex, but
yield newly discovered direct physical interactors. ReACT enables the first
view of these interactions inside cells, and the results acquired with this
method suggest cross-linking can play a major role in future efforts to map
the interactome in cells.
- Metabolomics coupled with proteomics advancing drug discovery towards more agile development of targeted combination therapies.
Chinese medicine (TCM), practitioners often prescribe a combination of
plant species and/or minerals called formulae. Unfortunately, the working
mechanisms of most of these compounds are difficult to determine and
thus remain unknown. In an attempt to address the benefits of formulae
based on current biomedical approaches, we analyzed the components of
Yinchenhao Tang (YCHT), a classical formula that has been shown to be
clinically effective for treating hepatic injury (HI) syndrome. The three
principal components of YCHT are Artemisia annua L., Gardenia jasminoides
Ellis, and Rheum palmatum L., whose major active ingredients are
6,7-dimethylesculetin (D), geniposide (G) and rhein (R), respectively. To
determine the mechanisms that underlie this formula, we conducted a
systematic analysis of the therapeutic effects of the DGR compound using
immunohistochemistry, biochemistry, metabolomics and proteomics. Here,
we report that the DGR combination exerts a more robust therapeutic effect
than any one or two of the three individual compounds by hitting multiple
targets in a rat model of HI. Thus, DGR synergistically causes intensified
dynamic changes in metabolic biomarkers, regulates molecular networks
through target proteins, has a synergistic/additive effect and activates both
intrinsic and extrinsic pathways.