Tuesday, 25 November 2014

HUPO-PSI Meeting 2014: Rookie’s Notes

Standardisation: the most difficult flower to grow.
The PSI (Proteomics Standards Initiative) 2014 Meeting was held this year in Frankfurt (13-17 April), and I can say I’m now part of this history. First, I will try to describe in a couple of sentences (for sure I will fail) the incredible venue, the Schloss Reinhartshausen Kempinski. When I saw the hotel for the first time, the first thing that came to my mind was those films from the 50s. Everything was elegant, classic, sophisticated, from the decoration down to the smallest latch. The food was incredible and the service was first class, from the moment you set foot on the front step and throughout the whole stay.
Standardization is the process of developing and implementing technical standards. Standardization can help to maximize compatibility, interoperability, safety, repeatability, or quality. It can also facilitate commoditization of formerly custom processes. In bioinformatics, the standardization of file formats, vocabulary, and resources is a job that all of us appreciate but that, for several reasons, nobody wants to do. First of all, standardization in bioinformatics means that you need to organize and merge different experimental and in-silico pipelines to arrive at a common way to represent the information. In proteomics, for example, you can use different sample preparation methods, combine them with different fractionation techniques and different mass spectrometers, and finally use different search engines and post-processing tools. This diversity of possible combinations is needed because it allows us to explore different solutions for complex problems. (Standarization in Proteomics: From raw data to metadata files).

PRIDE and ProteomeXchange – Making proteomics data accessible and reusable

Slides Presentation:


Youtube Presentation:


Monday, 24 November 2014

QC metrics into Progenesis QI for proteomics

Originally posted by NonLinear
Proteomics as a field is rapidly maturing; there is a real sense that established techniques are being improved, and excitement at emerging techniques for quantitation. Central to this revolution is the application of effective quality control (QC) – understanding what adversely affects proteomics data, monitoring for problems, and being able to pin down and address them when they arise.
We’ve been at the forefront of QC implementation over the years, from our early involvement in the Fixing Proteomics campaign to our staff (in a previous guise!) publishing on proteomics QC[1], and it’s an area that’s very important to us – we want you to have confidence in your data and your results, as well as our software.

Wednesday, 29 October 2014

What is BioHackathon 2014?

BioHackathon 2014 starts in a week (http://www.biohackathon.org/). It will be my first time in this kind of "meeting". I will give a talk about PRIDE and ProteomeXchange and future developments of both platforms (below the complete list of talks).

But first, a quick introduction to BioHackathon. The National Bioscience Database Center (NBDC) and the Database Center for Life Science (DBCLS) have been organizing an annual BioHackathon since 2008, mainly focusing on standardization (ontologies, controlled vocabularies, metadata) and interoperability of bioinformatics data and web services for improving integration (semantic web, web services, data integration), preservation and utilization of databases in life sciences. This year, we will focus on the standardization and utilization of human genome information with Semantic Web technologies in addition to our previous efforts on semantic interoperability and standardization of bioinformatics data and Web services.

Sunday, 26 October 2014

Ontologies versus controlled vocabularies.

While the minimum data standards describe the types of data elements to be captured, the use of standard vocabularies as values to populate the information about these data elements is also important to support interoperability. In many cases, groups develop term lists (controlled vocabularies) that describe what kinds of words and word phrases should be used to describe the values for a given data element. In the ideal case each term is accompanied by a textual definition that describes what the term means in order to support consistency in term use. However, many bioinformaticians have begun to develop and adopt ontologies that can serve in place of vocabularies for use as these allowed term lists. As with a specific vocabulary, an ontology is a domain-specific dictionary of terms and definitions. But an ontology also captures the semantic relationships between the terms, thus allowing logical inferencing about the entities represented by the ontology and by the data annotated using the ontology’s terms. 

The semantic relationships incorporated into the ontology represent universal relations between the classes represented by its terms, based on previously established knowledge about the entities those terms describe. An ontology is a representation of universals; it describes what is general in reality, not what is particular. Thus, ontologies describe classes of entities whereas databases tend to describe instances of entities.
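
To make the difference concrete, here is a minimal Python sketch. It is only an illustration: the cell-type terms, their short definitions, the `is_a` links and the sample names are assumptions made up for this example, not taken from any real ontology file. The controlled vocabulary is a flat list of allowed terms with definitions; the ontology adds relationships between classes, which is what makes inference over annotated instances possible.

```python
from collections import deque

# Controlled vocabulary: a flat list of allowed terms with definitions.
# (Illustrative terms and definitions, not copied from a real vocabulary.)
controlled_vocabulary = {
    "cell": "The basic structural unit of an organism.",
    "blood cell": "A cell found in blood.",
    "erythrocyte": "A red blood cell that carries oxygen.",
    "leukocyte": "A white blood cell of the immune system.",
}

# Ontology: the same terms plus is_a relationships (child -> parents).
is_a = {
    "blood cell": ["cell"],
    "erythrocyte": ["blood cell"],
    "leukocyte": ["blood cell"],
}


def is_valid_term(term):
    """A vocabulary can only tell us whether a term is allowed."""
    return term in controlled_vocabulary


def ancestors(term, relations):
    """Return every class reachable from `term` by following is_a links."""
    seen = set()
    queue = deque(relations.get(term, []))
    while queue:
        parent = queue.popleft()
        if parent not in seen:
            seen.add(parent)
            queue.extend(relations.get(parent, []))
    return seen


def instances_of(cls, annotations, relations):
    """Find records annotated with `cls` or with any subclass of it."""
    return sorted(
        record
        for record, term in annotations.items()
        if term == cls or cls in ancestors(term, relations)
    )


# Instances (database records) annotated with ontology classes.
annotations = {
    "sample_1": "erythrocyte",
    "sample_2": "leukocyte",
    "sample_3": "cell",
}

print(instances_of("blood cell", annotations, is_a))
```

With the vocabulary alone, a search for "blood cell" would find nothing, because no sample carries that exact term. With the `is_a` links, the two samples annotated with more specific classes are inferred to be blood cells as well.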

The Open Biomedical Ontology (OBO) library was established in 2001 as a repository of ontologies developed for use by the biomedical research community (http://sourceforge.net/projects/obo). In some cases, the ontology is composed of a highly focused set of terms to support the data annotation needs of a specific model organism community (e.g. the Plasmodium Life Cycle Ontology). In other cases, the ontology covers a broader set of terms that is intended to provide comprehensive coverage of an entire life science domain (e.g. the Cell Type Ontology). The European Bioinformatics Institute has also developed the Ontology Lookup Service (OLS) that provides a web service interface to query multiple OBO ontologies from a single location with a unified output format (http://www.ebi.ac.uk/ontology-lookup/). Both the BioPortal and the OLS permit users to browse individual ontologies and search for terms across ontologies according to term name and certain associated attributes. 

Thursday, 23 October 2014

Which journals release more public proteomics data!!!

I'm a big fan of data and the -omics family. I also like the idea of making more and more of our data publicly available to others, not only for reuse, but also to guarantee the reproducibility and quality assessment of the results (Making proteomics data accessible and reusable: Current state of proteomics databases and repositories). I'm wondering which of these journals (list - http://scholar.google.co.uk/) encourage their submitters and authors to make their data publicly available:

Molecular & Cellular Proteomics
Journal of Proteome Research
Biochimica et Biophysica Acta (BBA)-Proteins and Proteomics
Journal of Proteomics
Proteomics - Clinical Applications
Proteome Science

A simple count based on PRIDE data gives:

Number of PRIDE projects by Journal
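
A count like this could be sketched in Python as follows. This is only a rough illustration under stated assumptions: the record layout (a dict with `accession` and `journal` keys) and the example records are hypothetical placeholders, not real PRIDE projects and not the actual PRIDE export format.

```python
from collections import Counter

# The journals listed above.
JOURNALS = [
    "Molecular & Cellular Proteomics",
    "Journal of Proteome Research",
    "Biochimica et Biophysica Acta (BBA)-Proteins and Proteomics",
    "Journal of Proteomics",
    "Proteomics - Clinical Applications",
    "Proteome Science",
]


def count_projects_by_journal(projects, journals=JOURNALS):
    """Count projects per journal, keeping journals with zero projects."""
    counts = Counter({journal: 0 for journal in journals})
    for project in projects:
        journal = project.get("journal")
        if journal in counts:  # ignore journals outside the list
            counts[journal] += 1
    return counts


# Hypothetical placeholder records (NOT real PRIDE data).
example_projects = [
    {"accession": "EXAMPLE-1", "journal": "Journal of Proteomics"},
    {"accession": "EXAMPLE-2", "journal": "Journal of Proteome Research"},
    {"accession": "EXAMPLE-3", "journal": "Journal of Proteomics"},
    {"accession": "EXAMPLE-4", "journal": "Some Other Journal"},
    {"accession": "EXAMPLE-5"},  # project without a linked publication
]

counts = count_projects_by_journal(example_projects)
for journal, n in counts.most_common():
    print(f"{n:3d}  {journal}")
```

In practice the journal name would have to be extracted from each project's publication reference, and name variants would need normalizing before counting.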