Monday, 21 May 2012

An "in-house" Tool

One of the small hidden details in publications, even in high-impact ones, is the use of "in-house programs". What is an "in-house" program or tool? Normally it is a piece of software that researchers use to analyze, process, or visualize the experimental data; most importantly, the software itself is not published.

The term by itself is inoffensive, but the concept can be extremely dangerous. We can cite hundreds of manuscripts that include "in-house" tools in the data analysis, but never the term "in-house instruments". For instruments and reagents, authors always need to cite the manufacturer, even the year and the company. I know, we have a section to describe data processing, but mostly we cite some parameters and the well-known software, like search engines (Mascot, X!Tandem, Sequest, etc.). Yet at some point in this section you can often find the term "in-house" tool. It could refer to an Excel formula or to a complete and complex Java program with many tasks, like parsing a search engine output, computing the FDR, removing false-positive identifications, computing peptide-spectrum-match redundancy, etc. There is no real, objective measure to distinguish a small, simple tool from a complex one.
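To make the point concrete, here is a minimal sketch of how even a "simple" in-house FDR calculation hides a choice that changes the reported numbers. It assumes a target-decoy search and compares two commonly used estimators (decoys divided by targets, and twice the decoys divided by all hits). The PSM list, the threshold, and all function names are made up for illustration; this is not the code of any particular published tool.

```python
# Toy PSM list: (score, is_decoy). Values are invented for illustration only.
psms = [(95, False), (90, False), (85, False), (80, True), (75, False),
        (70, False), (65, True), (60, False), (55, True), (50, False)]


def count_hits(psms, threshold):
    """Count target and decoy PSMs scoring at or above the threshold."""
    targets = sum(1 for score, is_decoy in psms
                  if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms
                 if score >= threshold and is_decoy)
    return targets, decoys


def fdr_decoy_over_target(psms, threshold):
    """Estimate FDR as decoys / targets."""
    targets, decoys = count_hits(psms, threshold)
    return decoys / targets if targets else 0.0


def fdr_doubled_decoys(psms, threshold):
    """Estimate FDR as 2 * decoys / (targets + decoys)."""
    targets, decoys = count_hits(psms, threshold)
    total = targets + decoys
    return 2 * decoys / total if total else 0.0


threshold = 60  # hypothetical score cutoff
fdr_a = fdr_decoy_over_target(psms, threshold)
fdr_b = fdr_doubled_decoys(psms, threshold)
print(f"decoys/targets:         {fdr_a:.3f}")
print(f"2*decoys/(targets+dec): {fdr_b:.3f}")
```

Both functions are a few lines long and both could be described in a manuscript as "FDR was computed with an in-house script", yet they give different values for the same data. Without the code, a reader cannot tell which one was used.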

What does this mean?
  • It is difficult to follow the results when the researchers used complex in-house programs.
  • It is impossible to evaluate the results if you don't know the methods and algorithms inside the in-house tools.
  • Most importantly: the results are not reproducible and not comparable!

Some journals force authors to attach the code and the programs so that readers of the article can use them. But it is still a problem in the community, and it is growing because more methods and algorithms are publicly available and more non-bioinformatician researchers have programming skills.

Some less obvious disadvantages are:
  • Some existing tools can handle most of these analyses, but they are not used at all, even when they are published in high-impact journals.
  • "Small" but very important problems in the community do not have standardized and well tested tools to solve them.
  • Software and bioinformatics solutions are underestimated. 
When a tool is designed, tested, and stressed during the publication process, the errors, incompatibilities, and statistical details are fixed in order to report the results. In the process, several datasets can be used to compare the results obtained with different settings, etc. This is the nature of a tool/algorithm publication.

Reviewers and editors should force authors to justify the use of "in-house" programs. Also, if an in-house program is needed, the code of the program must be attached, along with a user manual and a short document explaining the algorithms used by the tool.

Several journals publish bioinformatics tools, not only big ones, as research or technical notes.
What do you think?