
Sunday, 13 March 2016

Genome Mapping and SNP Calling with BioDocker

#!/usr/bin/env bash
set -xeu

# NOTE: the original post defined $RUNINDOCKER, the image names
# ($SAMTOOLS_IMAGE, $BWA_IMAGE, $TABIX_IMAGE, $BCFTOOLS_IMAGE), the
# input/output names ($FQ1, $FQ2, $REF, $BNM) and $HEADLEN here;
# those lines were lost from the post.

if [[ "$RUNINDOCKER" -eq "1" ]]; then
    DRUN="docker run --rm -v $PWD:/data --workdir /data -i"
    # wrap each tool in its container (image names must be defined above)
    SAMTOOLS="$DRUN $SAMTOOLS_IMAGE samtools"
    BWA="$DRUN $BWA_IMAGE bwa"
    TABIX="$DRUN $TABIX_IMAGE tabix"
    BCFTOOLS="$DRUN $BCFTOOLS_IMAGE bcftools"

    docker pull $SAMTOOLS_IMAGE
    docker pull $BWA_IMAGE
    docker pull $TABIX_IMAGE
    docker pull $BCFTOOLS_IMAGE
else
    # fall back to locally installed tools
    SAMTOOLS=samtools
    BWA=bwa
    TABIX=tabix
    BCFTOOLS=bcftools
fi

if [[ ! -f "$FQ1" ]]; then
    # download URL missing from the original post
    curl | gzip -d | head -$HEADLEN > $FQ1.tmp && mv $FQ1.tmp $FQ1
fi

if [[ ! -f "$FQ2" ]]; then
    # download URL missing from the original post
    curl | gzip -d | head -$HEADLEN > $FQ2.tmp && mv $FQ2.tmp $FQ2
fi

if [[ ! -f "$REF" ]]; then
    # download URL missing from the original post
    curl | gunzip -c > $REF.tmp && mv $REF.tmp $REF
fi

if [[ ! -f "$REF.fai" ]]; then
    # body missing in the original; indexing the reference is the obvious step
    $SAMTOOLS faidx $REF
fi

if [[ ! -f "$REF.bwt" ]]; then
    $BWA index $REF
fi

if [[ ! -f "$BNM.sam" ]]; then
    $BWA mem -R '@RG\tID:foo\tSM:bar\tLB:library1' $REF $FQ1 $FQ2 > $BNM.sam.tmp && mv $BNM.sam.tmp $BNM.sam
fi

if [[ ! -f "$BNM.bam" ]]; then
    #$SAMTOOLS sort -O bam -T /tmp -l 0 --input-fmt-option SAM -o $BNM.tmp.bam $BNM.sam && mv $BNM.tmp.bam $BNM.bam
    $SAMTOOLS sort -O bam -T /tmp -l 0 -o $BNM.tmp.bam $BNM.sam && mv $BNM.tmp.bam $BNM.bam
fi

if [[ ! -f "$BNM.cram" ]]; then
    $SAMTOOLS view -T $REF -C -o $BNM.tmp.cram $BNM.bam && mv $BNM.tmp.cram $BNM.cram
fi

if [[ ! -f "$BNM.P.cram" ]]; then
    $BWA mem $REF $FQ1 $FQ2 | \
    $SAMTOOLS sort -O bam -l 0 -T /tmp - | \
    $SAMTOOLS view -T $REF -C -o $BNM.P.tmp.cram - && mv $BNM.P.tmp.cram $BNM.P.cram
fi

#if [[ ! -f "" ]]; then
#$SAMTOOLS view $BNM.cram

#if [[ ! -f "" ]]; then
#$SAMTOOLS mpileup -f $REF $BNM.cram

if [[ ! -f "$BNM.vcf.gz" ]]; then
    $SAMTOOLS mpileup -ugf $REF $BNM.bam | $BCFTOOLS call -vmO z -o $BNM.vcf.gz.tmp && mv $BNM.vcf.gz.tmp $BNM.vcf.gz
fi

if [[ ! -f "$BNM.vcf.gz.tbi" ]]; then
    $TABIX -p vcf $BNM.vcf.gz
fi

if [[ ! -f "$BNM.vcf.gz.stats" ]]; then
    $BCFTOOLS stats -F $REF -s - $BNM.vcf.gz > $BNM.vcf.gz.stats.tmp && mv $BNM.vcf.gz.stats.tmp $BNM.vcf.gz.stats
fi

mkdir plots &>/dev/null || true

#if [[ ! -f "plots/tstv_by_sample.0.png" ]]; then
#$PLOTVCFSTATS -p plots/ $BNM.vcf.gz.stats

if [[ ! -f "$BNM.vcf.filtered.gz" ]]; then
    $BCFTOOLS filter -O z -o $BNM.vcf.filtered.gz -s LOWQUAL -i'%QUAL>10' $BNM.vcf.gz
fi
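Every step of the script above follows the same idempotent checkpoint pattern: skip the step if its output already exists, write to a `.tmp` file, and rename only on success. Here is a minimal self-contained sketch of that pattern (the file name and "result" payload are purely illustrative):

```shell
#!/bin/sh
# Demo of the skip-if-present / write-tmp-then-rename pattern used above.
OUT=/tmp/checkpoint_demo.out
rm -f "$OUT" "$OUT.tmp"

run_step() {
    if [ ! -f "$OUT" ]; then
        echo "computing $OUT"
        # write to a temp file and rename only if the command succeeded,
        # so a crash mid-step never leaves a truncated output behind
        printf 'result\n' > "$OUT.tmp" && mv "$OUT.tmp" "$OUT"
    fi
}

run_step   # first call does the work
run_step   # second call is a no-op: the checkpoint file exists
cat "$OUT"
```

Because `mv` on the same filesystem is atomic, rerunning the whole script after an interruption simply resumes at the first step whose output is missing.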

Saturday, 28 November 2015

Protein identification with Comet, PeptideProphet and ProteinProphet using BioDocker containers

Proteomics data analysis is dominated by database search engine strategies. Perhaps the most common protocol today is to retrieve raw data from a mass spectrometer, convert it from a binary format to a text-based format, and then process it with a database search algorithm. The resulting data then need to be statistically filtered to converge on a final list of identified peptides and proteins.

Among search engines, Comet (the youngest son of SEQUEST) is one of the most popular nowadays. Today we are going to show how to run a simple analysis protocol using the Comet database search engine, followed by statistical analysis with PeptideProphet and ProteinProphet, two of the best-known and most robust processing algorithms for proteomics data.

This pipeline is available in the Trans-Proteomics Pipeline (TPP); however, several users prefer to use the individual components rather than the full TPP. The big difference here is how we are going to do it. Instead of going step by step through how to install and configure Comet and TPP, we are going to run the pipeline using Docker containers from the BioDocker project (you can get more information on the project here).
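To give a feel for the shape of such a containerized run, here is a rough sketch. The image names, parameter files, and tool flags below are assumptions for illustration only, not verified values from the BioDocker registry; `DRYRUN=echo` just prints each command so the flow can be read without Docker installed:

```shell
#!/bin/sh
# Hypothetical outline of a Comet -> PeptideProphet -> ProteinProphet run.
# Image names and flags are illustrative assumptions, not verified values.
DRYRUN=echo                                  # drop this to actually run
DRUN="docker run --rm -v $PWD:/data --workdir /data"
COMET_IMAGE=biodocker/comet                  # assumed image name
TPP_IMAGE=biodocker/tpp                      # assumed image name

# 1. database search against a protein FASTA (configured in comet.params)
$DRYRUN $DRUN $COMET_IMAGE comet -Pcomet.params sample.mzML
# 2. peptide-level statistical validation (PeptideProphet via xinteract)
$DRYRUN $DRUN $TPP_IMAGE xinteract -OAp -Ninteract.pep.xml sample.pep.xml
# 3. protein-level inference
$DRYRUN $DRUN $TPP_IMAGE ProteinProphet interact.pep.xml interact.prot.xml
```

Mounting `$PWD` as /data means each container reads the previous stage's output straight from the working directory, so the three tools chain together without any local installation.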

Wednesday, 21 October 2015

Installing Mesos on your Mac

1- Homebrew is an open source package management system for the Mac that simplifies installation of packages from source.

ruby -e "$(curl -fsSL"

2- Once you have Homebrew installed, you can install Mesos on your laptop with these two commands:

brew update
brew install mesos

You will need to wait while the most current, stable version of Mesos is downloaded, compiled, and installed on your machine. Homebrew will let you know when it’s finished by displaying a beer emoji in your terminal and a message like the following:

/usr/local/Cellar/mesos/0.19.0: 83 files, 24M, built in 17.4 minutes
Start Your Mesos Cluster

3- Running Mesos on your machine: Now that you have Mesos installed on your laptop, it’s easy to start your Mesos cluster. To see Mesos in action, spin up an in-memory master with the following command:

/usr/local/sbin/mesos-master --registry=in_memory --ip=

A Mesos cluster needs at least one Mesos Master to coordinate and dispatch tasks onto Mesos Slaves. When experimenting on your laptop, a single master is all you need. Once your Mesos Master has started, you can visit its management console: http://localhost:5050

Since a Mesos Master needs slaves onto which it will dispatch jobs, you might also want to run some of those. Mesos Slaves can be started by running the following command for each slave you wish to launch:

sudo /usr/local/sbin/mesos-slave --master=
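Putting the two commands of step 3 together, a local one-master/one-slave cluster looks roughly like this. The IP addresses were left blank in the post, so 127.0.0.1 below is an assumption (5050 is the master port mentioned above); `DRYRUN=echo` prints the commands instead of launching daemons:

```shell
#!/bin/sh
# Sketch of a single-machine Mesos cluster; addresses are assumed values.
DRYRUN=echo                    # remove to really start the daemons
MASTER_IP=127.0.0.1            # assumption: the post left the IP blank

# one master coordinating the cluster, reachable at $MASTER_IP:5050 ...
$DRYRUN /usr/local/sbin/mesos-master --registry=in_memory --ip=$MASTER_IP
# ... and one slave registering with that master to accept tasks
$DRYRUN sudo /usr/local/sbin/mesos-slave --master=$MASTER_IP:5050
```

With the daemons actually running, the management console at http://localhost:5050 should list the slave once it has registered.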

Tuesday, 6 October 2015

The end of the big files nightmare in Github

One of the nightmares of GitHub has always been big files. The previous 100 MB limit made it difficult to test applications with real examples, demanding a lot of work to create "dummy" test files. Today came the official announcement from GitHub:

Git LFS is an open source Git extension that we released in April for integrating large binary files into your Git workflow. Distributed version control systems like Git have enabled new and powerful workflows, but they haven’t always been practical for versioning large files.
Git LFS solves this problem by replacing large files with text pointers inside Git, while storing the file contents on a remote server like 
New git lfs fetch and git lfs pull commands that download objects much faster than the standard Git smudge filter
Options for customizing what files are automatically downloaded on checkout
Selectively ignore a directory of large files that you don’t need for daily work
Download recent files from other branches
Improvements to git lfs push that filter the number of commits to scan for eligible LFS objects to upload. This greatly reduces the time to push new feature branches
A Windows installer and Linux packages for more convenient installation
An experimental extension system for teams that want to customize how objects are stored on the server
Git LFS is now available to all users on, just install the client to get started.
I just added my first big file with these steps:

1 - Download the git plugin from here or using Homebrew
   brew install git-lfs

2- Select the file types you'd like Git LFS to manage (or directly edit your .gitattributes). You can configure additional file extensions at anytime.

git lfs track "*.psd"
3- There is no step three. Just commit and push to GitHub as you normally would.

git add file.psd
git commit -m "Add design file"
git push origin master
Done !!!!
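For the curious, the only visible effect of step 2 is a filter rule appended to .gitattributes. The snippet below writes that rule by hand in a scratch directory, so you can inspect the effect even without git-lfs installed (the scratch directory is arbitrary):

```shell
#!/bin/sh
# Reproduce what `git lfs track "*.psd"` records in .gitattributes.
demo=$(mktemp -d)
cd "$demo"
# the rule git-lfs appends: route *.psd through the lfs clean/smudge filter
printf '%s\n' '*.psd filter=lfs diff=lfs merge=lfs -text' >> .gitattributes
cat .gitattributes
```

Committing .gitattributes alongside your tracked files is what lets collaborators' clones route the same patterns through LFS automatically.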