Oral conference presentations

Biocuration, Databases, Ontologies, and Text Mining

Felipe Albrecht*, Markus List, Christoph Bock and Thomas Lengauer (*Max Planck Institute for Informatics and Graduate School of Computer Science, Saarbrücken, Germany) [ OA paper ]
Analysing large-scale epigenomic data with DeepBlue

Large amounts of epigenomic data are generated under the umbrella of the International Human Epigenome Consortium, which aims to establish 1000 reference epigenomes within the next few years. These data have the potential to unravel the complexity of epigenomic regulation. However, their effective use is hindered by the lack of flexible and easy-to-use methods for data retrieval. Extracting region sets of interest is a cumbersome task that involves several manual steps: identifying the relevant experiments, downloading the corresponding data files and filtering the region sets of interest. Here we present the DeepBlue Epigenomic Data Server, which streamlines epigenomic data analysis as well as software development. DeepBlue provides a comprehensive programmatic interface for finding, selecting, filtering, summarizing and downloading region sets. It contains data from four major epigenome projects, namely ENCODE, ROADMAP, BLUEPRINT and DEEP. DeepBlue comes with a user manual, examples and a well-documented application programming interface (API). The latter is accessed via the XML-RPC protocol supported by many programming languages. To demonstrate usage of the API and to enable convenient data retrieval for non-programmers, we offer an optional web interface. DeepBlue can be openly accessed at http://deepblue.mpi-inf.mpg.de.

Davide Alocci*, Alessandra Gastaldello, Julien Mariethoz and Frédérique Lisacek (*Swiss Institute for Bioinformatics, Geneva, Switzerland)
Glycomics goes visual and interactive

Michael Baudis*, Bo Gao, Melanie Courtot, Paula Carrio Cordo and Helen Parkinson (*University of Zurich & Swiss Institute of Bioinformatics, Zürich, Switzerland)
Advancing the Global Alliance for Genomics and Health data schemas through data-driven implementations

Lars Juhl Jensen (University of Copenhagen, Denmark) [ Preprint ]
One tagger, many uses: Illustrating the power of dictionary-based named entity recognition

Automatic annotation of text is an important complement to manual annotation, because the latter is highly labour intensive. We have developed a fast dictionary-based named entity recognition (NER) system and addressed a wide variety of biomedical problems by applying it to text from many different sources. We have used this tagger both in real-time tools to support curation efforts and in pipelines for populating databases through bulk processing of the entire Medline, the open-access subset of PubMed Central, NIH grant abstracts, FDA drug labels, electronic health records, and the Encyclopedia of Life. Despite the simplicity of the approach, it typically achieves 80-90% precision and 70-80% recall. Many of the underlying dictionaries were built from open biomedical ontologies, which further facilitate integration of the text-mining results with evidence from other sources.
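
As a rough illustration of the dictionary-based approach described in this abstract, the following Python sketch tags text by greedy longest-match lookup against a small dictionary and computes precision and recall against a gold standard. The dictionary entries, identifiers and function names are hypothetical and chosen only for illustration; this is not the authors' tagger, which is not described in detail here.

```python
import re

# Hypothetical toy dictionary: lower-cased surface form -> (entity type, identifier).
DICTIONARY = {
    "tp53": ("gene", "GENE:TP53"),
    "breast cancer": ("disease", "DIS:0001"),
    "aspirin": ("chemical", "CHEM:0001"),
}
# Longest dictionary name, measured in words, bounds the lookup window.
MAX_WORDS = max(len(name.split()) for name in DICTIONARY)


def tag(text):
    """Return (start, end, entity_type, identifier) for each dictionary match."""
    tokens = [(m.group().lower(), m.start(), m.end())
              for m in re.finditer(r"\w+", text)]
    matches = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first so multi-word names win.
        for n in range(min(MAX_WORDS, len(tokens) - i), 0, -1):
            candidate = " ".join(tok for tok, _, _ in tokens[i:i + n])
            if candidate in DICTIONARY:
                etype, ident = DICTIONARY[candidate]
                matches.append((tokens[i][1], tokens[i + n - 1][2], etype, ident))
                i += n
                break
        else:
            i += 1  # no name starts at this token
    return matches


def precision_recall(predicted, gold):
    """Precision and recall of predicted annotations against gold annotations."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    sentence = "Aspirin lowers TP53 activity in breast cancer."
    for start, end, etype, ident in tag(sentence):
        print(sentence[start:end], etype, ident)
```

Matching on lower-cased word tokens keeps the lookup fast and simple, at the cost of missing spelling variants that are not listed in the dictionary.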

Robin Liechti*, Nancy George, Lou Götz, Sara El-Gebali, Isaac Crespo, Anastasia Chasapi, Ioannis Xenarios and Thomas Lemberger (*Vital-IT, Swiss Institute of Bioinformatics, Lausanne, Switzerland)
SourceData: a semantic platform to make data and figures discoverable

Daniel Teixeira*, Valentine Rech de Laval, Monique Zahn-Zabal, Pierre-André Michel, Lydie Lane, Amos Bairoch and Pascale Gaudet (*Swiss Institute of Bioinformatics, Geneva, Switzerland)
neXtProt data model 2.0: Modelling complex annotations

Douglas Teodoro*, Luc Mottin, Emilie Pasche, Julien Gobeill, Arnaud Gaudinat and Patrick Ruch (*Swiss Institute of Bioinformatics, HES-SO/HEG Geneva, Switzerland)
A pipeline to improve ranking in user searches for biomedical research datasets

Suzanna E. Lewis, Marc Feuermann, Kimberley Van Auken, David Hill, Seth Carbon, Paul Thomas* and Christopher Mungall (*University of Southern California, CA, USA)
The Noctua Modeling Tool

Computational Biology Driving Experimental Design

Yael Korem*, Anat Bren, Ghil Jona and Uri Alon (*Weizmann Institute of Science, Israel)
A bacterial growth law for nutritional upshift

Andrea Riba*, Nitish Mittal, Noemi Di Nanni, Alexander Schmidt and Mihaela Zavolan (*Biozentrum University of Basel & Swiss Institute of Bioinformatics, Basel, Switzerland)
Exploring the predictability of protein synthesis rates in the yeast Saccharomyces cerevisiae

Jörg C. Heinrich, Sebastian Salentin*, Melissa F. Adasme, Yixin Zhang and Michael Schroeder (*BIOTEC, Technische Universität Dresden, Germany)
Targeting cancer with structural bioinformatics: New HSP27 inhibitors efficiently suppress drug resistance development

Drug resistance is an important open problem in cancer treatment. In recent years, the heat shock protein Hsp27 was identified as a key player driving resistance development. Hsp27 is overexpressed in many cancer types and influences cellular processes such as apoptosis, DNA repair, recombination, and formation of metastases. As a result, cancer cells are able to suppress apoptosis and develop resistance to cytostatic drugs. To identify Hsp27 inhibitors we followed a novel structure-based drug repositioning approach. We characterised binding of a known inhibitor with interaction patterns from our tool PLIP and exploited this knowledge to assess better binders. Using our approach, we identified an FDA-approved malaria drug as a promising repositioning candidate and validated experimentally that it suppresses chemoresistance by inhibiting Hsp27.

Benjamin Towbin*, Yael Korem, Anat Bren, Shany Doron, Rotem Sorek and Uri Alon (*Friedrich Miescher Institute, Basel, Switzerland & Weizmann Institute of Science, Israel)
Rules of thumb in gene control: optimality and sub-optimality in bacterial gene expression

Computational Neuroscience

Matthew Chalk*, Olivier Marre and Gašper Tkačik (*Institut de la Vision, Université de Pierre et Marie Curie Paris, France) [ Preprint ]
Towards a unified theory of efficient, predictive and sparse coding

A central goal in theoretical neuroscience is to predict the response properties of sensory neurons from first principles. Several theories have been proposed to this end. "Efficient coding" posits that neural circuits maximise information encoded about their inputs. "Sparse coding" posits that individual neurons respond selectively to specific, rarely occurring features. Finally, "predictive coding" posits that neurons preferentially encode stimuli that are useful for making predictions. Except in special cases, it is unclear how these theories relate to each other, or what is expected if different coding objectives are combined. To address this question, we developed a unified framework that encompasses these previous theories and extends to new regimes, such as sparse predictive coding. We explore cases when different coding objectives exert conflicting or synergistic effects on neural response properties. We show that predictive coding can lead neurons to either correlate or decorrelate their inputs, depending on presented stimuli, while (at low noise) efficient coding always predicts decorrelation. We compare predictive versus sparse coding of natural movies, showing that the two theories predict qualitatively different neural responses to visual motion. Our approach promises a way to explain the observed diversity of sensory neural responses as being due to a multiplicity of functional goals performed by different cell types and/or circuits.

Ulisse Ferrari*, Christophe Gardella, Olivier Marre and Thierry Mora (*Institut de la Vision, UPMC and Inserm Paris, France)
Closed-loop estimation of retinal network sensitivity reveals signature of efficient coding

Manuel Schottdorf*, Julian Vogel, Hecke Schrobsdorff, Walter Stühmer and Fred Wolf (*Max Planck Institute for Dynamics and Self-Organization, Göttingen, Germany)
A synthetic neurobiology approach to visual cortical feature selectivity

Emerging Applications of Sequencing

Luca Alessandrì, Marco Beccuti, Raffaele Calogero* and Gennaro Delibero (*Dept of Molecular Biotechnology and Health Sciences, University of Torino, Italy)
CASC: Classification Analysis of Single Cell Sequencing Data

Single-cell sequencing is a powerful technology to study cell heterogeneity and represents a new frontier for the bioinformatics community. Cell heterogeneity analysis requires the use of clustering methods, e.g. PCA, t-SNE, Laplacian eigenmaps, zero-inflated factor analysis and kernel-based similarity learning. However, since cell clusters represent the subpopulations describing cell heterogeneity, it is mandatory to demonstrate that clustered subpopulations are biologically meaningful and not simply technical artifacts. Thus, cluster validation methods are needed and, at present, these validation procedures are not implemented in the clustering tools used in single-cell data analysis. Another important topic is the availability of cluster-specific gene-signature detection methods. Cluster-specific gene signatures are very important because they allow the results of single-cell sequencing to be translated to different experimental settings, e.g. multicolor FACS analysis. To address the above issues, we have developed CASC, a tool implemented in a Docker container, which uses “kernel-based similarity learning” (Wang et al. Nature Methods 2017) as its core application to detect cell clusters. CASC allows: (i) Identification of the optimal number of clusters for cell partitioning using the “silhouette method”. (ii) Evaluation of cluster stability. Cluster stability is measured by defining the permanence of a cell in a cluster upon random removal of a subset of cells. Each cell is represented on the basis of the percentage of permutation events associating it with a specific cluster. (iii) Feature selection. The feature selection is based on the “nearest shrunken centroid method” described by Tibshirani (PNAS 2002) for the classification of microarray data. Since the statistics applied to microarrays do not fit the RNA-seq data structure, we adapted the Tibshirani method to the gene Index Of Dispersion (Diaz et al. Bioinformatics 2016). CASC is part of the Reproducible Bioinformatics project.
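
The silhouette criterion mentioned in point (i) of this abstract can be sketched in a few lines of Python. The example below is a minimal, generic implementation of the mean silhouette width for already-labelled points using Euclidean distance. It is not CASC code: the function name, the toy coordinates and the use of plain Euclidean distance are assumptions made for illustration. Cluster labels are taken as given and could come from any clustering method.

```python
import math


def mean_silhouette(points, labels):
    """Mean silhouette width of labelled points (higher means better separated)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    total = 0.0
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # singleton clusters score 0 by convention
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(point, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


if __name__ == "__main__":
    # Toy data: two well-separated pairs of "cells" in two dimensions.
    cells = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
    print(mean_silhouette(cells, [0, 0, 1, 1]))  # partition matching the groups
    print(mean_silhouette(cells, [0, 1, 0, 1]))  # partition cutting across them
```

To choose the number of clusters, one would typically cluster the data for several candidate values and keep the value with the highest mean silhouette width.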

Jonathan D. Magasin, Brent Nowinski, Mary Ann Moran and Dietlind L. Gerloff* (*Foundation for Applied Molecular Evolution, FL, USA & University of Luxembourg, Luxembourg)
A new look at old metagenome data highlights geographically wide-spread marine bacteria

Jonas Ibn-Salem* and Miguel Andrade-Navarro (*Johannes Gutenberg University Mainz / IMB Mainz, Germany)
Prediction of chromatin looping interactions from ChIP-seq profiles at CTCF motifs

Enkelejda Miho*, Victor Greiff, Rok Roskar and Sai Reddy (*ETH Zurich, Switzerland)
Large-scale network analysis reveals that antibody repertoires are reproducible, redundant and robust

Julien Racle*, Kaat De Jonge, Petra Baumgaertner, Daniel E. Speiser and David Gfeller (*Ludwig Centre for Cancer Research & Swiss Institute of Bioinformatics, Lausanne, Switzerland) [ Preprint ]
Simultaneous enumeration of cancer and immune cell types from bulk tumor gene expression data

Immune cells infiltrating tumors can have an important impact on tumor progression and response to therapy. We present an efficient algorithm to simultaneously estimate the fraction of cancer and immune cell types from bulk tumor gene expression data. Our method integrates novel gene expression profiles from each major non-malignant cell type found in tumors, renormalization based on cell-type specific mRNA content, and the ability to consider uncharacterized and possibly highly variable cell types. Feasibility is demonstrated by validation with flow cytometry, immunohistochemistry and single-cell RNA-Seq analyses of human melanoma and colorectal tumor specimens. Altogether, our work not only improves accuracy but also broadens the scope of absolute cell fraction predictions from tumor gene expression data, and provides a unique novel experimental benchmark for immunogenomics analyses in cancer research.

Alban Ramette*, Maria-Teresa Barbani, Miguel Terrazos Miani, Jacqueline Steinlin-Schopfer, Pascal Bittel, Franziska Suter and Stephen Leib (*Institute for Infectious Diseases, University of Bern, Switzerland)
Technology development of real-time sequencing in infectious diseases: application to clinically relevant RNA viruses

Evolution and Phylogeny

Guy Baele*, Philippe Lemey, Andrew Rambaut and Marc Suchard (*KU Leuven / Rega Institute, Leuven, Belgium)
Adaptive MCMC in Bayesian Phylogenetics

Iakov Davydov*, Nicolas Salamin and Marc Robinson-Rechavi (*University of Lausanne & Swiss Institute of Bioinformatics, Lausanne, Switzerland) [ Preprint ]
Alpha and Omega of Darwinian selection: disentangling the two in codon models

There are numerous sources of variation in the rate of synonymous substitutions inside genes, such as direct selection on the nucleotide sequence, or mutation rate variation. However, the majority of the codon models which are developed and widely used today still incorporate an assumption of an effectively neutral synonymous substitution rate, constant between sites of each gene. We propose a simple yet effective extension to codon models, which incorporates codon substitution rate variation along the gene sequence. We assess the performance of our approach in simulations and on real data. We find strong effects of nucleotide rate variation on positive selection inference. The computational load of our approach remains tractable, and therefore we are able to apply it to genome-scale positive selection scans. We apply our new method to two datasets: 767 vertebrate orthologs and 8,606 orthologs from twelve Drosophila species. We demonstrate that our new model is strongly favored by the data, and the support of the model increases with the amount of information. Moreover, it is able to capture signatures of nucleotide-level selection acting on translation initiation and on splicing sites within the coding region. Finally, we show that rate variation is highest in the highly recombining regions, and we hypothesize that recombination and mutation rate variation, such as high CpG mutation rate, are the two main sources of nucleotide rate variation. Overall, nucleotide rate variation in substitutions is an important feature to capture, both to detect positive selection and to understand gene evolution, and the approach that we propose allows this to be done in genome-wide scans.

Daniele Ramazzotti, Alex Graudenzi*, Luca De Sano, Marco Antoniotti and Giulio Caravagna (*Dept. of Informatics, Systems and Communication, University of Milan-Bicocca, Italy) [ Preprint ]
A computational framework to infer the order of accumulating mutations in individual tumors

Many statistical techniques quantify intra-tumor heterogeneity by reconstructing either clonal or mutational trees from multi-sample sequencing data of individual tumors. Most of these methods rely on the well-known Infinite Sites Assumption, and are limited to processing either multi-region or single-cell sequencing data. Here, we improve over those methods with TRaIT (Temporal oRder of Individual Tumors), a unified statistical framework for the inference of the accumulation order of multiple types of genomic alterations driving tumor development. TRaIT supports both multi-region and single-cell sequencing data, and outputs mutational graphs accounting for violations of the Infinite Sites Assumption due to convergent evolution, and other complex phenomena that cannot be detected with standard tools. Our method displays better accuracy, performance and robustness to noise and small sample size than state-of-the-art methods. We show with single-cell data from breast cancer and multi-region data from colorectal cancer that TRaIT can quantify the extent of intra-tumor heterogeneity and generate new testable experimental hypotheses.

Alexis Loetscher*, Christian Hammer, Jacques Fellay and Evgeny M. Zdobnov (*University of Geneva & Swiss Institute for Bioinformatics, Geneva, Switzerland)
Association between genotypes of EBV and HIV-infected patients – A diversity study

Tamar Friedlander, Roshan Prizak*, Nicholas Barton and Gašper Tkačik (*Institute of Science and Technology Klosterneuburg, Austria)
Evolution of new regulatory functions on biophysically realistic fitness landscapes

Thomas Sakoparnig*, Chris Field and Erik van Nimwegen (*Biozentrum University of Basel & Swiss Institute of Bioinformatics, Basel, Switzerland)
The dominance of recombination in E. coli genome evolution: why clonal ancestry cannot be recovered from genomic data

From Genotype to Phenotype and back, in Health and Disease

Kaur Alasoo*, Julia Rodrigues, Subhankar Mukhopadhyay, Andrew Knights, Alice Mann, Kousik Kundu, Christine Hale, Gordon Dougan and Daniel Gaffney (*University of Tartu, Estonia) [ Preprint ]
Genetic effects on chromatin accessibility foreshadow gene expression changes in macrophage immune response

Noncoding regulatory variants are often highly context-specific, modulating gene expression in a small subset of possible cellular states. Although these genetic effects are likely to play important roles in disease, the molecular mechanisms underlying context-specificity are not well understood. Here, we identify shared quantitative trait loci (QTLs) for chromatin accessibility and gene expression (eQTLs) and show that a large fraction (~60%) of eQTLs that appear following macrophage immune stimulation alter chromatin accessibility in unstimulated cells, suggesting they perturb enhancer priming. We show that such variants are likely to influence the binding of cell type specific transcription factors (TFs), such as PU.1, which then indirectly alter the binding of stimulus-specific TFs, such as NF-κB or STAT2. Our results imply that, although chromatin accessibility assays are powerful for fine mapping causal noncoding variants, detecting their downstream impact on gene expression will be challenging, requiring profiling of large numbers of stimulated cellular states and timepoints.

Abdullah Kahraman* and Christian von Mering (*University of Zurich & Swiss Institute of Bioinformatics, Zürich, Switzerland)
CanIsoNet: Predicting the pathological impact of alternatively spliced isoforms in cancer with an isoform-specific interaction network

Andrea Komljenovic* and Marc Robinson-Rechavi (*University of Lausanne & Swiss Institute of Bioinformatics, Lausanne, Switzerland)
Cross-species functional modules identify conserved splicing and immune biomarkers during aging

Carsten Magnus*, Lucia Reh, Penny Moore, Therese Uhr, Jacqueline Weber, Lynn Morris, Tanja Stadler and Alexandra Trkola (*D-BSSE, ETH Zurich, Basel, Switzerland)
Observing evolution in HIV-1 infection: determining antibody concentrations that drive viral escape

Daniel Marbach*, Sarvenaz Choobdar, Eren Ahsen, Jake Crawford, Lenore Cowen and Sven Bergmann (*University of Lausanne & Swiss Institute of Bioinformatics, Lausanne, Switzerland)
Disease Module Identification as a DREAM Community Challenge

Identification of modules in molecular networks is at the core of many current analysis methods in biomedical research. However, how well different approaches perform to identify disease-relevant modules remains poorly understood. We launched an open competition to comprehensively assess module identification methods across diverse gene, protein and signaling networks (the Disease Module Identification DREAM Challenge). Predicted network modules were tested for association to complex traits and diseases using a unique collection of 184 genome-wide association studies (GWAS). While a number of approaches were effective and discovered complementary trait-associated modules, consensus modules derived from multiple methods performed best. We find that most of these modules correspond to core disease-relevant pathways, which often comprise therapeutic targets and correctly prioritize candidate disease genes. This community challenge establishes benchmarks, tools and guidelines for genomic network analysis to study human disease biology (https://synapse.org/modulechallenge).

Eleonora Porcu*, Alexandre Reymond and Zoltan Kutalik (*Center for Integrative Genomics, University of Lausanne & Swiss Institute of Bioinformatics, Lausanne, Switzerland)
Identifying novel genes whose tissue-specific expression levels causally influence complex human traits

Matthew Robinson (Department of Computational Biology, University of Lausanne, Switzerland)
Improving genetic risk prediction for common complex diseases by leveraging all the available information

Rik Lindeboom, Fran Supek* and Ben Lehner (*Institute for Research in Biomedicine (IRB Barcelona), Barcelona, Spain)
The rules and impact of nonsense-mediated mRNA decay in human cancers

Macromolecular Structure, Dynamics and Function

Lukas Bartonek* and Bojan Zagrovic (*Max F. Perutz Laboratories & University of Vienna, Austria) [ OA paper ]
Linking RNA/protein interactions and the universal genetic code

It has recently been demonstrated that the nucleobase-density profiles of mRNA coding sequences are related in a complementary manner to the nucleobase-affinity profiles of their cognate protein sequences. Based on this, it has been proposed that cognate mRNA/protein pairs may bind in a co-aligned manner, especially if unstructured. Here, we study the dependence of mRNA/protein sequence complementarity on the properties of the nucleobase/amino-acid affinity scales used. Specifically, we sample the space of randomly generated scales by employing a Monte Carlo strategy with a fitness function that depends directly on the level of complementarity. For model organisms representing all three domains of life, we show that even short searches reproducibly converge upon highly optimized scales, implying that the topology of the underlying fitness landscape is decidedly funnel-like. Furthermore, the optimized scales, generated without any consideration of the physicochemical attributes of nucleobases or amino acids, resemble closely the nucleobase/amino-acid binding affinity scales obtained from experimental structures of RNA-protein complexes. This provides support for the claim that mRNA/protein sequence complementarity may indeed be related to binding between the two. Finally, we characterize suboptimal scales and show that intermediate-to-high complementarity can be reached by substantially diverse scales, but with select amino acids contributing disproportionally. Our results expose the dependence of cognate mRNA/protein sequence complementarity on the properties of the underlying nucleobase/amino-acid affinity scales and provide quantitative constraints that any physical scales need to satisfy for the complementarity to hold.

Bruno Correia (EPFL, Lausanne, Switzerland)
Computational Design of Functional Proteins for Biomedicine

Lorenzo Di Rienzo*, Edoardo Milanetti, Rosalba Lepore and Pier Paolo Olimpieri (*Department of Physics, Sapienza University, Rome, Italy) [ OA paper ]
Comparing Antibody Binding Sites by using shape descriptors: implications for predicting the nature of the bound antigen

The comparison of protein structures has had a major impact on our understanding of protein structure and function and has opened the road to the development of a number of methods for structure classification and prediction, which are often useful for the inference of protein function. Traditional comparison methods rely on global or local superposition of the protein or region of interest; however, their limitations are well known. I focused on a specific, and relevant, class of proteins, i.e. immunoglobulins. In this context, superposition-free measures, i.e. those based on rotation-invariant properties, represent an attractive alternative. In the case at hand, the challenge is that the loops forming the binding site can have different lengths and structures yet result in a similar binding surface. Therefore, comparing the actual surface rather than the atomic coordinates of the binding site may be preferable. I present a novel superposition-free method, rotation and translation invariant, able to effectively compare antibodies according to the shape of their binding site. The specific goal was to try to deduce information about the bound antigen, an aspect that is of relevance for several applications and, at the same time, a very complex and so far unsolved problem. The results showed that we were able, given the structure of an antibody, to correctly assign the antigen type (protein or non-protein) to 249 out of 329 antibody structures.

Andras Fiser (Albert Einstein College of Medicine, NY, USA)
ProtLID, a residue-based pharmacophore approach to identify cognate protein ligands in the Immunoglobulin Superfamily

Michal Bassani-Sternberg, Chloe Chong, Philippe Guillaume, Marthe Solleder, Huisong Pak, Philippe O. Gannon, Lana Kandalaft, George Coukos and David Gfeller* (*University of Lausanne, Switzerland)
Deciphering HLA-I motifs across HLA peptidomes improves neo-antigen predictions and identifies allostery regulating HLA specificity

Mirko Ledda* and Sharon Aviran (*University of California at Davis, CA, USA)
Transcriptome-wide search for functional RNA elements via structural data signatures

Duccio Malinverni*, Alfredo Jost-Lopez, Gerhard Hummer, Paolo De Los Rios and Alessandro Barducci (*Laboratoire de Biophysique Statistique, EPFL, Lausanne, Switzerland) [ OA paper ]
Coevolutionary Analysis of the Hsp70 chaperone machinery

Ruben Sanchez-Garcia*, Joan Segura, Carlos O. S. Sorzano and Jose M. Carazo (*Biocomputing Unit, Spanish National Center for Biotechnology (CSIC) & GN7 of the Spanish National Institute for Bioinformatics (INB), Madrid, Spain)
Residue-residue contact prediction based on a random forest multi-step approach

Protein-Protein Interactions (PPIs) are essential for most cellular processes. Different experimental techniques are used to study PPIs; however, most of these methods are expensive and time consuming and thus not suitable for analyzing interactions at organism scale. On the other hand, computational approaches to study PPIs are faster, less expensive and can be applied when no experimental data are available. Apart from protein docking approaches, there exist just a few methods designed to predict residue contact information, including correlated mutations-based methods and machine learning-based approaches. In this work, we present a new machine learning method developed to obtain residue-residue contact predictions from structural and sequential features. The algorithm codifies amino acids using structural attributes such as accessible surface area, protrusion index and atomic depth, and sequential properties such as Position Specific Scoring Matrices computed with PSI-Blast. The core of the method consists of three sequential random forest classifiers. The first random forest is fed with a feature vector describing a residue pair and their sequential neighbors. In the second step, the first-step scores are combined with the structural and sequence features of the residue pair and their structural neighborhood. Finally, the last classifier is fed with the first- and second-step predictions. The quality of our predictions has been assessed by leave-one-complex-out cross-validation over the protein complexes compiled in the Protein-Protein Docking Benchmark 5.0 dataset. For each of the complexes, ROC AUC was computed, achieving values, on average, slightly above 90% for pair prediction and above 80% for binding site prediction. Comparison with previously described methods under the same conditions shows that our approach outperforms the reported results.

Stefan Schuster*, Maximilian Fichtner and Severin Sasso (*Dept of Bioinformatics, University of Jena, Germany) [ OA paper ]
Use of Fibonacci numbers in lipidomics – Enumerating various classes of fatty acids

In lipid biochemistry, a fundamental question is how the potential number of fatty acids increases with their chain length. Here, we show that it grows according to the famous Fibonacci numbers when cis/trans isomerism is neglected. Since the ratio of two consecutive Fibonacci numbers tends to the Golden section, 1.618, organisms can increase fatty acid variability approximately by that factor per carbon atom invested. Moreover, we show that, under consideration of cis/trans isomerism, modification by hydroxy and/or oxo groups, triple bonds or adjacent double bonds, diversity can be described by generalized Fibonacci numbers (e.g. Pell numbers or hitherto scarcely studied number series). For the sake of easy comprehension, we deliberately build the proof on the recursive definitions of these number series. Similar calculations can be applied to aliphatic amino acids. For example, some amino acids harbouring triple bonds occur in toxic mushrooms. Our results should be of interest for mass spectrometry, combinatorial chemistry, synthetic biology, patent applications, use of fatty acids as biomarkers and the theory of evolution.
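
The Fibonacci growth stated in this abstract can be checked numerically under a simplified model. The Python sketch below assumes an unbranched chain of n carbon atoms in which the bond between the carboxyl carbon (C1) and C2 is always single, every remaining C-C bond is either single or double, two double bonds are never adjacent, and cis/trans isomerism is ignored. These modelling rules and the function names are assumptions made for illustration and are not taken from the paper. Under these rules, brute-force enumeration of the bond patterns reproduces the recursion F(n) = F(n-1) + F(n-2).

```python
from itertools import product


def count_by_enumeration(n_carbons):
    """Count bond patterns with no two adjacent double bonds (brute force)."""
    # Bonds C2-C3 ... C(n-1)-Cn may be single (0) or double (1).
    n_bonds = max(n_carbons - 2, 0)
    return sum(
        1
        for bonds in product((0, 1), repeat=n_bonds)
        if all(not (left and right) for left, right in zip(bonds, bonds[1:]))
    )


def fibonacci(n):
    """n-th Fibonacci number with F(1) = F(2) = 1, built from the recursion."""
    previous, current = 0, 1
    for _ in range(n):
        previous, current = current, previous + current
    return previous


if __name__ == "__main__":
    for n in range(1, 13):
        print(n, count_by_enumeration(n), fibonacci(n))
    # Ratio of consecutive Fibonacci numbers, approaching the Golden section.
    print(fibonacci(30) / fibonacci(29))
```

For example, four carbon atoms give three patterns in this model (no double bond, a double bond at C2-C3, or a double bond at C3-C4), which matches F(4) = 3.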

Reproducibility and Robustness of Large Scale Biological Analyses

Kjong-Van Lehmann*, Andre Kahles, Nikolaus Schultz, Chris Sander and Gunnar Rätsch (*Department of Computer Science, ETH Zurich, Switzerland)
Cleaner Expression Signatures by Alignment-Free Degradation Assessment of RNA-seq Data

Serghei Mangul (University of California Los Angeles (UCLA), CA, USA) [ Preprint ]
Profiling immunoglobulin repertoires across multiple human tissues by RNA Sequencing

Assay-based approaches provide a detailed view of the adaptive immune system by profiling immunoglobulin (Ig) receptor repertoires. However, these methods carry a high cost and lack the scale of standard RNA sequencing (RNA-Seq). Here we report the development of ImReP, a novel computational method for rapid and accurate profiling of the immunoglobulin repertoire from regular RNA-Seq data. ImReP can also accurately assemble the complementarity determining regions 3 (CDR3s), the most variable regions of Ig receptors. We applied our novel method to 8,555 samples across 53 tissues from 544 individuals in the Genotype-Tissue Expression (GTEx v6) project. ImReP is able to efficiently extract Ig-derived reads from RNA-Seq data. Using ImReP, we have created a systematic atlas of 3.6 million Ig sequences across a broad range of tissue types, most of which have not been studied for Ig receptor repertoires. We also compared the GTEx tissues to track the flow of Ig clonotypes across immune-related tissues, including secondary lymphoid organs and organs encompassing mucosal, exocrine, and endocrine sites, and we examined the compositional similarities of clonal populations between these tissues. The Atlas of Immunoglobulin Repertoires (The AIR), freely available at https://smangul1.github.io/TheAIR/, is one of the largest collections of CDR3 sequences and tissue types. We anticipate this resource will enhance future immunology studies and advance development of therapies for human diseases. ImReP is freely available at https://mandricigor.github.io/imrep/

Morten Rye*, Helena Bertilsson, Maria Andersen, Kjersti Rise, Tone Bathen, Finn Drabløs and May-Britt Tessem (*Norwegian University of Science and Technology, Department of Cancer Research and Molecular Medicine, Trondheim, Norway)
Computational assessment of tissue heterogeneity in patient tissue samples reveals that cholesterol synthesis pathway genes in prostate cancer are consistently downregulated

Charlotte Soneson* and Mark D Robinson (*Institute of Molecular Life Sciences, University of Zurich & SIB Swiss Institute of Bioinformatics, Zürich, Switzerland) [ Preprint ]
Bias, consistency and scalability in differential expression analysis of single-cell RNA-seq data

As single-cell RNA-seq (scRNA-seq) is becoming increasingly widely used, the amount of publicly available data also grows rapidly, generating a useful resource for computational method development as well as for reanalysis and extension of published results. In public data repositories often both the raw data files and a processed dataset are made available. The procedure to obtain the processed data set can be widely different between data sets, which may complicate reuse and cross-dataset comparisons. To simplify this aspect, we present conquer, an open collection of consistently processed, analysis-ready public single-cell RNA-seq data sets. We provide count and transcripts per million (TPM) estimates for both genes and transcripts, as well as quality control and exploratory analysis reports to assist users in determining whether a particular data set is suitable for their purposes. We then use a subset of the data sets available via conquer to perform an extensive evaluation of the performance and characteristics of a variety of statistical methods for differential gene expression analysis of pre-defined cell populations. We evaluate more than 30 statistical approaches to differential expression, using both experimental and simulated scRNA-seq data. Considerable differences are found between the methods in terms of the number, but also the characteristics of the genes that are called differentially expressed. We further show that pre-filtering of genes can have important effects on the results, particularly for some of the methods originally developed for analysis of bulk RNA-seq data. In addition to differences between methods, we also note substantial inconsistencies between results for the same method applied to different subsets of a given data set. 
The inconsistencies are less pronounced for some non-parametric or transformation-based methods, suggesting that current count-based approaches may be more sensitive to small perturbations in single-cell RNA-seq data.
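
As a reminder of what the TPM unit mentioned above measures, the following minimal sketch applies the standard TPM definition to a toy example. The function name and the numbers are illustrative assumptions; this is not the conquer processing pipeline, and in practice effective transcript lengths are normally used.

```python
def tpm(counts, lengths):
    """Transcripts per million from read counts and feature lengths.

    Standard definition: length-normalise each count, then rescale so
    that the values sum to one million. The inputs are illustrative.
    """
    rates = [count / length for count, length in zip(counts, lengths)]
    total = sum(rates)
    return [rate / total * 1e6 for rate in rates]

# Three hypothetical transcripts: read counts and lengths in bases.
values = tpm(counts=[100, 200, 300], lengths=[1000, 2000, 1000])
print(values)
```

Because the values are rescaled within each sample, TPM estimates always sum to one million per sample, whereas raw counts depend on sequencing depth.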

Simone Tiberi* and Mark Robinson (*Institute of Molecular Life Sciences, University of Zurich & Swiss Institute of Bioinformatics, Zürich, Switzerland)
Bayesian DTU analysis accounting for subject-to-subject variability and for mapping uncertainty

Pratyaksha Wirapati* and Mauro Delorenzi (*Bioinformatics Core Facility, Swiss Institute of Bioinformatics, Lausanne, Switzerland)
Self-Normalizing Predictors for Robust Clinical Applications of Omics Signatures

Stochasticity, Heterogeneity, and Single Cells

Sybille Dühring*, Jan Ewald, Sebastian Germerodt, Christoph Kaleta, Thomas Dandekar and Stefan Schuster (*Department of Bioinformatics, Friedrich-Schiller-Universität Jena, Germany) [ OA paper ]
Modelling the host-pathogen interactions of macrophages and Candida albicans

Thomas Julou*, Athos Fiori, Erik van Nimwegen (*Biozentrum University of Basel & Swiss Institute of Bioinformatics, Basel, Switzerland)
Quantitatively tracking gene regulation in single cells

In spite of intense study, much is still not understood about how gene regulatory interactions control cell fate decisions in single cells, in part due to the difficulty of directly observing and measuring gene regulatory processes in vivo. We present an integrated experimental and computational setup consisting of a dual-input microfluidic device and accompanying image analysis software that allows long-term and highly accurate tracking of growth and gene expression in lineages of single cells. In particular, the dual-input design of the device allows growth conditions to be dynamically varied in a controlled manner, and allows accurate quantification of the regulatory responses of single cells to environmental changes.

As an example application we observe the response of E. coli cells to a sudden switch in carbon source from glucose to lactose. We find that the loss of glucose leads to an immediate growth arrest in all cells and that the distribution of lag times to exit this growth arrest is multi-modal. While a minority of cells responds by inducing their lac operon and re-commencing growth within 20-40 minutes, the majority take 3 to 6 times as long to respond, and some cells are unable to exit growth arrest altogether.

In addition, we present a general theoretical analysis that demonstrates that these observations are not surprising from an evolutionary perspective: for isogenic populations with heterogeneous lag times, the population fitness is mainly determined by the response times of the fastest cells in the population, and is insensitive to long tails of slow or non-responding cells.
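
The intuition behind this statement can be illustrated with a minimal sketch. It assumes a toy model that is not taken from the talk: each cell waits for its own lag time and then grows exponentially at a common rate. Under that assumption, the long-term population size equals that of a homogeneous population with a single "effective" lag. The function name, the growth rate and the lag values are all hypothetical.

```python
import math

def effective_lag(lags, rate):
    """Lag of a homogeneous population with the same long-term size.

    Toy model (an assumption): each cell waits its own lag, then grows
    exponentially at `rate`. Non-responding cells are encoded as
    math.inf and contribute nothing to the long-term population.
    """
    mean_weight = sum(math.exp(-rate * lag) for lag in lags) / len(lags)
    return -math.log(mean_weight) / rate

rate = math.log(2) / 30.0  # assumed: one doubling every 30 minutes
fast, slow, never = [30.0] * 20, [150.0] * 70, [math.inf] * 10

base = effective_lag(fast + slow + never, rate)
slow_doubled = effective_lag(fast + [300.0] * 70 + never, rate)
fast_doubled = effective_lag([60.0] * 20 + slow + never, rate)
print(round(base, 1), round(slow_doubled, 1), round(fast_doubled, 1))
```

In this toy setting, doubling the lag of the slow majority shifts the effective lag less than doubling the lag of the fast minority, because each cell's contribution decays exponentially with its lag.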

Nacho Molina (IGBMC - CNRS - University of Strasbourg, France) [ Preprint ]
Chromatin structure shapes the diffusion dynamics of transcription factors

The diffusion of regulatory proteins within the nucleus plays a crucial role in the dynamics of transcriptional regulation. The standard model assumes a 3D plus 1D diffusion process: regulatory proteins either move freely in solution or slide on DNA. This model, however, does not consider the 3D structure of chromatin. Here we proposed a multi-scale stochastic model that integrates, for the first time, high-resolution information on chromatin structure as well as DNA-protein interactions. The dynamics of transcription factors was modeled as a slide plus jump diffusion process on a chromatin network based on pair-wise contact maps obtained from high-resolution Hi-C experiments. Our model allowed us to uncover the effects of chromatin structure on transcription factor occupancy profiles and target search times. Finally, we showed that binding sites cluster in a few topologically associating domains, leading to a higher local concentration of transcription factors, which could reflect an optimal strategy for using limited transcriptional resources efficiently.
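
To make the idea of a slide plus jump process concrete, here is a minimal sketch of a random walk over chromatin bins. It is a toy illustration, not the model presented in the talk: the contact dictionary is a hypothetical stand-in for a Hi-C contact map, and the jump probability, bin count and function name are assumptions.

```python
import random

def simulate_walk(contacts, n_bins, start, n_steps, jump_prob, seed=0):
    """Toy slide-plus-jump walk over chromatin bins.

    `contacts` maps a bin to the bins it touches in 3D (a hypothetical
    stand-in for a Hi-C contact map). At each step the walker either
    jumps along a contact or slides to a neighbouring bin.
    """
    rng = random.Random(seed)
    position = start
    trajectory = [position]
    for _ in range(n_steps):
        partners = contacts.get(position, [])
        if partners and rng.random() < jump_prob:
            position = rng.choice(partners)  # jump along a 3D contact
        else:
            step = rng.choice((-1, 1))  # slide along the DNA
            position = min(max(position + step, 0), n_bins - 1)
        trajectory.append(position)
    return trajectory

# Two distant bins in contact: a loop between bin 2 and bin 17.
contacts = {2: [17], 17: [2]}
trajectory = simulate_walk(contacts, n_bins=20, start=0, n_steps=500, jump_prob=0.2)
print(len(set(trajectory)), "distinct bins visited")
```

Sliding alone moves the walker one bin at a time, so contacts act as shortcuts between distant bins.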

Robert Noble*, John Burley and Michael Hochberg (*ETH Zurich, Switzerland)
Impact of tissue architecture on the nature and predictability of tumour evolution

Thomas R. Sokolowski* and Gašper Tkačik (*Institute of Science and Technology Austria, Klosterneuburg, Austria)
Deriving the Drosophila gap gene system ab initio by optimizing information flow

Marie Ming Aynaud, Olivier Mirabeau, Gruel Nadege, Sandrine Grossetête-Lalami, Svetlana Gribkova, Valentina Boeva, Didier Surdez, Olivier Saulnier, Sylvain Durand, Ulykbek Kairov, Virginie Raynal, Franck Tirode, Thomas Gruenewald, Jean-Philippe Vert, Emmanuel Barillot, Olivier Delattre and Andrei Zinovyev* (*Institut Curie, Paris, France)
Time-resolved single-cell transcriptome deconvolution unravels specific and cell cycle independent dynamics of EWS-FLI1-mediated transcriptional regulation

Technology track: Software and Technology, Demos and Tutorials

Ben Vandervalk, Shaun Jackman, Justin Chu, Hamid Mohamadi, Sarah Yeo, S Austin Hammond, Lauren Coombe, Cath Ennis, Rene Warren and Inanc Birol* (*British Columbia Cancer Agency, Genome Sciences Centre, Vancouver, Canada)
ABySS 2.0: A resource-efficient de novo sequence assembly algorithm

Raul Catena*, Bernd Bodenmiller and Denis Schapiro (*Institute of Molecular Life Sciences, University of Zurich, Switzerland)
Machine Learning-powered analysis of multi-parametric imaging in 2D and 3D using miCAT platform modules.

Antoine Daina* and Vincent Zoete (*Molecular Modeling Group, Swiss Institute of Bioinformatics, Lausanne, Switzerland) [ OA paper ]
A BOILED-Egg to predict gastrointestinal absorption and brain penetration of small molecules

Leonardo de Oliveira Martins* and Christophe Dessimoz (*Computational Evolutionary Biology and Genomics Group, University of Lausanne & Swiss Institute of Bioinformatics, Lausanne, Switzerland)
Spectral signature of gene family trees

Geoffrey Fucile*, Pablo Escobar López and Konrad Jaggi** (*sciCORE University of Basel & Swiss Institute of Bioinformatics, Basel, Switzerland; **SWITCH, Zürich, Switzerland)
SWITCHengines for compute-intensive biological research and training using next-generation sequencing workflows

Steve Gardner (RowAnalytics, Oxford, UK)
Delivering Precision Medicine, One Patient at a Time

Precision medicine promises to deliver the right treatments to the right patient at the right time, every time. This has been shown to have major impact on improved diagnosis and selection of treatment options, leading to better outcomes for patients whilst significantly reducing the cost of their healthcare. At the same time, by informing patients about their personal disease risks, likely therapy responses and side effects, and the impact of changes they can make to their diet and lifestyle factors, we can engage patients to become informed and active partners in the management of their own health. This talk will describe new platforms that enable very rapid discovery and validation of multi-omics biomarker networks, and then their use in fully personalized clinical decision support and patient-focused digital health tools. An example association analysis of a population of 14,777 people, all of whom had BRCA1 and/or BRCA2 mutations, will be described. The most complex biomarker networks identified, present in up to 103 affected people and 0 controls, contain 17 SNPs acting in combination. These results were found and validated in 6 days on a single 4 GPU POWER8 server.

Maria Katsantoni*, Alexander Kanitz, Anastasiya Börsch, Andrea Riba and Mihaela Zavolan (*Biozentrum University of Basel & Swiss Institute of Bioinformatics, Basel, Switzerland)
Identification of RNA-binding protein binding sites from eCLIP data

Suzanna E. Lewis*, Christopher Mungall, Jim Balhoff, Eric Douglass, Kent Shefchek, Seth Carbon, Dan Keith and Melissa Haendel (*University of Southern California, Davis, CA, USA)
BioLink: An API for linked biological knowledge

Riccardo Murri and Sergio Maffioletti* (*S3IT, University of Zurich, Switzerland)
ElastiCluster: Automated provisioning of computational clusters in the cloud

João Matias Rodrigues*, Sebastian Schmidt, Janko Tackmann and Christian Von Mering (*University of Zurich, Switzerland)
MAPseq: Improved Speed, Accuracy And Consistency In Ribosomal RNA Sequence Analysis

Maido Remm*, Fanny-Dhelia Pajuste, Lauris Kaplinski, Märt Möls, Tarmo Puurand and Maarja Lepamets (*University of Tartu, Estonia) [ OA paper ]
FastGT: from raw sequence reads to 30 million genotypes in less than an hour

We have developed a computational method that counts the frequencies of unique k-mers in FASTQ-formatted genome data and uses this information to infer the genotypes of known variants. FastGT can detect the variants in a 30x genome in less than 1 hour using ordinary low-cost server hardware. The overall concordance with the genotypes of two Illumina “Platinum” genomes is 99.96%, and the concordance with the genotypes of the Illumina HumanOmniExpress is 99.82%. Our method provides a k-mer database that can be used for the simultaneous genotyping of approximately 30 million single nucleotide variants (SNVs), including >23,000 SNVs from the Y chromosome. The source code of the FastGT software is available on GitHub.
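
The general principle of genotyping from k-mer counts can be sketched as follows. This is a minimal illustration under stated assumptions, not the FastGT implementation: the function names, the read sequences, the allele-specific k-mers and the simple threshold rule are all hypothetical, and reverse-complement reads are ignored.

```python
from collections import Counter

def count_kmers(reads, k):
    """Count every k-mer occurring in a list of read sequences."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def call_genotype(counts, ref_kmer, alt_kmer, min_count=2):
    """Toy genotype call from the counts of two allele-specific k-mers.

    The threshold rule is a hypothetical simplification, not the
    statistical model used by FastGT.
    """
    ref_seen = counts[ref_kmer] >= min_count
    alt_seen = counts[alt_kmer] >= min_count
    if ref_seen and alt_seen:
        return "0/1"  # both alleles observed: heterozygous
    if alt_seen:
        return "1/1"  # only the alternative allele observed
    if ref_seen:
        return "0/0"  # only the reference allele observed
    return "./."      # neither allele observed: no call

# Two reads carry the reference allele (A), two the alternative (T).
reads = ["ACGTACGGA", "ACGTACGGA", "ACGTTCGGA", "ACGTTCGGA"]
counts = count_kmers(reads, k=5)
genotype = call_genotype(counts, ref_kmer="GTACG", alt_kmer="GTTCG")
print(genotype)
```

The approach only works when each allele-specific k-mer is unique in the genome, which is why the abstract stresses unique k-mers.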

Thibault Robin*, Markus Mueller, Frédérique Lisacek, Amos Bairoch and Lydie Lane (*University of Geneva & Swiss Institute of Bioinformatics, Geneva, Switzerland)
MzVar: a Java tool to compile customized variant protein and peptide databases

With the ever-increasing mass accuracy of spectrometers, database search maintains its position as the most efficient processing approach to identify proteins from biological samples. In this method, experimental mass spectra are searched against theoretical mass spectra inferred from peptides in a reference sequence database. Database compilation is a key step in the search workflow, since identification is limited to the peptides contained in the database. Consequently, in order to identify variant peptides, known mutations have to be inserted in their corresponding protein sequences and included in the database. Here, we present MzVar, which compiles customized variant protein and peptide databases from gene transcript sequences and gene sequence variants.

https://bitbucket.org/sib-pig/mzvar-public
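
The step of inserting a known mutation into a protein sequence and deriving the resulting variant peptides can be sketched as below. This is a minimal illustration, not MzVar itself (which is a Java tool working from transcript sequences and gene sequence variants): the function names, the example sequence and the simplified trypsin rule are assumptions.

```python
import re

def apply_substitution(protein, position, ref, alt):
    """Return the protein with one amino-acid substitution (1-based position)."""
    if protein[position - 1] != ref:
        raise ValueError("reference residue does not match the sequence")
    return protein[:position - 1] + alt + protein[position:]

def tryptic_peptides(protein):
    """Cleave after K or R unless followed by P (a simplified trypsin rule)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]

reference = "MKTAYIAKQRQISFVK"  # hypothetical protein sequence
variant = apply_substitution(reference, position=5, ref="Y", alt="C")

reference_peptides = set(tryptic_peptides(reference))
variant_only = [p for p in tryptic_peptides(variant) if p not in reference_peptides]
print(variant_only)
```

Only peptides absent from the reference digest need to be added to the search database, which keeps the customized database small.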

Felipe Simao Neto*, Mathieu Seppey, Robert Waterhouse, Mosè Manni, Panagiotis Ioannidis, Evgenia V. Kriventseva and Evgeny Zdobnov (*University of Geneva & Swiss Institute of Bioinformatics, Geneva, Switzerland)
Genomic utilities of BUSCO v3

Neha Daga, Milan Simonovic* and Christian von Mering (*University of Zurich & Swiss Institute of Bioinformatics, Zürich, Switzerland)
GeneAssassin – User-friendly design of CRISPR/Cas guides with application-specific sub-scoring

Stefan Bienert, Gabriel Studer*, Andrew Waterhouse, Gerardo Tauriello, Martino Bertoni, Rafal Gumienny, Christine Rempfer, Florian Heer, Tjaart de Beer, Lorenza Bordoli and Torsten Schwede (*Biozentrum University of Basel & Swiss Institute of Bioinformatics, Basel, Switzerland)
SWISS-MODEL – a web-based expert system for modelling protein tertiary and quaternary structures using evolutionary information

Homology modelling is currently the most accurate technique for generating 3-dimensional models for proteins starting from their sequence. SWISS-MODEL pioneered the idea of fully automating the process in a robust and reliable internet-based service, i.e. making structure and model information easily accessible to nonexpert scientists from a web browser, without the need for locally managing large databases and complex software packages. Over the last decades, the combination of systematic experimental structural biology efforts and comparative protein structure modelling has led to a paradigm change in molecular biology research: today, some form of structure information is available for the majority of proteins studied in life sciences. SWISS-MODEL is today one of the most widely used modelling services worldwide and one of the core bioinformatics resources of the SIB. It is actively developed to continuously improve the quality of the models, as well as to introduce new features and functionality required to support the research of the user community. In this tech track presentation, we will present the latest development of the system and highlight functionalities which are specifically relevant from a user perspective for making efficient use of structure modelling in their own research.
https://swissmodel.expasy.org/

Philip Zimmermann* and Stefan Bleuler (*Nebion AG, Zürich, Switzerland)
"Disease-related" versus "disease-specific": how GENEVESTIGATOR addresses target and biomarker discovery

Vincent Zoete*, Antoine Daina, Dennis Haake, Christophe Bovigny and Olivier Michielin (*Molecular Modeling Group, Swiss Institute of Bioinformatics, Lausanne, Switzerland)
Computer-aided Drug Design at SIB - The SwissDrugDesign project
