WS3: Scientific Software Engineering

Organizers:

Markus Müller (Swiss Institute of Bioinformatics, University of Geneva), Alexandre Masselot (Swiss Institute of Bioinformatics, University of Lausanne)

Workshop Summary:

Software solutions for life science applications come in many shades and colors. They include complex distributed systems, statistical software, databases, class libraries, and small scripts that convert and evaluate data in a lab, and they may be commercial or open source. These solutions differ in their life cycle: some are intended to be used and maintained for a long time, whereas others are developed for a special purpose with no need for reuse. In this workshop we would like to discuss modern software engineering strategies and their usefulness for different types of life science software. Software development in the life sciences takes place in a particular situation: most bioinformaticians and computational biologists do not come from a software engineering culture, although most of their time is actually spent writing code. Moreover, the research environment, being creative and original, creates the impression that software development methodology does not apply there. Prototypes often turn into “production” code, generating large long-term costs and limiting stability, bug resolution and evolution. However, we believe that scientific software development could benefit from experience gained in other fields, such as agile methodology, unit and integration testing, and code modularity. In addition, open source repositories and free, powerful version control systems nowadays encourage code sharing and better software management.

Target Audience:

We would like to invite researchers from various fields (databases, distributed software, class libraries, lab automation, workflow management) to give their views and have an open discussion about the pros and cons of the different approaches to software development. This workshop will also give junior researchers an introduction to software engineering techniques and warn them of potential pitfalls in their career.

Workshop Agenda

14:00 to 14:05 : Alexandre Masselot & Markus Müller

Welcome

14:05 to 14:35 : Alexandre Masselot

Technical Debt in Scientific software

14:35 to 15:05 : Frédéric Schütz

The importance of reproducible research

15:05 to 15:35 : Jean-Luc Falcone

Scala: The most excellent language for bioinformatics

15:35 to 16:00 : Coffee break

16:00 to 16:30 : Jeremy Stucki

Building robust web applications/visualizations with React using the Flux architecture

16:30 to 17:00 : Stefan Eilemann

Scientific Visualization Software Engineering

17:00 to 17:30 : Chandrasekhar Ramakrishnan

Best Practices in Programming: A minimalist guide

 

Technical Debt in Scientific software

Alexandre Masselot

Senior scientist at Vital-IT/SIB

alexandre.masselot@isb-sib.ch

In any domain, writing software means balancing delivery speed against longer-term quality. Favoring speed increases a technical debt that one must later decide whether or not to pay back. Although scientific software creation has much in common with other domains, it also raises new challenges for managing such debt. We will discuss how to acknowledge the phenomenon and present a couple of methodologies for coping with it.

 

The importance of reproducible research

Frédéric Schütz

Maître d'Enseignement et de Recherche, Center for Integrative Genomics

Senior Statistician, Bioinformatics Core Facility (Delorenzi group)

frederic.schutz@isb-sib.ch

The scientific community has recently started to acknowledge that the way researchers usually analyse their data introduces problems and errors into published work. The goal of this workshop is to discuss these problems and, most importantly, the potential solutions that could improve the situation.

Scala: The Most Excellent Language for Bioinformatics

Jean-Luc Falcone

Scientific and Parallel Computing Group, CUI, University of Geneva
HPC Application Analyst, CADMOS

jean-luc.falcone@unige.ch

The Scala programming language has recently gained fame by powering distributed analysis of massive data sets at large companies such as Twitter, Netflix and Spotify. In this talk, I will try (hard) to convince you that Scala is suitable for scientific programming. In particular, its concise expressiveness, high performance and approaches to concurrency make Scala a good candidate for bioinformatics software.

 

Building robust web applications/visualizations with React using the Flux architecture

Jeremy Stucki

Visualization Lead - Interactive Things, Zürich

jeremy@interactivethings.com

Managing state in interactive web applications and data visualizations quickly becomes a complex issue once the app starts to grow. With various actors (users, incoming data) influencing state over time, how can we guarantee that what we’re displaying is correct and how can we build our app in an expressive way that is performant and scales well?

To answer these questions, I will introduce React, an increasingly popular open-source library for building user interfaces, and Flux, an architecture for managing data flow in applications.

 

Scientific Visualization Software Engineering

Stefan Eilemann

Section Manager Visualization, Human Brain Project

stefan.eilemann@epfl.ch

We will present the role and mission of the visualization team in the Blue Brain and Human Brain Projects, which work closely together and aim to accelerate our understanding of the human brain through simulation-driven research. We will present the projects we are currently engaged in, how we approach our software development, and how this enables us to satisfy user requirements and support future strategic developments. We will give an overview of the engineering best practices we embrace, their impact on our development, and how they improve the quality and quantity of our output.

 

Best Practices in Programming: A minimalist guide

Chandrasekhar Ramakrishnan

SSDM | Scientific IT Services

ETH Zürich Informatikdienste

chandrasekhar.ramakrishnan@id.ethz.ch

Every year, the SIB offers a two-day course titled Best Practices in Programming. The content of this course is the result of several iterations of courses aimed at bioinformaticians and researchers in the life sciences, designed to improve their effectiveness and efficiency in developing software. The current material distills a small number of simple techniques that will help anyone developing software, in the life sciences or otherwise, be more effective: use version control; write unit tests; refactor your code; follow some basic coding principles. I will present these techniques and describe how we teach them.
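
As a small illustration of the "write unit tests" technique, here is a minimal sketch in Python. It is not taken from the course material: the function gc_content and its test cases are hypothetical examples chosen for this summary, and the choice of Python's standard unittest module is an assumption.

```python
import unittest


def gc_content(sequence: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence.

    Hypothetical example function; an empty sequence yields 0.0.
    """
    if not sequence:
        return 0.0
    upper = sequence.upper()
    gc_count = sum(1 for base in upper if base in "GC")
    return gc_count / len(upper)


class GcContentTest(unittest.TestCase):
    """Each test pins down one expected behaviour of gc_content."""

    def test_empty_sequence(self):
        self.assertEqual(gc_content(""), 0.0)

    def test_mixed_case(self):
        # Lower-case bases count the same as upper-case ones.
        self.assertAlmostEqual(gc_content("atGC"), 0.5)

    def test_all_gc(self):
        self.assertEqual(gc_content("GGCC"), 1.0)
```

Saved in a file, the tests can be run with python -m unittest. Once such tests exist, the function can be refactored with some confidence that its behaviour has not changed, which is how the techniques above tend to reinforce one another.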