WS3: Scientific Software Engineering
Organizers:
Markus Müller (Swiss Institute of Bioinformatics, University of Geneva), Alexandre Masselot (Swiss Institute of Bioinformatics, University of Lausanne)
Workshop Summary:
Software solutions for life science applications come in many shades and colors. They include complex distributed systems, statistical software, databases, class libraries, and commercial and open source packages, as well as small scripts to convert and evaluate data in a lab. These solutions differ in their life cycle: some are intended to be used and maintained for a long time, whereas others are developed for a special purpose without the need for reuse. In this workshop we would like to discuss modern software engineering strategies and their usefulness for different types of life science software solutions. Software development in the life sciences takes place in a particular situation: most bioinformaticians and computational biologists do not come from a software engineering culture, although most of their time is actually spent writing code. Moreover, the research environment, being creative and original, creates the impression that software development methodology does not apply there. Prototypes often turn into “production” code, generating large long-term costs and limiting stability, bug resolution and evolution. However, we believe that scientific software development could benefit from practices established in other fields, such as agile methodology, unit and integration testing, and code modularity. In addition, open source repositories and free, powerful version control systems nowadays encourage code sharing and better management of software.
Target audience:
We would like to invite researchers from various fields (databases, distributed software, class libraries, lab automation, workflow management) to give their views and have an open discussion about the pros and cons of the different approaches to software development. This workshop will also give junior researchers an introduction to software engineering techniques and warn them of potential pitfalls in their career.
Workshop Agenda
14:00 to 14:05 : Alexandre Masselot & Markus Müller
Welcome
14:05 to 14:35 : Alexandre Masselot
Technical Debt in Scientific software
14:35 to 15:05 : Frédéric Schütz
The importance of reproducible research
15:05 to 15:35 : Jean-Luc Falcone
Scala: The most excellent language for bioinformatics
15:35 to 16:00 : Coffee break
16:00 to 16:30 : Jeremy Stucki
Building robust web applications/visualizations with React using the Flux architecture
16:30 to 17:00 : Stefan Eilemann
Scientific Visualization Software Engineering
17:00 to 17:30 : Chandrasekhar Ramakrishnan
Best Practices in Programming: A minimalist guide
Technical Debt in Scientific software
Alexandre Masselot
Senior scientist at Vital-IT/SIB
In any domain, writing software means balancing delivery speed against longer-term quality. Favoring speed builds up technical debt, which one must later decide whether or not to repay. Although scientific software development has much in common with other domains, it also raises new challenges for managing such debt. We will discuss how to acknowledge the phenomenon and present a couple of methodologies to cope with it.
The importance of reproducible research
Frédéric Schütz
Maître d'Enseignement et de Recherche, Center for Integrative Genomics
Senior Statistician, Bioinformatics Core Facility (Delorenzi group)
The scientific community has recently started to acknowledge that the way researchers usually analyse their data introduces problems and errors into published work. The goal of this workshop is to discuss these problems and, most importantly, the potential solutions that could improve the situation.
Scala: The Most Excellent Language for Bioinformatics
Jean-Luc Falcone
Scientific and Parallel Computing Group, CUI, University of Geneva
HPC Application Analyst, CADMOS
The Scala programming language has recently gained fame by powering distributed analysis of massive data sets at large companies such as Twitter, Netflix and Spotify. In this talk, I will try (hard) to convince you that Scala is suitable for scientific programming. In particular, its concise expressiveness, high performance and approach to concurrency make Scala a good candidate for bioinformatics software.
Building robust web applications/visualizations with React using the Flux architecture
Jeremy Stucki
Visualization Lead - Interactive Things, Zürich
Managing state in interactive web applications and data visualizations quickly becomes a complex issue once the app starts to grow. With various actors (users, incoming data) influencing state over time, how can we guarantee that what we’re displaying is correct and how can we build our app in an expressive way that is performant and scales well?
To answer these questions, I will introduce React, an increasingly popular open-source library for building user interfaces, and Flux, an architecture to manage data flow in applications.
Scientific Visualization Software Engineering
Stefan Eilemann
Section Manager Visualization, Human Brain Project
We will present the role and mission of the visualization team in the Blue Brain and Human Brain Projects, which work closely together and aim to accelerate our understanding of the human brain through simulation-driven research. We will present the projects we are currently engaged in, how we approach our software development, and how this enables us to satisfy user requirements and pursue future strategic developments. We will give an overview of the engineering best practices we embrace, their impact on our development, and how they improve the quality and quantity of our work.
Best Practices in Programming: A minimalist guide
Chandrasekhar Ramakrishnan
SSDM | Scientific IT Services
ETH Zürich Informatikdienste
chandrasekhar.ramakrishnan@id.ethz.ch
Every year, the SIB offers a two-day course titled Best Practices in Programming. The content of this course is the result of several iterations of courses targeted at bioinformaticians and researchers in the life sciences, designed to improve their effectiveness and efficiency in the development of software. The current material is a distillation of a small number of simple techniques that will help anyone developing software, in the life sciences or otherwise, to be more effective. They are: use version control; write unit tests; refactor your code; follow some basic coding principles. I will present these techniques and describe how we teach them.
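As a rough illustration of the "write unit tests" technique, here is a minimal Python sketch. It is not taken from the course material: the functions reverse_complement and gc_content and the test names are hypothetical examples chosen for a life science setting. The tests are plain functions using assert, so a test runner such as pytest could collect them, or they can simply be called directly.

```python
# Hypothetical helper functions for DNA sequences (illustration only,
# not part of the course material).

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence made of A, C, G, T."""
    return seq.translate(_COMPLEMENT)[::-1]


def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases; 0.0 for an empty sequence."""
    if not seq:
        return 0.0
    upper = seq.upper()
    return (upper.count("G") + upper.count("C")) / len(upper)


# Unit tests: each one checks a single, small, documented behaviour.

def test_reverse_complement_known_value():
    assert reverse_complement("ATGC") == "GCAT"


def test_reverse_complement_round_trip():
    # Applying the operation twice must give back the original sequence.
    seq = "GATTACA"
    assert reverse_complement(reverse_complement(seq)) == seq


def test_gc_content_known_value():
    assert gc_content("ATGC") == 0.5


def test_gc_content_empty_sequence():
    # Edge case: an empty sequence must not raise ZeroDivisionError.
    assert gc_content("") == 0.0
```

Small tests like these also support the "refactor your code" technique: once they pass, the body of a function can be rewritten and the tests rerun to check that its behaviour has not changed.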