[Basel Computational Biology Conference
2005] |
|
Abstracts |
|
Keynote Lecture: Mass Spectrometry based Proteomics:
Computational Challenges and Partial Solutions
Ruedi Aebersold
|
Institute for Molecular Systems
Biology, ETH Zurich, Switzerland and
Institute for Systems Biology, Seattle, USA |
The objective
of proteomics is the systematic analysis of the proteins expressed
by a cell, tissue or organism. It is expected that such analyses
will define comprehensive molecular signatures of tissues, cells
and body fluids in health and disease. Such signatures are impacting
a wide range of biological and clinical research questions, such
as the systematic study of biological processes and the discovery
of molecular clinical markers for detection, diagnosis and assessment
of treatment outcome. The application of proteomics technology has
proven particularly beneficial in cases in which differences between
the proteomes (or fractions thereof) isolated from cells at different
states have been analyzed, i.e. in which the analyses have been
performed with accurate quantification.
Currently
most successful quantitative proteomic analyses are based on mass
spectrometry and tandem mass spectrometry. In the context of such
studies 10^4 to 10^5 tandem mass spectra are generated, each
one potentially representing a unique peptide sequence. The computational
assignment of these spectra to their corresponding peptide sequences,
the statistical validation of these assignments, the extraction
of reliable biological information from these datasets and the dissemination
of the data represent a series of significant computational challenges
that are at present only partially solved.
In this
presentation we will discuss current platforms for the mass spectrometric
collection of proteomic data, describe a suite of open source tools
for their computational analysis and discuss remaining challenges.
Since
most biological networks involve proteins, proteomics, the global
analysis of the protein complement of a cell or tissue, is a central
element of systems biology. In this presentation we will discuss
the current status of quantitative proteomics technologies and some
of the resources that have emerged from the data they produce.
We will also show with selected examples how quantitative proteomics
can impact common types of experiments currently carried out in
many biological research projects and discuss the challenges that
remain to turn proteomics into a truly genomic science.
References:
- Aebersold R, Mann M. Mass spectrometry-based
proteomics. Nature. 2003;422(6928):198-207.
- Ranish JA, Yi EC, Leslie DM, Purvine SO, Goodlett
DR, Eng J, Aebersold R. The study of macromolecular complexes
by quantitative proteomics. Nat Genet. 2003 Mar;33(3):349-55.
- Ranish JA, Hahn S, Lu Y, Yi EC, Li XJ, Eng
J, Aebersold R. Identification of TFB5, a new component of general
transcription and DNA repair factor IIH. Nat Genet. 2004 Jul;36(7):707-13.
- Giglia-Mari G, Coin F, Ranish JA, Hoogstraten
D, Theil A, Wijgers N, Jaspers NG, Raams A, Argentini M, van der
Spek PJ, Botta E, Stefanini M, Egly JM, Aebersold R, Hoeijmakers
JH, Vermeulen W. A new, tenth subunit of TFIIH is responsible
for the DNA repair syndrome trichothiodystrophy group A. Nat Genet.
2004 Jul;36(7):714-9.
- Desiere F, Deutsch EW, Nesvizhskii AI, Mallick
P, King NL, Eng JK, Aderem A, Boyle R, Brunner E, Donohoe S, Fausto
N, Hafen E, Hood L, Katze MG, Kennedy KA, Kregenow F, Lee H, Lin
B, Martin D, Ranish JA, Rawlings DJ, Samelson LE, Shiio Y, Watts
JD, Wollscheid B, Wright ME, Yan W, Yang L, Yi EC, Zhang H, Aebersold
R. Integration with the human genome of peptide sequences obtained
by high-throughput mass spectrometry. Genome Biol. 2005;6(1):R9.
- Flory MR, Carson AR, Muller EG, Aebersold
R. An SMC-domain protein in fission yeast links telomeres to the
meiotic centrosome. Mol Cell. 2004 Nov 19;16(4):619-30.

|
Current and future
challenges in proteome informatics
Ron D. Appel (Swiss Institute of Bioinformatics,
University of Geneva, and Geneva Bioinformatics (GeneBio))
Proteomics
aims at deciphering the proteome, the complement of the genome,
with the goal of increasing the understanding of biological processes,
as well as improving and speeding up the development of drugs by
discovering disease biomarkers and drug targets. The major elements
of proteome analysis are powerful protein separation techniques
such as liquid chromatography (LC) and two-dimensional electrophoresis
(2-DE) coupled with enzymatic processing and mass spectrometry
(MS). These techniques have required considerable efforts in the
development of dedicated bioinformatics for over two decades, providing
researchers with comprehensive and state-of-the-art software tools
and databases for proteome analysis. The major areas of interest
encompass protein identification from MS data and the storage and
exchange of proteomics data. Current challenges include in particular
the in-depth characterization of proteins from MS data using new
and advanced algorithms, exploiting to its fullest the available
experimental data, in particular data redundancy to extract biological
knowledge, and the integration of information available across several
databases as a step towards integrated systems biology.
|
Computational
approaches in microbial strain engineering at DSM Nutritional Products
Sabine Arnold (DSM Nutritional Products,
Basel, Switzerland)
The
wealth of high-quality functional genomics data sets and the improved
accessibility of high-performance computing power have vastly propelled
the development of new computational methods that are capable of
integrating these large-scale heterogeneous data sets. Depending
on the application purpose, one may select from a variety of methods
with different analysis focus (e.g., clustering techniques, statistical
replicate analysis, correlation analysis, neural nets). Additionally,
genome-derived metabolic network models are now increasingly developed
in particular for microbial systems, mostly due to their reduced
biochemical complexity in comparison to higher-eukaryotic systems.
These models, usually stoichiometric, are applied to study the
effects of genetic modifications and changes in environmental parameters,
and the impact of these modifications on the metabolic flux distribution
within its cellular context. The ultimate vision of utilizing
such models in the biotech industry is to gain a systems-level understanding
of cellular physiology and thereby to assist in both rational strain
engineering and process development strategies.
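As a minimal sketch of what a stoichiometric model expresses, the toy network below checks the steady-state condition S·v = 0 for a flux vector v and compares a reference strain with a knockout. The five-reaction network, the reaction names and all flux values are hypothetical illustrations, not taken from the talk.

```python
# Toy stoichiometric model (hypothetical network, for illustration only).
# Rows = internal metabolites A, B, C; columns = reactions v1..v5.
#   v1: -> A      (substrate uptake)
#   v2: A -> B
#   v3: A -> C
#   v4: B ->      (product export)
#   v5: C ->      (by-product export)
S = [
    [1, -1, -1,  0,  0],   # A
    [0,  1,  0, -1,  0],   # B
    [0,  0,  1,  0, -1],   # C
]


def mass_balance(S, v):
    """Return S.v, the net production rate of each internal metabolite."""
    return [sum(s_ij * v_j for s_ij, v_j in zip(row, v)) for row in S]


def is_steady_state(S, v, tol=1e-9):
    """A flux vector is at steady state when no internal metabolite accumulates."""
    return all(abs(x) <= tol for x in mass_balance(S, v))


def steady_flux(uptake, split_to_B):
    """Flux vector for a given uptake rate and fraction of A routed to B."""
    v2 = uptake * split_to_B
    v3 = uptake - v2
    return [uptake, v2, v3, v2, v3]


wild_type = steady_flux(uptake=10.0, split_to_B=0.6)
knockout = steady_flux(uptake=10.0, split_to_B=1.0)  # v3 deleted: all A goes to B
```

In this toy case, deleting the branch to C forces the whole uptake flux through the product branch; genome-scale models pose the same balance as a constrained optimisation over many more reactions.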
|
|
Towards spatial and
temporal protein interaction networks
Peer Bork (EMBL, Heidelberg and MDC, Berlin)
As
cellular networks are getting more and more refined, it is becoming
feasible to move from 2D representations (nodes and edges) to 4D,
i.e. to explore temporal and spatial aspects of interaction networks.
I will introduce recent work from our group to reveal temporal
changes (ranging from 90 minutes during the yeast cell cycle to
more than 2 billion years during species evolution) and will also touch upon
a few spatial aspects (protein complexes and cellular compartments).

|
|
Simulating
physiological states, regulatory networks and metabolic pathways
of bacteria for applications in antibiotic drug discovery
Christoph Freiberg (Bayer
HealthCare AG)
As current
antibiotic therapy becomes increasingly ineffectual, new technologies
are required to identify and develop novel classes of antibacterial
agents. Our comparative genome analyses enabled prediction of novel
cellular functions and complete pathways in bacteria, in order to
characterise novel targets suitable for antibacterial compound screening.
However, holistic strategies alternative to the focused target-based
approach become more and more important in antibiotic drug discovery.
Based on a compendium of genome-wide expression profiles reflecting
the physiological response of the model bacterium Bacillus subtilis
to a hundred different antibiotic agents, we are able to simulate
regulatory networks and pathways and to predict their genetic control
elements. This way, we identified novel biomarkers for physiological
stress states, suitable for screening of compounds with specific
mechanisms of action. Moreover, our more elaborate expression profile
analysis based on classification algorithms as well as regulon and
pathway-specific data evaluation became a valuable tool to discover
the mechanism of action of novel antibiotic agents.
|
|
Reverse
engineering of metabolic pathways using sparse GGM
Wilhelm Gruissem
(Functional Genomics Center and ETH Zürich)
Wilhelm
Gruissem [1], Anja Wille, Philip Zimmermann, Eva Vranova, Andreas
Fürholz, Oliver Laule, Stefan Bleuler, Lars Hennig, Matthias
Hirsch-Hoffmann, Amela Prelic, Lothar Thiele, Eckart Zitzler and
Peter Bühlmann, Reverse Engineering
Group [2] and Functional Genomics Center Zurich [3], Swiss Federal
Institute of Technology (ETH), Zurich.
The
analysis of genetic regulatory networks was greatly advanced by
the availability of large data sets from high-throughput technologies
such as DNA microarrays. The genome-wide, parallel monitoring
of gene activity will increase our understanding of the molecular
basis of pathway functions and their cellular network context. In
simple eukaryotes or prokaryotes, gene expression data has been
combined with two-hybrid data and phenotypic data to successfully
predict protein-protein interaction and transcriptional regulation
on a large scale. In higher organisms, however, little is
known about regulatory control mechanisms and pathway networks on
a larger scale. As a first step we have focused on isoprenoid
metabolism, which is universally conserved and essential for cell
survival. Arabidopsis has two independent pathways that function
in the cytoplasm and chloroplast [4].
We developed a novel graphical Gaussian modelling (GGM) approach
to elucidate the regulatory network of the two isoprenoid biosynthesis
pathways based on large-scale expression data [5]. When applying
this approach to infer a gene network, we detect modules of closely
connected genes and candidate genes for cross-talk between the isoprenoid
pathways. Genes of downstream pathways also fit well into
the network. We evaluated our approach in a simulation study
and using the yeast galactose utilization network. Connected genes
were independently validated using Genevestigator [6], a novel powerful
software suite for visualization of microarray and other data in
their biological context.
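The idea behind a graphical Gaussian model, keeping an edge only when two genes stay correlated after conditioning on other genes, can be sketched as follows. This is a simplified illustration using first-order partial correlations and a fixed threshold; the function names, the threshold and the example profiles are assumptions, and the procedure in [5] may differ in its details and in how significance is assessed.

```python
from itertools import combinations
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equally long expression profiles."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)


def partial_corr(a, b, c):
    """First-order partial correlation of a and b given c.
    Assumes neither a nor b is perfectly correlated with c."""
    r_ab, r_ac, r_bc = pearson(a, b), pearson(a, c), pearson(b, c)
    return (r_ab - r_ac * r_bc) / sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))


def infer_edges(profiles, threshold):
    """Keep edge (i, j) only if the correlation survives conditioning
    on every single other gene k (hypothetical fixed-threshold rule)."""
    names = sorted(profiles)
    edges = []
    for i, j in combinations(names, 2):
        score = abs(pearson(profiles[i], profiles[j]))
        for k in names:
            if k not in (i, j):
                pc = partial_corr(profiles[i], profiles[j], profiles[k])
                score = min(score, abs(pc))
        if score > threshold:
            edges.append((i, j))
    return edges


# Hypothetical profiles: x and y both follow z, plus independent noise.
profiles = {
    "x": [1.5, 1.5, 3.5, 3.5, 4.5, 6.5, 6.5, 8.5],
    "y": [1.5, 1.5, 2.5, 4.5, 5.5, 5.5, 6.5, 8.5],
    "z": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
}
edges = infer_edges(profiles, threshold=0.5)
```

Here x and y are strongly correlated, but only because both follow z; conditioning on z removes the x-y edge while the x-z and y-z edges remain.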
References:
- [1] http://www.pb.ethz.ch
- [2] http://www.rep.ethz.ch
- [3] http://www.fgcz.ethz.ch
- [4] Laule O, Fürholz A, Chang HS, Zhu T, Wang X, Heifetz PB,
Gruissem W and Lange M. (2003) Crosstalk between cytosolic and
plastidial pathways of isoprenoid biosynthesis in Arabidopsis
thaliana, PNAS 100, 6866-6871.
- [5] Wille A, Zimmermann P, Vranová E, Fürholz A, Laule
O, Bleuler S, Hennig L, Prelic A, Rohr P, Thiele L, Zitzler E,
Gruissem W, Bühlmann P (2004). Sparse graphical Gaussian
modeling of the isoprenoid gene network in Arabidopsis thaliana.
Genome Biology 5 : R92.
- [6] http://www.genevestigator.ethz.ch

|
|
Predicting biomolecular systems and tracing their evolution
Martijn Huynen (Centre
for Molecular and Biomolecular Informatics, Radboud University Nijmegen)
The
accumulating wealth of genomes and other types of genomics data
gives us the opportunity both to predict the function of proteins
and their involvement in pathways as well as to trace the evolution
of such biomolecular systems. As genomics data are however inherently
noisy we need comparative analysis between multiple sets of data
to make reliable predictions. We have shown that, while the co-expression
of genes or the yeast-2-hybrid interaction of their proteins in
one species only provides a weak signal that their proteins functionally
interact, when that co-expression or interaction is measured in
multiple species, it does become a reliable signal (van Noort et
al., 2003; Huynen et al., 2004). One of the surprising observations
of such “horizontal comparative genomics” between species is the
low level of conservation: less than 5% of genes that are co-expressed
in S.cerevisiae are also co-expressed in C.elegans, and less than 25%
of the yeast-2-hybrid interacting proteins from S.cerevisiae
have been observed to interact in D.melanogaster.
The question arises whether such low conservation reflects evolution
and the changing relations between proteins or merely the noisy
level of the datasets. When comparing yeast-2-hybrid data between
species the level of conservation is only slightly lower than when
comparing independently generated datasets from a single species,
indicating that indeed the low reproducibility of genomics data
might be the main cause for the low level of measured conservation
between species. In order to filter out the noise from such analyses
we have constructed a set of reliably co-regulated genes in S.cerevisiae
by combining co-expression data with transcription factor
binding data from ChIP-on-chip experiments. For those gene-pairs
for which we have multiple sources of evidence that they are indeed
truly co-regulated in S.cerevisiae, the conservation of
co-regulation in C.elegans is 78%. Co-regulation therefore
does appear well conserved in evolution (Snel et al., 2004). Such
analyses however only apply to cases where both co-regulated genes
are present in the species compared. Analyses of the phylogenetic
distribution of proteins from a single biomolecular system indicate
however surprisingly little “evolutionary modularity” of functional
modules (Snel and Huynen, 2004). By mapping such variation in the makeup
of biomolecular systems on a phylogenetic tree one can actually
reconstruct the evolution of biomolecular systems; a specific example
of a large protein complex in eukaryotes will be discussed.
References:
- van Noort V, Snel B, Huynen MA (2003) Predicting gene function
by conserved co-expression. Trends Genet. 19: 238-242.
- Huynen MA, Snel B, van Noort V (2004) Comparative genomics for reliable
protein-function prediction from genomic data. Trends Genet.
20: 340-344.
- Snel B, van Noort V, Huynen MA (2004) Gene co-regulation is highly
conserved in the evolution of eukaryotes and prokaryotes. Nucleic
Acids Res. 32: 4725-4731.
- Snel B, Huynen MA (2004) Quantifying modularity in the evolution
of biomolecular systems. Genome Res. 14: 391-397.

|
SystemsX - its relevance for science and innovation policy
Olaf Kübler (President of ETH Zurich)
Life sciences play
a crucial role in our society. Breakthroughs in fundamental research
are exploited at an increasingly higher rate and practically applied
in diagnostics, medical therapy and the agro-food industry. It is imperative
that Switzerland avoids de-industrialization and re-organizes itself
in order to take on the challenges of the future and to make significant
efforts with the most important technologies and their applications
to life sciences, health care and nutrition.
Systems biology is a new discipline with a high potential for scientific
discoveries which provide new insights and understanding of biosystems.
Unlike molecular biology, systems biology does not exclusively examine
basic components, but rather the complex processes of a complete
biological system. For this holistic understanding systems biology
requires the support of various disciplines. It needs, for example,
information technology in order to record, manage and mine the enormous
amounts of data. Physics, engineering sciences, mathematics, chemistry,
and bioinformatics are further important disciplines.
SystemsX's strategic vision of systems biology research is to contribute
substantially to the wealth of Switzerland's science, industry and
society. Present and future orientations of research will help to
create new ventures and industries for (bio)-technology, and future
health care and nutrition. Future directions will also open up new
fields of research and applications that are not yet foreseeable
from today's limited perspective.
SystemsX has been formed as a joint initiative of ETH Zurich and
the Universities of Basel and Zurich to establish an internationally
leading program in the emerging science of systems biology and to
provide the organizational and financial background to practice
systems biology at the participating institutions.
Based on a close collaboration between the individual disciplines
and with industry, it is envisaged to break up and dissolve the
boundaries between the disciplines, leading to true interdisciplinary
work, and between basic and applied sciences, leading to transdisciplinary
research. New approaches will be fundamental to developing a common language
and a common research culture.
Another goal of SystemsX is to engage industry in Switzerland in
a long-term research effort and in financial support, based on the
achievements and significance of the initiative.

|
Proteomics strategies for pharmaceutical and diagnostic
research and for biomarker discovery
Hanno Langen (F. Hoffmann-La Roche AG)
Proteomics
is a key technology for the discovery of biomarkers that are required
for pharmaceutical research and diagnostics. These markers can be
found by massively parallel investigation of biological samples, preferably
directly using diseased tissue. In order to obtain sufficient sensitivity,
multidimensional protein fractionation schemes have to be employed,
whereas statistical significance is achieved by the comparison of
large numbers of samples.
This
strategy imposes limitations on the employed technologies. Thus,
gel image comparison, as well as manual curation of mass spectrometric
identification results are not feasible for large scale biomarker
studies. We will show that meaningful data interpretation is possible
only with high accuracy in protein identification so that false
positive identifications will not obscure the true differences.
In our group we have developed alternative solutions to the problem
of protein quantification which employ data redundancies built into
the experimental design of the biomarker study.
Examples
of successful biomarker discovery including the methodology for
pre-validation and validation will be shown.

|
Coarse grained modeling of cellular and transduction networks
Felix Naef (ISREC & Swiss Institute of
Bioinformatics)
In
my presentation I will discuss two applications of physical modeling
to biological systems. In the first, we model populations of cellular
oscillators to interpret recordings of a luciferase reporter in
a circadian cell culture assay. Correlation with single cell data
illustrates the complementarity of both techniques. Our analysis uncovered
reciprocal interactions between the circadian and cell cycle oscillators,
manifest for example as a gating of mitosis time by the clock.
In the second part, I will discuss the study of information flow
in small size transduction networks, based on discrete dynamical
network models. Main concepts will be explained and selected examples
like the yeast cell-cycle and UVB response will be addressed in
some detail.
References:
- Nagoshi E, Saini C, Bauer C, Laroche T, Naef F,
Schibler U., Circadian gene expression in individual fibroblasts:
cell-autonomous and self-sustained oscillators pass time to daughter
cells, Cell. 2004 Nov 24;119(5):693-705.

|
Modelling the IGF signalling pathway.
Mark Penney (Novartis Pharma)
Brian Stoll,
Anna Georgieva, Gabriel Helmlinger, Birgit Schoeberl, Tad Stewart,
Ulrik Nielsen and Mark Penney
The IGF
network has been implicated in a number of cancers and therefore
IGF1 and IGF1R have become promising targets for therapeutic intervention.
In this work, a systems biology model of the IGF signalling pathway
was produced in collaboration with Merrimack Pharmaceuticals Inc
with the objective of identifying and quantifying biochemical biomarkers
for the pharmacodynamic efficacy of an IGF1R inhibitor with clinical
potential, and to further evaluate biomarkers for patient response.
The
IGF pathway topology was described using existing data in the literature
and expressed as a mathematical model in Matlab. It was quantified
by training it with measured in-house data in an iterative process.
This process demonstrated the importance of having high-quality data
sets, in this case ones which described the peak activation of ERK
and AKT well. This resulted in a model which predicted the downstream
activation of ERK and AKT with a good degree of accuracy. The model
was then validated by comparing the predicted output in response to
an IGF1R inhibitor to an independently
measured experimental set.
Model
simulation and sensitivity analysis were used to determine those
biomarkers most sensitive to IGF1R blockade. These showed that the
IGF network lacks downstream signal amplification; consequently
the most sensitive biomarker is the phosphorylation of the IGF1R
itself, in contrast to similar pathways such as the EGF signalling
pathway. Model-based analysis also shows that normal expression
levels of IGF1R, levels of free IGF1 and IGF2, and IRS-1 expression
are the most important biomarkers for patient response. Furthermore,
it is shown that IGFBP-5 may also be important due to its ability
to amplify IGF signalling, illustrating that the regulation of free
IGF levels plays a key role in IGF signalling prior to interaction
with IGF receptors.

|
A Plausible Model for the Digital Response of p53 to DNA
Damage: A Tale of Limiting Resources, Negative Feedback and Time
Delays.
John Jeremy Rice (IBM Computational Biology
Center, Yorktown Heights, NY)
J.
Jeremy Rice, Lan Ma, John Wagner, and Gustavo Stolovitzky. IBM Computational
Biology Center, Yorktown Heights, NY.
The
tumor suppressor p53 protein is critical to ensure genomic stability
when cells are under ionizing radiation (IR) stress. Recently it
was observed that the single-cell response of p53 to IR is "digital",
in that it is the number of oscillations (rather than the amplitude)
of p53 that depends on the radiation dose. We present
a mathematical model of this phenomenon. In our model, double strand
break (DSB) sites induced by IR interact with a limiting pool of
DNA repair proteins, forming DSB-protein complexes at DNA damage
foci. Both the initial number of DSBs and the DNA repair process
are modeled taking into account the stochastic nature of the repair
process. The model assumes that the persisting complexes are sensed
by ataxia telangiectasia mutated (ATM), a kinase with a positive
feedback mechanism of autophosphorylation that sensitively transduces
the DNA damage information to downstream processes. The ATM sensing
module produces a step-like, ON-to-OFF signal as the input to a
downstream oscillator consisting of a p53-Mdm2 (Mdm2 is the negative
regulator of p53) autoregulatory feedback loop. Our simulation results
show that p53 and Mdm2 exhibit coordinated oscillatory dynamics
upon IR stimulation, with a stochastic number of oscillations whose
mean increases with IR dosage, in good agreement with the observed
response of p53 to DNA-damage in single-cell experiments. We conjecture
that the robustness of the oscillatory behavior of p53 is in part
due to the ATM-induced autodegradation of Mdm2, a mechanism
recently reported but not yet included in other models.
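The "limiting resources" part of the argument can be illustrated with a deliberately simplified, deterministic cartoon: breaks are repaired at a rate capped by a fixed pool of repair proteins, the damage signal stays ON until few breaks remain, and the number of p53 pulses is taken as the ON duration divided by an oscillation period. This is not the authors' stochastic model, and every parameter value below (breaks per Gy, pool size, repair rate, threshold, period) is an assumed placeholder.

```python
def atm_on_duration(dose_gy, dsb_per_gy=35.0, repair_pool=20.0,
                    rate_per_h=1.0, threshold=1.0, dt=0.01):
    """Hours until the number of unrepaired breaks falls to `threshold`.

    Repair speed is capped by the smaller of the break count and the
    repair-protein pool, so large doses are cleared at a constant rate.
    All default parameter values are hypothetical.
    """
    breaks = dose_gy * dsb_per_gy
    t = 0.0
    while breaks > threshold:
        breaks -= rate_per_h * min(breaks, repair_pool) * dt  # Euler step
        t += dt
    return t


def expected_pulses(dose_gy, period_h=6.0):
    """Whole oscillation periods that fit into the ON window (assumed period)."""
    return int(atm_on_duration(dose_gy) // period_h)


low_dose_pulses = expected_pulses(2.5)
high_dose_pulses = expected_pulses(10.0)
```

Because repair saturates, the ON window grows roughly linearly with dose, so a higher dose yields more pulses rather than larger ones; the model described above adds stochastic break numbers and repair times, which make the pulse count itself random.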

|
Combining Models
and Data for Systems Analysis of Cellular Networks
Jörg Stelling (ETH Zürich)
Systems
biology aims at understanding complex biological networks through
a combination of (comprehensive) experimental analysis and (quantitative)
mathematical modeling. At present, however, it is largely unclear
which knowledge and data will be required for establishing realistic
mathematical models. Related to this, it is equally important to
ask to what extent the already available data allow for meaningful
model development.
In this
talk, I will argue that one can extract an unexpectedly high degree
of information by appropriate combinations of modeling approaches,
biological knowledge and only a few experimental data. The examples
presented will include (i) structural analysis of metabolic networks
to infer key aspects of functionality and regulation, (ii) comparative
computational modeling of the TOR (‘target of rapamycin’) pathway
to reveal signaling mechanisms, and (iii) detailed modeling of a
complex network in yeast cell cycle regulation. These studies point
to the robustness of cellular networks as an ‘enabling’ feature
for model development, and they suggest strategies for efficiently
linking future experimental and theoretical approaches to cellular
networks.

|
Computational
Challenges in Integrating Transcriptomics, Proteomics and Metabolomics
Jim Samuelsson (GeneData)
In this
presentation we give an overview of some computational challenges
in the integrative analysis of transcriptomics, proteomics and metabolomics
data in order to obtain a better understanding of cellular processes,
a prerequisite for systems biology.
We start
by discussing some of the strengths and weaknesses pertaining to
the different 'omics' levels, calling for the need to be able to
work at more than one level. We continue with a few examples of
how we at Genedata tackle some of those challenges. For example,
we show how metabolic pathways can be studied in an integrated fashion together
with expression analysis data of different types.
We then
move on to some issues that we find especially important to address
in order to be able to make maximum use of the data, particularly
the necessity for quality assessment of the raw data as well as
the subsequent statistical analysis and data mining to extract the
biological information. Finally, we illustrate the methods by applying
them to the problem of biomarker identification.
|
Qualitative modelling, analysis and
simulation of genetic regulatory networks
Denis Thieffry (Université de la
Méditerranée-CNRS-INSERM, Marseille)
A proper
understanding of the mechanisms controlling gene expression requires
the integration of molecular and genetic data into fully fledged mathematical
models. An overview of the main dynamical modelling approaches will
be provided, before focusing on a multi-level, logical approach,
which enables a flexible qualitative modelling of complex regulatory
networks. This approach encompasses the development of a dedicated
software suite (GIN-sim), and will be illustrated by applications
to pattern formation and cell differentiation in the fly Drosophila
melanogaster.
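The simplest case of logical modelling is a Boolean network, in which each gene is ON or OFF and a rule computes its next value from the current state. The sketch below uses a hypothetical two-gene mutual-repression circuit with synchronous updates and enumerates its stable states; it does not use the software suite mentioned above, and the multi-level approach generalises this by allowing more than two levels per gene.

```python
from itertools import product

# Hypothetical two-gene circuit: A and B repress each other.
rules = {
    "A": lambda s: int(not s["B"]),
    "B": lambda s: int(not s["A"]),
}


def step(state, rules):
    """Synchronous update: every gene reads the current state at once."""
    return {gene: rule(state) for gene, rule in rules.items()}


def fixed_points(rules):
    """Enumerate all states that the update leaves unchanged."""
    genes = sorted(rules)
    found = []
    for values in product((0, 1), repeat=len(genes)):
        state = dict(zip(genes, values))
        if step(state, rules) == state:
            found.append(values)
    return found


stable_states = fixed_points(rules)
```

For this circuit the two stable states are "A on, B off" and "A off, B on", the qualitative signature of a switch between two cell fates.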

|
Integration of biological
knowledge to deliver new biotherapeutics: From
data integration to system modeling
Ioannis Xenarios (Serono Pharmaceutical
Research Institute)
The
face of biological research in the biotech industry has evolved
at an alarming rate. From a one-gene/one protein analysis it has
borne witness to a multitude of new technologies that allow us to
capture and integrate a vast amount of information generated by
high throughput methods such as DNA microarrays, proteomics and
bioinformatics. Then along came the sequenced human genome, and
suddenly we have a complete skeleton upon which to integrate the
mass of information generated. The scientific community now
has an integrated way of looking at what have previously been isolated
snippets of knowledge. We have known for some time the function(s)
of many proteins in signaling pathways, developmental regulation,
cell cycle progression, and so on. However, what is becoming clearer
as we gather more information and gaze upon the global picture,
is that a single protein rarely performs a single function.
Rather, the activity that we assign to it is the product of its
interaction with other proteins, small molecules or nucleic acids
at any given time. Despite the advances in high throughput technologies
(or, perhaps, because of them), we are faced with an avalanche of
data but only flakes of knowledge. What is needed is a systems approach
that would enable us to integrate all the information generated
from these technology platforms and develop both mathematical and
biological methodologies to test them.
|
microRNAs Spread
to Viruses
Mihaela Zavolan (Biozentrum Basel & Swiss
Institute of Bioinformatics)
MicroRNAs
are a large class of endogenous RNA molecules approximately 22 nucleotides
in length that regulate translation of protein-coding genes in plants
and animals. Hundreds of miRNA genes have been identified in various
eukaryotic organisms and their number is still growing. Their existence
and function in “simpler” life forms such as viruses
have not been extensively investigated, although viruses are known
to use many elaborate RNA processing functions of the host. Using
a combination of computational miRNA gene prediction and small RNA
cloning, we discovered miRNAs encoded by herpes viruses. The predicted
viral targets and the expression profile of viral miRNAs suggest
a role of these miRNAs in the viral life cycle, and thus interaction
with the host.
|
|
|