Basel Computational Biology Conference 2003

Abstracts
A Z-score
based approach to high-throughput whole genome homology assignments
Wolfram Altenhofen (Chemical Computing
Group)
The use of global Z-scores as a basic statistical tool for remote homology identification can be both more sensitive and, in particular, more specific than standard heuristic methods such as PSI-Blast or Fasta, but it suffers from an inherently high computational cost. We present a Z-score-based homology searching methodology that exploits an automatically clustered database of protein structures and sequences, together with a generalized Fasta-like filter, and that can be used practically and effectively in whole-genome annotation projects. Annotation results for a number of complete genomes are presented, including specific comparisons to PSI-Blast and other tools.
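To make the statistic concrete, here is a minimal sketch of a shuffle-based Z-score for a single query-subject pair. It is an illustration only, not the clustered-database implementation described above, and the toy LCS scorer stands in for a real alignment score.

```python
# Hedged sketch: Z-score of an alignment score against shuffled subjects.
# `alignment_score` is a toy stand-in (LCS length) for a real scorer.
import random
import statistics

def alignment_score(a, b):
    """Toy scorer: longest common subsequence length (placeholder)."""
    m = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            m[i][j] = m[i - 1][j - 1] + 1 if x == y else max(m[i - 1][j], m[i][j - 1])
    return float(m[-1][-1])

def z_score(query, subject, n_shuffles=100):
    """Z = (observed - mean) / sd over scores against shuffled subjects."""
    observed = alignment_score(query, subject)
    null = []
    for _ in range(n_shuffles):
        s = list(subject)
        random.shuffle(s)              # destroys homology, keeps composition
        null.append(alignment_score(query, "".join(s)))
    mu, sd = statistics.mean(null), statistics.pstdev(null)
    return (observed - mu) / sd if sd > 0 else 0.0

print(z_score("MKVLAAGTW", "MKVLSAGSW"))
```

The repeated rescoring against shuffled sequences is exactly the cost that motivates the Fasta-like pre-filter mentioned above.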
Computational Bioscience @ IBM Research
Wanda Andreoni (IBM Research, Zurich Research
Laboratory)
This talk will touch on some of the numerous computational activities in biology carried out across the IBM Research laboratories, from bioinformatics to simulations of protein folding to computational chemistry for drug design.
About Moore’s
law in Proteomics
Peter Berndt (Roche)
Proteomics is the global analysis of the changes in protein expression and chemical structure in biologically relevant situations. The analysis of eukaryotic proteomes, consisting of several thousand expressed proteins, requires substantial investment and development in protein separation as well as in automation and information technologies.
In this talk, we will introduce the implementation of proteomics technologies at the Roche RCMG facility. Currently we can carry out complete, fully automated analysis of about 15,000-20,000 peptide mass fingerprint protein identifications per week. This technological achievement allows new types of experimental design in proteomics that can reduce the reliance on 2D PAGE technology for basic proteomics tasks such as relative protein quantification. We will introduce highly sensitive approaches for analyzing mass spectrometric data in proteomics that can elucidate information going beyond simple protein identification. While proteomics data can be generated at tremendous speed, bioinformatics tools that would allow these changes to be quickly annotated and evaluated in the context of biochemical knowledge are still largely missing. We will describe the attempts our group has made to fill this gap.
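To illustrate what a peptide mass fingerprint identification computes at its core, here is a hedged, self-contained sketch: digest a candidate sequence with trypsin in silico and count the observed peaks it explains. The monoisotopic residue masses are standard; the tolerance and sequences are invented, and real pipelines add proper scoring statistics.

```python
# Hedged sketch of peptide mass fingerprinting: in-silico trypsin digest,
# then match predicted peptide masses to observed spectrum peaks.
RESIDUE_MASS = {  # monoisotopic residue masses (Da)
    'G': 57.02146, 'A': 71.03711, 'S': 87.03203, 'P': 97.05276,
    'V': 99.06841, 'T': 101.04768, 'C': 103.00919, 'L': 113.08406,
    'I': 113.08406, 'N': 114.04293, 'D': 115.02694, 'Q': 128.05858,
    'K': 128.09496, 'E': 129.04259, 'M': 131.04049, 'H': 137.05891,
    'F': 147.06841, 'R': 156.10111, 'Y': 163.06333, 'W': 186.07931,
}
WATER = 18.01056  # mass of H2O added per peptide

def tryptic_peptides(seq):
    """Cleave C-terminal to K/R, except before P (standard trypsin rule)."""
    pep, out = "", []
    for i, aa in enumerate(seq):
        pep += aa
        nxt = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and nxt != "P":
            out.append(pep)
            pep = ""
    if pep:
        out.append(pep)
    return out

def pmf_matches(seq, observed_masses, tol=0.2):
    """Count observed peaks within `tol` Da of a predicted peptide mass."""
    predicted = [sum(RESIDUE_MASS[a] for a in p) + WATER
                 for p in tryptic_peptides(seq)]
    return sum(any(abs(m - p) <= tol for p in predicted) for m in observed_masses)

print(pmf_matches("MKTAYIAKQR", [278.16, 605.33, 999.99]))
```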
In the final part of the talk, we will consider the future of proteomics. It is now clear that the dynamic range of eukaryotic protein expression requires much higher throughput rates and sensitivity than expected during the initial proteomics euphoria of the late 1990s. We will show which technological challenges must be met if proteomics is ever to fulfill its scientific potential.
IBM Life Sciences strategy and solutions
Christopher Cooper (IBM)
Dramatic advances in the Life Science industry are changing the way we live. These advances fuel rapid discoveries in genomics, proteomics and molecular biology that serve as the basis of medical breakthroughs, the advent of personalised medicine and the development of new drugs and treatments. Soon the typical Life Science company will need to analyse "petabytes" of data to further its research efforts. In addition to the enormity of the data, there are challenges in querying non-standard formats, accessing data assets across global networks and securing data outside of firewalls. The competitive advantage belongs to companies that can best use information technology (IT) solutions to capitalise on the opportunities presented by this transformation.
In response to these challenges, Life Science companies are redefining
their research methodologies and retooling their IT infrastructures
to position themselves for success in this new environment. The
traditional trial and error approach is rapidly giving way to more
predictive science based on sophisticated laboratory automation
and computer simulation.
Key issues include:
- sharing and pooling information across global resources while maintaining security
- retrieving and integrating diverse data across a variety of scientific domains
- enabling continuous real-time access to data without building and managing data warehouses
- developing new ways of collaborating among research teams, using shared research to focus efforts.
Add to these challenges the need to work within existing laboratory and business computing environments, and the challenges facing today's Life Science industry are almost overwhelming.
Promoter genomics
Martin Ebeling (Roche)
The prediction of genetic elements for transcription regulation has long been known to be a very difficult task. Sophisticated computational methods have been applied to it over the last two decades.
With the availability of higher-eukaryote genomes, the field faces substantial changes. Large-scale data sets have become available, and genome comparisons can help to improve prediction quality dramatically.
A brief introduction will be given to ongoing work in the Roche Basel bioinformatics group.
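As one concrete example of such computational methods (an illustration, not necessarily the approach used at Roche), promoter elements are classically predicted by scanning a sequence with a position weight matrix; the motif below is invented.

```python
# Illustrative sketch: scan a sequence with a position weight matrix (PWM)
# and report log-odds scores against a uniform background.
import math

BACKGROUND = 0.25  # uniform base frequencies, for simplicity

def pwm_scores(seq, pwm):
    """Log-likelihood-ratio score of the motif at every position of `seq`."""
    w = len(pwm)
    return [(i, sum(math.log2(pwm[j][seq[i + j]] / BACKGROUND) for j in range(w)))
            for i in range(len(seq) - w + 1)]

# A toy 4-column motif (per-position base probabilities, each summing to 1).
toy_pwm = [
    {'A': 0.7, 'C': 0.1, 'G': 0.1, 'T': 0.1},
    {'A': 0.1, 'C': 0.1, 'G': 0.7, 'T': 0.1},
    {'A': 0.1, 'C': 0.7, 'G': 0.1, 'T': 0.1},
    {'A': 0.1, 'C': 0.1, 'G': 0.1, 'T': 0.7},
]
best = max(pwm_scores("TTAGCTAGCTAAGCTT", toy_pwm), key=lambda h: h[1])
print(best)  # position and log-odds score of the best motif match
```

Genome comparisons improve on this by keeping only matches conserved across species, which is one way prediction quality can rise dramatically.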
Scope
and use of bioinformatics at the Swiss Institute of Bioinformatics
Ernest Feytmans (Director of the Swiss Institute
of Bioinformatics)
21st-century biology is no longer based solely, as in the past, on laboratory research; it has become a science of information, analysis and prediction, merging into bioinformatics - a single discipline at the crossroads of the life sciences, informatics, mathematics and information technology. Its aim is to help discover new biological concepts and to offer a global perspective from which novel and outstanding biological principles can be detected. This can be achieved by:
- organising the data supplied by different sequencing projects into well-annotated databases, and filing the information relating to the sequences
- making sequencing data (genomes and proteomes)
and powerful analysis tools available to the biomedical community
- training scientists and students in the
biomedical sciences so that they can use these data appropriately
- training bioinformatic specialists whose
abilities are more and more sought-after.
The Swiss Institute of Bioinformatics (SIB)
or Institut Suisse de Bioinformatique (ISB) brings Swiss experts
in bioinformatics together and provides high quality services to
the national and international scientific community. Members of
the SIB include research groups in Geneva, Lausanne and Basel. The
SIB expertise is widely appreciated and its services are used worldwide
by researchers in cellular and molecular biology. The Institute
has three missions: research & development, education and service.
- It maintains databases of international standing
(Swiss-Prot, Prosite, EPD, Swiss-2Dpage, Human Chromosome 21,
TrEST, TrGen, AGBD, Hits, Swiss Model Repository, GermOnline).
- It supplies and develops services for the
biomedical research community worldwide by way of software and
services that can be accessed from the SIB web servers (ExPASy,
Melanie, T-COFFEE, PFTOOLS, ESTScan, Dotlet, SEView, Snp_detect,
Mmsearch, Swiss-Model, DeepView/Swiss-PdbViewer, MIMAS).
- It supplies services to the Swiss biomedical research community within the framework of the international EMBnet network and of the NCCR, the Swiss National Center of Competence in Research in structural biology.
- It undertakes specific research and development
activities related to the databases and software developed within
the Institute.
- Together with the Universities of Lausanne, Geneva and Basel, the Swiss Federal Institute of Technology (EPFL) and a private partner (HP), the SIB is contributing to the creation of a molecular bioinformatics service, backed by a high-performance informatics platform (the Vital-IT project).
Facing the complexity of bioinformatics challenges: a Hewlett-Packard viewpoint
Dominique Gillot (Hewlett-Packard)
With the new directions taken by genomics and proteomics research, and also due to the increased diversity and size of the data sets being used, information technology requirements are changing extremely rapidly. Hewlett-Packard has been working very closely with the bio research community for years and is increasing its focus on bio-applications, from a partner-support standpoint and by optimising existing applications, but also by using internal HP research capabilities to face this challenge. This joint work takes place at the systems design level, but also through collaboration with industry researchers to review the systems architecture and the tools that should deliver the best power to the applications, in order to improve the productivity of research as well as data gathering and retrieval.
Working with key partners like Oracle or Platform is an integral part of HP's strategy to provide the best research environment for bio-related applications.
Chemogenomics: Knowledge-Based Strategies in Drug Discovery
Edgar Jacobi (Novartis)
In the post-genomic age of drug discovery, targets can no longer be viewed as singular objects having no relationship to one another. All targets are now visible, and the systematic exploration of selected target families appears a promising way to speed up and further industrialize target-based drug discovery. Chemogenomics refers to such systematic exploration of target families and aims at the identification of all possible ligands of all target families. Because biology works by applying prior knowledge ("what is known") to an unknown entity, chemogenomics approaches are expected to be especially effective within previously well-explored target families, for which, in addition to protein sequence and structure information, considerable knowledge of pharmacologically active structural classes and structure-activity relationships exists. For new target families, chemical knowledge will first have to be generated; beyond biological target validation, the challenge reverts to chemistry to provide the molecules with which their novel biology and pharmacology can be studied. Discussing examples from the most successfully explored target families, especially the GPCR family, we summarize our currently investigated knowledge-based chemogenomics strategies for drug discovery, which are founded on a tight integration of chem- and bioinformatics and thereby provide a molecular informatics framework for the exploration of new target families.
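To make the chemoinformatics side of such knowledge transfer concrete, here is a minimal sketch: new compounds can be prioritized by fingerprint similarity to ligands already known within a target family. Tanimoto similarity is a standard measure, but the fingerprints below are toy bit sets, and this is not Novartis's actual method.

```python
# Hedged sketch: rank a candidate by Tanimoto similarity to a known ligand.
def tanimoto(a, b):
    """Tanimoto coefficient of two binary fingerprints given as on-bit sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

known_gpcr_ligand = {1, 4, 9, 23, 42, 77}   # invented fingerprint bits
candidate = {1, 4, 9, 23, 50, 91}
print(tanimoto(known_gpcr_ligand, candidate))  # 0.5 -> plausible family ligand
```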
The Future of GRIDS and E-Science
Dr Chris Jones (CERN)
The appearance of the concept of the GRID, a “blueprint for a new computing infrastructure”, has considerably changed the direction of computing. The large and growing wave of support and funding for GRID activities offers tremendous opportunities for substantial paradigm changes and enhanced methods of working, leading to “better science” or, for example, improved processes of drug discovery. The evolution of the Web from a concept in Tim Berners-Lee’s head to its full commodity deployment took more than a decade. Whilst there are arguments to support the view that wide-scale deployment of the GRID could go substantially faster than that of the Web, it is nonetheless important to try to analyse and understand the current state of this process.
The speaker began promoting GRID computing in 1999,
when indeed the DataGrids and Computational GRIDS proposed in that
era matched exactly the needs of CERN and its partners for their
new accelerator and experiments. More recently the thinking of how
to profit from GRIDS has broadened to encompass a wider vision.
One now sees the GRID as the technology that enables transparent
provision of a broad range of services and resources, which may
include extensive computing power and vast quantities of data, but
also many other forms of services, information or knowledge. The vision foresees, for example, providing the decision maker (scientist, doctor, surgeon) with the very best information available in a transparent fashion, thereby enhancing their work. As a result, many opportunities for benefit are opening up in a range of activities much wider than originally envisaged, including health, pharma and many other arenas.
B-numerics
Andreas Kuttler (GIMS)
With the extensive use of gene and protein
databases and high-throughput screening, integrative approaches
for bio-medical research and the pharmaceutical industry have become
indispensable. This integration has to span at least the following
two axes. The first axis is the spatial dimension from sub-cellular
processes to organs and even larger systems. Transport, exchange
and reactions occur on all these levels and their quantitative formulation
is crucial for hypothesis testing and the exposure of models to
falsification.
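As a flavour of what "quantitative formulation" means here, consider a deliberately tiny sketch of first-order transport between two compartments with uptake. The parameters are invented and the Euler integration is the simplest possible scheme; this is not one of the GIMS models.

```python
# Hedged sketch: two-compartment transport with first-order metabolism,
# integrated by an explicit Euler step. All parameter values are invented.
def simulate(c_lumen=1.0, c_cell=0.0, k_in=0.5, k_met=0.1, dt=0.01, t_end=10.0):
    """Return (time, lumen, cell) concentration trajectories."""
    t, ts, lum, cel = 0.0, [], [], []
    while t <= t_end:
        flux = k_in * (c_lumen - c_cell)         # passive exchange
        c_lumen += -flux * dt
        c_cell += (flux - k_met * c_cell) * dt   # uptake minus metabolism
        ts.append(t)
        lum.append(c_lumen)
        cel.append(c_cell)
        t += dt
    return ts, lum, cel

ts, lum, cel = simulate()
print(lum[-1], cel[-1])  # concentrations at t_end
```

Models of this form are exactly what "exposure to falsification" requires: every rate constant is a measurable, refutable claim.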
The end point of the second axis is the roll-out of a new drug; its starting point is the fundamental research that led to its discovery. Good models must serve as a backbone for the entire process, including drug delivery and galenics. These models are rarely designed from scratch but grow with the product and therefore incorporate all drug-related knowledge.
We present three examples - intestinal fluid mechanics, fatty acid transport through the intestinal wall cell, and a physico-chemical approach to lipid membranes - to demonstrate the power of bio-numerical modelling (B-numerics). From this we move on to our vision of a computer-based workbench for drug development.
From genes to whole organs: vertical integration using mathematical simulation of the heart
Denis Noble (University Laboratory of Physiology, Oxford)
Biological modelling of cells, organs and systems
has reached a very significant stage of development. Particularly
at the cellular level, there has been a long period of iteration
between simulation and experiment (Noble, 2002d). We have therefore
achieved the levels of detail and accuracy that are required for
the effective use of models in drug development. To be useful in
this way, biological models must reach down to the level of proteins
(receptors, transporters, enzymes, etc.), yet they must also reconstruct functionality right up to the levels of organs and systems. This is now possible, and three important developments have made it so:
1. Relevant molecular and biophysical data on many proteins and the genes that code for them are now available. This is particularly true for ion transporters (Ashcroft, 2000).
2. The complexity of the biological processes that can now be modelled is such that valuable counter-intuitive predictions are emerging (Noble & Colatsky, 2000). Multiple target identification is also possible.
3. Computer power has increased to meet the demands. Even very complex cell models involving up to 100 different protein functions can be run on single-processor machines, while parallel computers are now powerful enough to enable whole-organ modelling to be achieved (Kohl et al., 2000).
I will illustrate these points with reference to models
of the heart (Noble 2002a).
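To give a flavour of such cell models, here is a hedged stand-in: the classic two-variable FitzHugh-Nagumo equations rather than the detailed channel-by-channel Noble formulations, which sum one gated current per protein. The structure - ordinary differential equations for membrane excitability integrated over time - is the same.

```python
# Minimal excitable-cell sketch: FitzHugh-Nagumo, integrated by Euler steps.
# This is a generic reduced model, not the published cardiac cell equations.
def fitzhugh_nagumo(v=-1.0, w=-0.5, I_ext=0.5, a=0.7, b=0.8, eps=0.08,
                    dt=0.01, t_end=100.0):
    """Return the (t, v) trace of repeated action potentials."""
    trace, t = [], 0.0
    while t < t_end:
        dv = v - v**3 / 3 - w + I_ext   # fast voltage-like variable
        dw = eps * (v + a - b * w)      # slow recovery variable
        v, w = v + dt * dv, w + dt * dw
        trace.append((t, v))
        t += dt
    return trace

print(max(v for _, v in fitzhugh_nagumo()))  # peak of the spike
```

In the detailed models, each ion channel or transporter contributes its own current term, which is what lets a mutation-induced change in one protein propagate up to whole-cell and whole-organ behaviour.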
The criterion that models must reach down to the level
of proteins automatically guarantees that they will also reach down
to the level of gene mutations when these are reflected in identifiable
changes in protein function (Noble 2002b,c). Changes in expression
levels characteristic of disease states can also be represented.
These developments ensure that it will be possible to use simulation
as an essential aid to patient stratification. I will illustrate
these points with reference to sodium channel mutations.
Ashcroft FM (2000) Ion Channels and Disease. London: Academic Press.
Kohl P, Noble D, Winslow RL & Hunter P (2000) Computational modelling of biological systems: tools and visions. Phil Trans Roy Soc Lond A 358, 579-610.
Noble D (2002a) Modelling the heart: from genes to cells to the whole organ. Science 295, 1678-1682.
Noble D (2002b) Unravelling the genetics and mechanisms of cardiac arrhythmia. Proc Natl Acad Sci USA 99, 5755-5756.
Noble D (2002c) The rise of computational biology. Nature Reviews Molecular Cell Biology 3, 460-463.
Noble D (2002d) Modelling the heart: insights, failures and progress. BioEssays 24, 1155-1163.
Noble D & Colatsky TJ (2000) A return to rational drug discovery: computer-based models of cells, organs and systems in drug target identification. Emerging Therapeutic Targets 4, 39-49.
Informatics and Knowledge Management in Pharma Research
Manuel Peitsch (Novartis)
In this session, Dr. Peitsch will discuss his experience leading informatics and knowledge management at Novartis. Topics to be addressed include: What are the major challenges faced by the pharmaceutical industry? How can they be translated into informatics and knowledge management objectives? What are the potential benefits and performance indicators? How will we get there? These questions will be discussed and insights into implementation will be given.
Combining genome, transcriptome, proteome and metabolome data to improve the drug discovery process
Othmar Pfannes (Genedata)
With the increasing automation and parallelization of the drug discovery process, pharmaceutical companies focus on reducing development costs and the time-to-market of drugs. Their goals include, for example, the identification of potential drug safety issues early in the discovery process, the fine-tuning of a drug's mechanism of action for improved efficacy, and the identification and validation of novel drug targets as a means to generate new drug leads. Ultimately, the intention is to revolutionize the drug discovery process with a novel systems biology approach that analyses complex clinical samples at the biological systems level and thereby provides new insights into the molecular mechanisms within cells. This requires sophisticated computational solutions.
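At its very simplest, the cross-omics integration implied here can be pictured as joining per-gene evidence from each layer. The sketch below is illustrative only, with invented keys and values, and is unrelated to Genedata's actual systems.

```python
# Toy sketch: collect multi-omics evidence for one gene into a single record.
transcriptome = {"ABC1": 2.4, "XYZ2": -1.1}    # log2 mRNA fold change (invented)
proteome = {"ABC1": 1.9}                       # log2 protein fold change
metabolite_links = {"ABC1": ["cholesterol"]}   # associated pathway metabolites

def integrate(gene):
    """Return whatever evidence exists for `gene` across the omics layers."""
    return {
        "gene": gene,
        "mrna": transcriptome.get(gene),
        "protein": proteome.get(gene),
        "metabolites": metabolite_links.get(gene, []),
    }

print(integrate("ABC1"))
```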
The talk will provide an overview of Genedata's
computational systems and highlight a few successful applications
and collaborations with pharmaceutical companies.
Mining Meiosis
Michael Primig (Biozentrum, Universität
Basel & Swiss Institute of Bioinformatics)
Microarray-based expression profiling studies and functional genomics experiments have produced information about the transcriptional patterns and functions of many thousands of yeast and worm genes. Our lab pursues a comprehensive approach to studying the transcriptional control networks that govern meiotic development in yeast and rodents, using high-density oligonucleotide microarrays. Furthermore, we participate in the development of GermOnline, a novel community-based approach to information management in the biological sciences.
Identifying meiotic and/or germ-cell specific transcripts in mammals is a complex task because gonads contain a number of different cell types, only a fraction of which are germ cells. It is, however, possible to obtain informative expression data using microarrays by comparing purified testicular germ cells at various stages of development to somatic controls. The talk will summarize the microarray technology and the outcome of recent profiling experiments using the rat model system.
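A minimal sketch of how such a comparison flags germ-cell enriched transcripts follows; the replicate intensities and threshold are invented, and real analyses add normalization and proper statistics.

```python
# Hedged sketch: call a probe germ-cell enriched by log2 fold change of the
# mean intensity in germ-cell arrays over somatic control arrays.
import math
import statistics

def enriched(germ, soma, min_fold=2.0):
    """True if mean(germ)/mean(soma) exceeds the fold-change threshold."""
    log2_fc = math.log2(statistics.mean(germ) / statistics.mean(soma))
    return log2_fc >= math.log2(min_fold)

print(enriched([820.0, 910.0, 870.0], [110.0, 95.0, 130.0]))  # True
```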
Modelling
of the Human Transcriptome
Mischa Reinhardt (Novartis)
One of the major findings of the human genome project, the prediction of no more than 30,000-40,000 genes, appeared disappointing at first glance. It was felt that this number, only approximately twice the number of genes in Drosophila melanogaster, would not appropriately reflect the complexity of a higher mammalian system. On the other hand, the complexity of an organism is not controlled by the number of different genes, but by the number of different proteins and their respective isoforms. One of the main mechanisms providing biochemical diversity is alternative splicing, together with related mechanisms such as alternative transcriptional start sites and alternative poly-adenylation. In combination they can create huge numbers of mRNA variants originating from the same transcriptional locus but having different biochemical functions, participating in different pathways and protein complexes, or even inhibiting the function of another variant.
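A back-of-the-envelope illustration of this combinatorial effect, with an invented gene structure: independent alternative choices at one locus multiply the number of possible transcripts.

```python
# Toy arithmetic: isoform counts multiply across independent choices.
n_start_sites = 2      # alternative transcriptional start sites
n_cassette_exons = 4   # exons that can each be included or skipped
n_polyA_sites = 2      # alternative poly-adenylation sites

variants = n_start_sites * 2**n_cassette_exons * n_polyA_sites
print(variants)  # 64 possible mRNAs from a single locus
```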
Expressed sequences such as ESTs are among the best tools for predicting transcript variants of genes. In a large-scale analysis, we utilized more than 9 million expressed human sequences to predict expressed genes and to model their gene structures and alternative mRNAs. We arrived at more than 180,000 alternative protein-coding variants, which post-translational modifications can easily expand into a million individual proteins.
Modeling Genes and Genomes in
3D
Torsten Schwede (Biozentrum, Universität
Basel & Swiss Institute of Bioinformatics)
Three-dimensional protein structures are key to a comprehensive understanding of protein function at the atomic level. With such knowledge, researchers can design custom-tailored approaches to study, for example, disease-related mutations or the mode of action of specific inhibitors. Despite the tremendous progress made in recent years in the field of experimental structure determination, it will never be possible to solve a structure for every single important gene product. In fact, for the foreseeable future the number of experimentally determined protein structures will remain about two orders of magnitude smaller than the number of known protein sequences.
Therefore, computational methods such as comparative structure modeling and fold
recognition have gained much interest recently. Structural modeling complements
the experimental structure determination techniques and aims to provide enough
structural information to answer biological questions in those cases where no
experimental structure is available. Protein structure homology modeling (or
comparative modeling) is currently the most accurate of all structure prediction
methods. Comparative modeling uses the known three-dimensional structure of one
or more homologous proteins to predict the structure of a given protein sequence
that belongs to the same family.
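As a toy illustration of the first step of any comparative-modelling pipeline (a sketch only, not SWISS-MODEL's actual logic), the snippet below picks the best template by a crude sequence-identity measure; the ~30% identity cut-off is the usual rough limit quoted for reliable homology modelling.

```python
# Hedged sketch: select a modelling template of known structure for a target
# sequence. Real pipelines use profile alignments; this metric is a stand-in.
def percent_identity(a, b):
    """Identity over the equal-length prefix region (toy measure)."""
    n = min(len(a), len(b))
    return 100.0 * sum(x == y for x, y in zip(a[:n], b[:n])) / n

def pick_template(target, templates):
    """Return (pdb_id, identity) of the best template, or None below ~30%."""
    pdb_id, seq = max(templates.items(),
                      key=lambda kv: percent_identity(target, kv[1]))
    ident = percent_identity(target, seq)
    return (pdb_id, ident) if ident >= 30.0 else None

print(pick_template("MKVLSAGTW", {"1abc": "MKVLAAGSW", "2xyz": "GGGGGGGGG"}))
```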
Modeling of protein structures usually requires extensive expertise in structural biology and the application of highly specialized computer programs. SWISS-MODEL (http://swissmodel.expasy.org) is a server for comparative modeling of three-dimensional protein structures. This software tool was developed as an automated modeling expert system with a user-friendly interface. Making protein models readily available is one of the great advantages of automated modeling. Today, SWISS-MODEL is the most widely used free web-based modeling facility. In an ongoing large-scale project, the SWISS-MODEL pipeline is used to continuously update the SWISS-MODEL Repository, a database containing 3D models for all those SwissProt/TrEMBL sequences for which the approach is feasible.
In the near future, easy access to three-dimensional structure information for most proteins will change today's sequence-centered view of proteins into a more complete picture that includes functional details at the molecular level.
Docking and De Novo Design: How Good are our Predictions?
Martin Stahl (Roche)
The early phases of commercial drug discovery programs are increasingly guided by information extracted from the three-dimensional structures of the target proteins whose functions are to be modulated. Docking and de novo design are two complementary techniques for selecting small molecules that should bind to a protein binding site. Using examples from research and method-development projects at Roche, the current status of these fields will be outlined and key issues for further development will be discussed.
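As a hedged illustration of one ingredient of docking scoring (invented coordinates and parameters; real scoring functions combine many terms, which is precisely where the prediction-quality questions arise), the van der Waals contribution can be sketched as a Lennard-Jones sum over ligand-protein atom pairs.

```python
# Toy sketch: 12-6 Lennard-Jones energy between a ligand pose and a pocket.
import math

def lj_energy(ligand_atoms, protein_atoms, epsilon=0.2, sigma=3.5):
    """Sum of Lennard-Jones terms over all atom pairs (arbitrary units)."""
    e = 0.0
    for la in ligand_atoms:
        for pa in protein_atoms:
            r = math.dist(la, pa)
            e += 4 * epsilon * ((sigma / r) ** 12 - (sigma / r) ** 6)
    return e

pose = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]      # invented ligand coordinates
pocket = [(0.0, 4.0, 0.0), (1.5, 4.0, 0.0)]    # invented pocket atoms
print(lj_energy(pose, pocket))  # lower (more negative) = better vdW fit
```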
Text mining in Life
Science Informatics
Therese Vachon (Novartis Institutes for Biomedical
Research)
Most information is expressed as text but, despite advances in information retrieval and text mining systems, the wealth of knowledge lying in large databases (and collections of heterogeneous, multidisciplinary databases) remains largely untapped. This is true both for retrieval (finding relevant information) and for analysis (finding relationships between separate pieces of information). Written language expresses factual or qualitative information in a complex and opaque manner and includes a great deal of implied knowledge and viewpoint, with sometimes challenging syntactic complexity.
Textual information is also an essential part of numerical or factual databases, where it is used in qualitative attributes describing the subject or properties of data resources. These attributes are needed not only for retrieval and analysis within a specific database but also for data integration and data exchange. The lack of a common representation scheme prevents the integration of multiple heterogeneous textual and factual databases, the analysis of data coming from separate heterogeneous sources, and smooth navigation between applications.
Difficulties associated with text mining, information retrieval and analysis, terminology, and knowledge representations will be discussed, and solutions developed and implemented at Novartis Life Sciences Informatics will be presented.
The major difficulties are associated with:
- Text mining: Because of the properties of language and human communication, tools for the extraction of meaningful objects and relationships between objects are difficult to design (a minimal dictionary-tagging sketch follows this list). Morphosyntactic analysis techniques are now mature and reliable, but the identification of specific objects (and relationships between objects) in text remains challenging outside of predefined case-frames. Semantic analysis techniques also rely on local rules and tend to collapse when applied to broad domains. Relying on human (intellectual) extraction is expensive and time-consuming and results in a drastic degradation of the information content: indexes are limited in scope, discipline-oriented, and based on obsolescent, source-specific indexing schemes (thesauri) with crude relationships between concepts. Moreover, they are frozen in time and cannot reflect current knowledge and terminology. This is particularly important in a research environment where users are mostly interested in emerging or deviant ideas rather than in overall trends or historical data.
- Information retrieval and analysis: Despite claims to the contrary, current information systems offer retrieval, analysis and navigation tools that clearly fall short of users' expectations. Search engines match words or expressions rather than 'concepts' and require prior awareness of the existence of what is being searched for. Navigation tools are usually crude and static. Ulix, arguably the most advanced of such information retrieval systems, uses morphosyntactic analysis, semantic networks, a broad specialized lexical knowledge base, relevance ranking and extensive linking to improve both relevance (match to a concept) and precision (relevant documents are ranked first) and to provide intuitive navigation. Textual statistics and exploration methods are available but are difficult to adapt to scientific text and to an R&D environment.
- Terminology: Controlled vocabularies and semantic networks (used for knowledge extraction) are usually extremely broad and difficult to map to specialized vocabularies and ontologies. The typing of associations between concepts is poor. They tend to be static and poorly responsive to emerging science. The mapping and updating of such terminology repositories is difficult, sometimes impossible.
- Knowledge representations: We currently have no common representation scheme for describing data resources (subjects and properties) and associations between data elements. Together with controlled vocabularies, such schemes are needed for building bridges between databases belonging to different disciplines and for allowing data analysis, navigation and exploration. The model must provide total independence between the knowledge layer and the data resources. The central part of the scheme is metadata, focused on critical business areas, tolerant of organizational/scientific changes, and stable over time. The model must also be versatile enough to accommodate dynamic and customizable navigation tools (knowledge maps).
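As a concrete illustration of the first area above, here is a minimal, hedged sketch of dictionary-based entity tagging; the lexicon, names and regular-expression approach are illustrative assumptions, not the Novartis implementation, and real systems add morphosyntactic analysis and disambiguation.

```python
# Toy sketch: tag known entity names (with synonyms) in free text and map
# each surface form to a canonical concept. The tiny lexicon is invented.
import re

LEXICON = {                      # surface form -> canonical concept
    "p53": "TP53", "TP53": "TP53", "tumor protein 53": "TP53",
    "aspirin": "acetylsalicylic acid",
}
# Longest forms first so "tumor protein 53" wins over shorter overlaps.
PATTERN = re.compile(
    "|".join(re.escape(k) for k in sorted(LEXICON, key=len, reverse=True)),
    re.IGNORECASE)

def tag(text):
    """Return (start, end, matched text, canonical concept) tuples."""
    lower = {k.lower(): v for k, v in LEXICON.items()}
    return [(m.start(), m.end(), m.group(0), lower[m.group(0).lower()])
            for m in PATTERN.finditer(text)]

print(tag("Aspirin modulates p53 signalling."))
```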
Current projects in text mining are articulated around the four major areas identified above, which are dependent on one another and must be addressed at the same time. The goals are:
- To provide generic solutions for text mining, lexical extraction, and knowledge extraction, either by in-house development or by acquisition of state-of-the-art technologies
- To pursue the development of advanced textual
information retrieval systems
- To provide structured controlled vocabularies
and vocabulary stores, used for validation, indexing, retrieval,
navigation and data analysis, and tools for mapping terminology
- To develop consistent knowledge representation
models, available to all other applications in the Knowledge Space,
whether numerical or textual
- To develop reusable components for applications
within the perimeter of Knowledge Engineering, e.g., text retrieval,
vocabularies, validation, query interpretation, textual statistics,
etc.
- To develop specific and well-focused applications with immediate business benefits, using generic KE tools.