Basel Computational Biology Conference 2003

  Abstracts
 
   
A Z-score based approach to high-throughput whole genome homology assignments

Wolfram Altenhofen (Chemical Computing Group)

The use of global Z-scores as a basic statistical tool for remote homology identification can be both more sensitive and (in particular) more specific than standard heuristic methods such as PSI-BLAST or FASTA, but suffers from an inherently high computational cost. We present a Z-score based homology searching methodology that exploits an automatically clustered database of protein structures and sequences, together with a generalized FASTA-like filter, and that can be practically and effectively used in whole genome annotation projects. Annotation results on a number of full genomes are presented, including specific comparisons to PSI-BLAST and other tools.
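For orientation only (not the method presented here), a global Z-score for a pairwise comparison is typically obtained by relating the real alignment score to the distribution of scores obtained against shuffled versions of the target sequence. A minimal sketch, in which align_score stands for any pairwise alignment scoring function the reader supplies:

    import random
    import statistics

    def zscore(query, target, align_score, n_shuffles=100):
        """Global Z-score of an alignment score versus a shuffled-sequence background.

        align_score(a, b) is a placeholder for any pairwise alignment scoring
        function; it is not part of the method described in the abstract.
        """
        real = align_score(query, target)
        background = []
        for _ in range(n_shuffles):
            shuffled = "".join(random.sample(target, len(target)))
            background.append(align_score(query, shuffled))
        mu = statistics.mean(background)
        sigma = statistics.stdev(background)
        return (real - mu) / sigma if sigma > 0 else float("inf")

The high computational cost mentioned above comes precisely from the many shuffled alignments required per database entry, which is what the clustering and filtering steps are designed to avoid.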
 

 

 
Computational Bioscience @ IBM Research


Wanda Andreoni (IBM Research, Zurich Research Laboratory)

This talk will touch on some of the numerous computational activities in biology carried out across the IBM Research laboratories, from bioinformatics to simulations of protein folding to computational chemistry for drug design.


   
About Moore’s law in Proteomics

Peter Berndt (Roche)

Proteomics is the global analysis of the changes in protein expression and chemical structure in biologically relevant situations. The analysis of eukaryotic proteomes, which comprise several thousand expressed proteins, requires substantial investment and development in protein separation as well as in automation and information technologies.
In our talk, we will introduce the implementation of proteomics technologies at the Roche RCMG facility. Currently we can carry out complete, fully automated analysis of about 15,000 to 20,000 peptide mass fingerprint protein identifications per week. This technological achievement allows for new types of experimental design in proteomics that can reduce the reliance on 2D PAGE technology for basic proteomics tasks such as relative protein quantification. We will introduce highly sensitive approaches for analyzing mass spectrometric data in proteomics that can extract information going beyond simple protein identification. While proteomics data can be generated at tremendous speed, bioinformatics tools that allow these changes to be quickly annotated and evaluated in the context of biochemical knowledge are still largely missing. We will describe the attempts our group has made to fill this gap.
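To illustrate the principle behind a peptide mass fingerprint identification (a simplified sketch, not the RCMG pipeline): observed peptide masses are matched, within a mass tolerance, against an in-silico tryptic digest of each candidate protein, and candidates are ranked by the number of matching peptides. The residue masses and cleavage rule below are standard values; everything else is a made-up placeholder.

    import re

    # Monoisotopic amino acid residue masses (Da); one water (18.0106 Da) is added per peptide.
    RESIDUE_MASS = {
        "G": 57.0215, "A": 71.0371, "S": 87.0320, "P": 97.0528, "V": 99.0684,
        "T": 101.0477, "C": 103.0092, "L": 113.0841, "I": 113.0841, "N": 114.0429,
        "D": 115.0269, "Q": 128.0586, "K": 128.0949, "E": 129.0426, "M": 131.0405,
        "H": 137.0589, "F": 147.0684, "R": 156.1011, "Y": 163.0633, "W": 186.0793,
    }
    WATER = 18.0106

    def tryptic_peptides(sequence):
        # Trypsin cleaves after K or R, except when the next residue is P.
        return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]

    def peptide_mass(peptide):
        return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER

    def pmf_score(observed_masses, sequence, tol=0.2):
        """Count observed peaks matching a theoretical tryptic peptide mass within tol Da."""
        theoretical = [peptide_mass(p) for p in tryptic_peptides(sequence)]
        return sum(any(abs(m - t) <= tol for t in theoretical) for m in observed_masses)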
In the final part of the talk, we will consider the future of proteomics. It is now clear that the dynamic range of eukaryotic protein expression requires much higher throughput rates and sensitivity than anticipated in the initial proteomics euphoria of the late 1990s. We will show which technological challenges must be met if proteomics is ever to fulfill its scientific potential.

 

 

 
IBM Life Sciences strategy and solutions

Christopher Cooper (IBM)

Dramatic advances in the Life Science industry are changing the way we live. These advances fuel rapid discoveries in genomics, proteomics and molecular biology that serve as the basis of medical breakthroughs, the advent of personalised medicine and the development of new drugs and treatments. Soon the typical Life Science company will need to analyse petabytes of data to further its research efforts. In addition to the sheer volume of data, there are challenges in querying non-standard formats, accessing data assets across global networks and securing data outside of firewalls. The competitive advantage belongs to the companies that can best use information technology (IT) solutions to capitalise on the opportunities presented by this transformation.
In response to these challenges, Life Science companies are redefining their research methodologies and retooling their IT infrastructures to position themselves for success in this new environment. The traditional trial and error approach is rapidly giving way to more predictive science based on sophisticated laboratory automation and computer simulation.
Key issues include:

  • Sharing and pooling information across global resources while maintaining security
  • Retrieving and integrating diverse data across a variety of scientific domains
  • Enabling continuous, real-time access to data without building and managing data warehouses
  • Developing new ways of collaborating among research teams, using shared research to focus efforts

Add to these the need to work within existing laboratory and business computing environments, and the challenges facing today's Life Science industry are almost overwhelming.


   
Promoter genomics

Martin Ebeling (Roche)

The prediction of genetic elements involved in transcription regulation has long been recognized as a very difficult task. Sophisticated computational methods have been applied to it over the last two decades.

With the availability of the genomes of higher eukaryotes, the field faces substantial changes. Large-scale data sets have become available, and genome comparisons can help to dramatically improve prediction quality.

A brief introduction to ongoing work in the Roche Basel bioinformatics group will be presented.
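As background on the kind of computation involved (an illustrative sketch, not the Roche approach): binding sites for a transcription factor are commonly predicted by scanning a promoter sequence with a position weight matrix and reporting high-scoring windows. The matrix and threshold below are invented for the example.

    import math

    # Hypothetical position frequency matrix for a 4-bp motif (one count list per base, one entry per position).
    PFM = {
        "A": [8, 1, 1, 7],
        "C": [1, 1, 8, 1],
        "G": [1, 8, 1, 1],
        "T": [1, 1, 1, 2],
    }
    BACKGROUND = 0.25  # uniform nucleotide background

    def pwm_score(window):
        """Log-odds score of one sequence window against the background model."""
        score = 0.0
        for i, base in enumerate(window):
            column_total = sum(PFM[b][i] for b in "ACGT")
            score += math.log2((PFM[base][i] / column_total) / BACKGROUND)
        return score

    def scan(sequence, threshold=4.0):
        width = len(PFM["A"])
        for i in range(len(sequence) - width + 1):
            window = sequence[i:i + width]
            if pwm_score(window) >= threshold:
                yield i, window, round(pwm_score(window), 2)

    print(list(scan("TTAGCATTAGCA")))  # reports the two AGCA windows

Genome comparisons enter by restricting such scans to promoter regions conserved between species, which removes a large fraction of spurious hits.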
 

 

 
Scope and use of bioinformatics at the Swiss Institute of Bioinformatics

Ernest Feytmans (Director of the Swiss Institute of Bioinformatics)

21st century biology is no longer based solely, as in the past, on laboratory research; it has become a science of information, analysis and prediction, converging in bioinformatics - a single discipline at the crossroads of the life sciences, informatics, mathematics and information technology. Its aim is to help discover new biological concepts and to offer a global perspective from which novel and outstanding biological principles can be detected. This is achieved by:

  • organising the data supplied by the various sequencing projects into well-annotated databases, and recording the information related to these sequences
  • making sequencing data (genomes and proteomes) and powerful analysis tools available to the biomedical community
  • training scientists and students in the biomedical sciences so that they can use these data appropriately
  • training bioinformatics specialists, whose skills are increasingly sought after.

The Swiss Institute of Bioinformatics (SIB) or Institut Suisse de Bioinformatique (ISB) brings Swiss experts in bioinformatics together and provides high quality services to the national and international scientific community. Members of the SIB include research groups in Geneva, Lausanne and Basel. The SIB expertise is widely appreciated and its services are used worldwide by researchers in cellular and molecular biology. The Institute has three missions: research & development, education and service.

  • It maintains databases of international standing (Swiss-Prot, Prosite, EPD, Swiss-2Dpage, Human Chromosome 21, TrEST, TrGen, AGBD, Hits, Swiss Model Repository, GermOnline).
  • It supplies and develops services for the biomedical research community worldwide by way of software and services that can be accessed from the SIB web servers (ExPASy, Melanie, T-COFFEE, PFTOOLS, ESTScan, Dotlet, SEView, Snp_detect, Mmsearch, Swiss-Model, DeepView/Swiss-PdbViewer, MIMAS).
  • It supplies services to the Swiss biomedical research community within the framework of the international EMBnet network and of the NCCR Structural Biology, a Swiss National Centre of Competence in Research.
  • It undertakes specific research and development activities related to the databases and software developed within the Institute.
  • Together with the Universities of Lausanne, Geneva and Basel, the Swiss Federal Institute of Technology in Lausanne (EPFL) and a private partner (HP), the SIB is contributing to the creation of a molecular bioinformatics service backed by a high-performance informatics platform (the Vital-IT project).
     


 


Facing the complexity of Bioinformatics challenges. Hewlett-Packard view point

Dominique Gillot (Hewlett-Packard)

With the new directions taken by genomics and proteomics research, and with the increased diversity and size of the data sets being used, information technology requirements are changing extremely rapidly. Hewlett-Packard has been working very closely with the bio-research community for years and is increasing its focus on bio-applications, from a partner support standpoint and by optimising existing applications, but also by using internal HP research capabilities to face this challenge. This joint work takes place at the systems design level, but also by working with industry researchers to review the system architectures and tools that should deliver the most power to the applications, in order to improve not only the productivity of research but also data gathering and retrieval.
Working with key partners such as Oracle or Platform is an integral part of HP's strategy to provide the best research environment for bio-related applications.

 

   
Chemogenomics Knowledge-Based Strategies in Drug Discovery

Edgar Jacobi (Novartis)

In the post-genomic age of drug discovery, targets can no longer be viewed as singular objects without relationships. All targets are now visible, and the systematic exploration of selected target families appears to be a promising way to speed up and further industrialize target-based drug discovery. Chemogenomics refers to this systematic exploration of target families and aims at the identification of all possible ligands of all target families. Because biology works by applying prior knowledge ("what is known") to an unknown entity, chemogenomics approaches are expected to be especially effective within previously well-explored target families, for which, in addition to protein sequence and structure information, considerable knowledge of pharmacologically active structural classes and structure-activity relationships exists. For the new target families, chemical knowledge will have to be generated; beyond biological target validation, the challenge reverts to chemistry to provide the molecules with which their novel biology and pharmacology can be studied. Drawing on examples from the most successfully explored target families, especially the GPCR family, we summarize our currently investigated chemogenomics knowledge-based strategies for drug discovery, which are founded on a high degree of integration of chem- and bioinformatics, thereby providing a molecular informatics framework for the exploration of the new target families.
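One elementary computational building block of such a knowledge-based strategy (a toy sketch, not the platform described in the talk) is to prioritize candidate compounds for a new family member by their similarity to ligands already known within the family, for example with a Tanimoto coefficient over feature fingerprints:

    def tanimoto(fp_a, fp_b):
        """Tanimoto similarity between two fingerprints represented as sets of features."""
        if not fp_a and not fp_b:
            return 0.0
        shared = len(fp_a & fp_b)
        return shared / (len(fp_a) + len(fp_b) - shared)

    def rank_candidates(candidates, known_ligands):
        """Rank candidate compounds by their best similarity to any known family ligand.

        Both arguments map compound identifiers to fingerprint sets; in practice the
        fingerprints would come from a cheminformatics toolkit (hypothetical inputs here).
        """
        scores = {
            cid: max(tanimoto(fp, ref) for ref in known_ligands.values())
            for cid, fp in candidates.items()
        }
        return sorted(scores.items(), key=lambda item: item[1], reverse=True)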
 

   
The Future of GRIDS and E-Science

Chris Jones (CERN)

The appearance of the concept of the GRID, a “blueprint for a new computing infrastructure”, has changed the direction of computing considerably. The large and growing wave of support and funding for GRID activities offers tremendous opportunities for substantial paradigm changes and enhanced methods of working, leading to “better science” or, for example, improved processes of drug discovery. The evolution of the Web from a concept in Tim Berners-Lee’s head to its full commodity deployment took more than a decade. Whilst there are arguments to support the view that wide-scale deployment of the GRID could proceed substantially faster than that of the Web, it is nonetheless important to try to analyse and understand the current state of this process.

The speaker began promoting GRID computing in 1999, when the data grids and computational grids proposed in that era matched exactly the needs of CERN and its partners for their new accelerator and experiments. More recently, thinking on how to profit from GRIDs has broadened to encompass a wider vision. One now sees the GRID as the technology that enables the transparent provision of a broad range of services and resources, which may include extensive computing power and vast quantities of data, but also many other forms of services, information or knowledge. The vision foresees, for example, providing the decision maker (scientist, doctor, surgeon) with the very best information available in a transparent fashion and thereby enhancing their work. As a result, many opportunities for benefit are opening up in a range of activities much wider than originally envisaged, including health care, pharma and many other arenas.


 

 
B-numerics

Andreas Kuttler (GIMS)

With the extensive use of gene and protein databases and high-throughput screening, integrative approaches for bio-medical research and the pharmaceutical industry have become indispensable. This integration has to span at least the following two axes. The first axis is the spatial dimension from sub-cellular processes to organs and even larger systems. Transport, exchange and reactions occur on all these levels and their quantitative formulation is crucial for hypothesis testing and the exposure of models to falsification.
The end point of the second axis is the roll-out of a new drug; its starting point is the fundamental research that led to its discovery. Good models must serve as a backbone for the entire process, including drug delivery and galenics. These models are rarely designed from scratch but grow with the product and therefore incorporate all drug-related knowledge.
We present three examples - intestinal fluid mechanics, fatty acid transport through the intestinal wall cell, and a physico-chemical approach to lipid membranes - to demonstrate the power of bio-numerical modelling (B-numerics). From this we lead on to our vision of a computer-based workbench for drug development.
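To give a flavour of the quantitative formulation mentioned above (a minimal sketch with made-up rate constants, not one of the three presented models), transport across the intestinal wall can be written as a small compartment ODE system and integrated numerically:

    from scipy.integrate import solve_ivp

    # Two-step transport: lumen -> intestinal wall cell -> blood.
    K_UPTAKE = 0.30   # lumen-to-cell rate constant (1/min), illustrative value
    K_EFFLUX = 0.10   # cell-to-blood rate constant (1/min), illustrative value

    def rhs(t, y):
        lumen, cell, blood = y
        uptake = K_UPTAKE * lumen
        efflux = K_EFFLUX * cell
        return [-uptake, uptake - efflux, efflux]

    # All of the substance starts in the lumen.
    sol = solve_ivp(rhs, (0.0, 60.0), [1.0, 0.0, 0.0], t_eval=[0, 10, 20, 30, 40, 50, 60])
    for t, cell_amount in zip(sol.t, sol.y[1]):
        print(f"t = {t:4.0f} min   amount in wall cell = {cell_amount:.3f}")

Models of this kind grow by adding compartments and reaction terms, which is how the drug-related knowledge accumulated along the second axis can be incorporated.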

 

 
From genes to whole organs: vertical integration using mathematical simulation of the heart.

Denis Noble (University Laboratory of Physiology, Oxford)

Biological modelling of cells, organs and systems has reached a very significant stage of development. Particularly at the cellular level, there has been a long period of iteration between simulation and experiment (Noble, 2002d). We have therefore achieved the levels of detail and accuracy that are required for the effective use of models in drug development. To be useful in this way, biological models must reach down to the level of proteins (receptors, transporters, enzymes etc), yet they must also reconstruct functionality right up to the levels of organs and systems. This is now possible and three important developments have made it so:

1. Relevant molecular and biophysical data on many proteins and the genes that code for them are now available. This is particularly true for ion transporters (Ashcroft, 2000).
2. The complexity of the biological processes that can now be modelled is such that valuable counter-intuitive predictions are emerging (Noble & Colatsky, 2000). Multiple target identification is also possible.
3. Computer power has increased to meet the demands. Even very complex cell models involving up to 100 different protein functions can be run on single-processor machines, while parallel computers are now powerful enough to enable whole-organ modelling (Kohl et al., 2000).

I will illustrate these points with reference to models of the heart (Noble 2002a).

The criterion that models must reach down to the level of proteins automatically guarantees that they will also reach down to the level of gene mutations when these are reflected in identifiable changes in protein function (Noble 2002b,c). Changes in expression levels characteristic of disease states can also be represented. These developments ensure that it will be possible to use simulation as an essential aid to patient stratification. I will illustrate these points with reference to sodium channel mutations.
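For readers unfamiliar with this class of models, the sketch below integrates the two-variable FitzHugh-Nagumo equations, a drastically reduced textbook stand-in for the detailed multi-channel cardiac cell models discussed in the talk (the parameters are standard textbook values, not taken from the cited work):

    # FitzHugh-Nagumo excitable-cell model: v is a fast membrane-potential-like
    # variable, w a slow recovery variable; integrated with forward Euler.
    A, B, TAU, I_STIM = 0.7, 0.8, 12.5, 0.5
    DT, STEPS = 0.01, 20000

    v, w = -1.0, 1.0
    trace = []
    for step in range(STEPS):
        dv = v - v**3 / 3 - w + I_STIM
        dw = (v + A - B * w) / TAU
        v += DT * dv
        w += DT * dw
        if step % 1000 == 0:
            trace.append((round(step * DT, 1), round(v, 3)))

    print(trace)  # v spikes periodically, i.e. the model cell fires repetitively

The real cardiac cell models referred to above replace these two abstract variables with tens of equations for individual ionic currents, pumps and exchangers, which is what makes predictions about specific proteins and mutations possible.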

Ashcroft FM (2000) Ion Channels and Disease. London: Academic Press.
Kohl P, Noble D, Winslow RL & Hunter P (2000) Computational modelling of biological systems: tools and visions. Phil Trans Roy Soc Lond A 358, 579-610.
Noble D (2002a) Modelling the heart: from genes to cells to the whole organ. Science 295, 1678-1682.
Noble D (2002b) Unravelling the genetics and mechanisms of cardiac arrhythmia. Proc Natl Acad Sci USA 99, 5755-5756.
Noble D (2002c) The Rise of Computational Biology. Nature Reviews Molecular Cell Biology 3, 460-463.
Noble D (2002d) Modelling the heart: insights, failures and progress. BioEssays 24, 1155-1163.
Noble D & Colatsky TJ (2000) A return to rational drug discovery: computer-based models of cells, organs and systems in drug target identification. Emerging Therapeutic Targets 4, 39-49.

 

   
Informatics and Knowledge Management in Pharma Research

Manuel Peitsch (Novartis)

In this session, Dr. Peitsch will discuss his experience at Novartis, where he is in charge of informatics and knowledge management. Topics to be addressed include: What are the major challenges faced by the pharmaceutical industry? How do they translate into informatics and knowledge management objectives? What are the potential benefits and performance indicators? How will we get there? These questions will be discussed and insights into implementation will be given.
 
Lecture as PDF-File (0.92MB)


   
Combining genome, transcriptome, proteome and metabolome data to improve the drug discovery process

Othmar Pfannes (Genedata)

With the increasing automation and parallelization of the drug discovery process, pharmaceutical companies are focusing on reducing development costs and the time-to-market of drugs. Their goals are, for example, the identification of potential drug safety issues early in the discovery process, the fine-tuning of a drug's mechanism of action for improved efficacy, and the identification and validation of novel drug targets as a means to generate new drug leads. Ultimately, the intention is to revolutionize the drug discovery process with a systems biology approach that analyses complex clinical samples at the level of biological systems, providing new insights into the molecular mechanisms within cells. This requires sophisticated computational solutions.
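As a toy illustration of what combining two of these data levels can mean in practice (not Genedata's software), transcript and protein measurements can be joined on a gene identifier and screened for genes whose mRNA and protein changes disagree:

    # Hypothetical fold changes keyed by gene identifier (illustrative numbers only).
    transcriptome = {"geneA": 2.1, "geneB": 0.4, "geneC": 1.0}
    proteome      = {"geneA": 1.9, "geneB": 1.5, "geneD": 0.6}

    def discordant(rna, protein, threshold=1.5):
        """Genes measured at both levels whose mRNA and protein changes point in different directions."""
        shared = rna.keys() & protein.keys()
        hits = []
        for gene in sorted(shared):
            up_rna = rna[gene] >= threshold
            up_protein = protein[gene] >= threshold
            if up_rna != up_protein:
                hits.append((gene, rna[gene], protein[gene]))
        return hits

    print(discordant(transcriptome, proteome))
    # geneB: mRNA down but protein up, a candidate for post-transcriptional regulation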

The talk will provide an overview of Genedata's computational systems and highlight a few successful applications and collaborations with pharmaceutical companies.
 

 

 
Mining Meiosis

Michael Primig (Biozentrum, Universität Basel & Swiss Institute of Bioinformatics)

Microarray-based expression profiling studies and functional genomics experiments have produced information about the transcriptional patterns and functions of many thousands of yeast and worm genes. Our lab takes a comprehensive approach to studying the transcriptional control networks that govern meiotic development in yeast and rodents, using high-density oligonucleotide microarrays. Furthermore, we participate in the development of GermOnline, a novel community-based approach to information management in the biological sciences.
Identifying meiotic and/or germ-cell-specific transcripts in mammals is a complex task because gonads contain a number of different cell types, only a fraction of which are germ cells. It is, however, possible to obtain informative expression data with microarrays by comparing purified testicular germ cells at various stages of development to somatic controls. The talk will summarize the microarray technology and the outcome of recent profiling experiments using the rat model system.
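The core comparison described above can be pictured as a per-probe test between germ-cell and somatic samples; the sketch below uses made-up log2 intensities and is not the lab's actual analysis pipeline:

    from statistics import mean
    from scipy import stats

    # Hypothetical log2 expression values per probe set (three germ-cell and three somatic arrays).
    germ_cells = {"probe1": [8.1, 8.4, 8.0], "probe2": [5.2, 5.0, 5.3]}
    somatic    = {"probe1": [5.9, 6.1, 6.0], "probe2": [5.1, 5.2, 5.0]}

    for probe in germ_cells:
        g, s = germ_cells[probe], somatic[probe]
        t_stat, p_value = stats.ttest_ind(g, s)   # Welch's variant would pass equal_var=False
        log2_fc = mean(g) - mean(s)               # difference of log2 means = log2 fold change
        print(f"{probe}: log2FC = {log2_fc:+.2f}   p = {p_value:.3g}")
    # probe1 behaves like a germ-cell-enriched transcript; probe2 does not.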


 

 
Modelling of the Human Transcriptome

Mischa Reinhardt (Novartis)

One of the major findings of the human genome project, the prediction of not more than 30,000 to 40,000 genes, appeared disappointing at first glance. It was felt that this number, only approximately twice as high as the number of genes in Drosophila melanogaster, would not appropriately reflect the complexity of a higher mammalian system. On the other hand, the complexity of an organism is not controlled by the number of different genes, but by the number of different proteins and their respective isoforms. One of the main mechanisms providing biochemical diversity is alternative splicing, together with related mechanisms such as alternative transcriptional start sites or alternative poly-adenylation. In combination they can create huge numbers of mRNA variants that originate from the same transcriptional loci but have different biochemical functions, participate in different pathways and protein complexes, or even inhibit the function of another variant.
Expressed sequences such as ESTs represent one of the best tools for predicting the transcript variants of genes. In a large-scale analysis we utilized more than 9 million expressed human sequences to predict expressed genes and to model their gene structures and alternative mRNAs. We arrived at more than 180,000 alternative protein-coding variants, which post-translational modifications can easily expand into a million individual proteins.
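The enumeration step can be illustrated as follows (a toy sketch, not the large-scale pipeline itself): each EST alignment is reduced to its chain of exon coordinates on the genome, and the distinct chains observed at a locus are counted as candidate transcript variants.

    from collections import defaultdict

    # Hypothetical EST alignments: (locus identifier, tuple of (exon_start, exon_end) pairs).
    est_alignments = [
        ("locus1", ((100, 200), (300, 400), (500, 600))),
        ("locus1", ((100, 200), (500, 600))),              # middle exon skipped
        ("locus1", ((100, 200), (300, 400), (500, 600))),  # same chain as the first EST
        ("locus2", ((1000, 1100), (1250, 1400))),
    ]

    variants = defaultdict(set)
    for locus, exon_chain in est_alignments:
        variants[locus].add(exon_chain)

    for locus, chains in sorted(variants.items()):
        print(f"{locus}: {len(chains)} candidate transcript variant(s)")
    # locus1 yields two variants (with and without the middle exon).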
 
 

 

 
Modeling Genes and Genomes in 3D

Torsten Schwede (Biozentrum, Universität Basel & Swiss Institute of Bioinformatics)

Three-dimensional protein structures are key to a comprehensive understanding of protein function at the atomic level. With such knowledge, researchers can design custom-tailored approaches to study, for example, disease-related mutations or the mode of action of specific inhibitors. Despite the tremendous progress made in recent years in the field of experimental structure determination, it will never be possible to solve a structure for every single important gene product. In fact, for the foreseeable future the number of experimentally determined protein structures will remain about two orders of magnitude smaller than the number of known protein sequences.

Therefore, computational methods such as comparative structure modeling and fold recognition have gained much interest recently. Structural modeling complements the experimental structure determination techniques and aims to provide enough structural information to answer biological questions in those cases where no experimental structure is available. Protein structure homology modeling (or comparative modeling) is currently the most accurate of all structure prediction methods. Comparative modeling uses the known three-dimensional structure of one or more homologous proteins to predict the structure of a given protein sequence that belongs to the same family.
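The first step of comparative modeling, template selection, can be caricatured as picking the known structure whose sequence is most similar to the target. The sketch below uses a naive ungapped identity on invented sequences; SWISS-MODEL itself relies on far more sophisticated alignment and model assessment.

    def percent_identity(a, b):
        """Naive ungapped identity over the length of the shorter sequence."""
        n = min(len(a), len(b))
        if n == 0:
            return 0.0
        return 100.0 * sum(x == y for x, y in zip(a, b)) / n

    def pick_template(target_seq, template_db):
        """Return the identifier and % identity of the best-matching known structure.

        template_db maps PDB-like identifiers to sequences (hypothetical data).
        """
        best = max(template_db, key=lambda tid: percent_identity(target_seq, template_db[tid]))
        return best, round(percent_identity(target_seq, template_db[best]), 1)

    templates = {"1abc_A": "MKTAYIAKQR", "2xyz_B": "MKSVFLAQQR"}
    print(pick_template("MKTAYIAKQK", templates))  # ('1abc_A', 90.0)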

Modeling of protein structures usually requires extensive expertise in structural biology and the application of highly specialized computer programs. SWISS-MODEL (http://swissmodel.expasy.org) is a server for comparative modeling of three-dimensional protein structures. This software tool was developed as an automated modeling expert system with a user-friendly interface. Making protein models readily available is one of the great advantages of automated modeling. Today, SWISS-MODEL is the most widely used free web-based modeling facility. In an ongoing large-scale project, the SWISS-MODEL pipeline is used to continuously update the SWISS-MODEL Repository, a database containing 3D models for all sequences in SwissProt/TrEMBL for which the approach is feasible.

In the near future, easy access to three-dimensional structure information for most proteins will change today's sequence-centered view of proteins into a more complete picture that includes functional details at the molecular level.


   
Docking and De Novo Design: How Good are our Predictions?

Martin Stahl (Roche)

The early phases of commercial drug discovery programs are increasingly guided by information extracted from the three-dimensional structures of the target proteins whose functions are to be modulated. Docking and de novo design are two complementary techniques for selecting small molecules that should bind to a given protein binding site. Using examples from research and method development projects at Roche, the current status of these fields will be outlined and key issues for further development will be discussed.
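To make the notion of a scoring function concrete (a deliberately crude sketch, not a method used at Roche), many empirical docking scores are sums of distance-dependent terms over ligand-protein atom pairs; the toy version below simply rewards close contacts and penalizes steric clashes:

    import math

    def contact_score(ligand_atoms, protein_atoms,
                      contact_cutoff=4.0, clash_cutoff=2.0, clash_penalty=5.0):
        """Toy pose score: +1 per ligand-protein contact, minus a penalty per clash.

        Atoms are (x, y, z) tuples in Angstroms; all cutoffs and weights are arbitrary.
        """
        score = 0.0
        for la in ligand_atoms:
            for pa in protein_atoms:
                d = math.dist(la, pa)
                if d < clash_cutoff:
                    score -= clash_penalty
                elif d <= contact_cutoff:
                    score += 1.0
        return score

    pose = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
    pocket = [(3.0, 0.0, 0.0), (0.0, 3.5, 0.0)]
    print(contact_score(pose, pocket))  # higher scores correspond to more favourable poses

Real scoring functions are considerably more elaborate, and how reliably they rank poses and molecules is exactly the kind of question the title raises.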

 

 


Text mining in Life Science Informatics

Therese Vachon (Novartis Institutes for Biomedical Research)

Most information is expressed as text, but, despite advances in information retrieval and text mining systems, the wealth of knowledge lying in large databases (and in collections of heterogeneous, multidisciplinary databases) remains largely untapped. This is true both for retrieval (finding relevant information) and for analysis (finding relationships between separate pieces of information). Written language expresses factual or qualitative information in a complex and opaque manner, embedding a great deal of implied knowledge and viewpoints in sometimes challenging syntactic constructions.

Textual information is also an essential part of numerical or factual databases, where it is used in qualitative attributes describing the subject or properties of data resources. These attributes are needed not only for retrieval and analysis within a specific database but also for data integration and data exchange. The lack of a common representation scheme prevents the integration of multiple heterogeneous textual and factual databases, the analysis of data coming from separate heterogeneous sources, and smooth navigation between applications.

The difficulties associated with text mining, information retrieval and analysis, terminology, and knowledge representation will be discussed, and solutions developed and implemented in Life Sciences Informatics at Novartis will be presented.

The major difficulties are associated with:

  • Text mining: Because of the properties of language and human communication, tools for the extraction of meaningful objects, and of relationships between objects, are difficult to design. Morphosyntactic analysis techniques are now mature and reliable, but the identification of specific objects (and of relationships between objects) in text remains challenging outside of predefined case-frames; the simplest form of this object-identification step is sketched after this list. Semantic analysis techniques also rely on local rules and tend to collapse when applied to broad domains. Relying on human (intellectual) extraction is expensive and time-consuming and results in a drastic degradation of the information content: indexes are limited in scope, discipline-oriented, and based on obsolescent, source-specific indexing schemes (thesauri) with crude relationships between concepts. Moreover, they are frozen in time and cannot reflect current knowledge and terminology. This is particularly important in a research environment where users are mostly interested in emerging or deviant ideas rather than in overall trends or historical data.
  • Information retrieval and analysis: Despite claims to the contrary, current information systems offer retrieval, analysis and navigation tools that clearly fall short of users' expectations. Search engines match words or expressions rather than ‘concepts’ and require prior awareness of the existence of what is being searched for. Navigation tools are usually crude and static. Ulix, arguably the most advanced of such information retrieval systems, uses morphosyntactic analysis, semantic networks, a broad specialized lexical knowledge base, relevance ranking and extensive linking to improve both relevance (match to a concept) and precision (relevant documents are ranked first) and to provide intuitive navigation. Textual statistics and exploration methods are available but are difficult to adapt to scientific text and to an R&D environment.
  • Terminology: Controlled vocabularies and semantic networks (used for knowledge extraction) are usually extremely broad and difficult to map onto specialized vocabularies and ontologies. The typing of associations between concepts is poor. They tend to be static and to react poorly to emerging science. The mapping and updating of such terminology repositories is difficult, sometimes impossible.
  • Knowledge representations: We currently have no common representation scheme for describing data resources (subjects and properties) and associations between data elements. Together with controlled vocabularies, such schemes are needed for building bridges between databases belonging to different disciplines, and for enabling data analysis, navigation and exploration. The model must provide complete independence between the knowledge layer and the data resources. The central part of the scheme is metadata: focused on critical business areas, tolerant of organizational and scientific changes, and stable over time. The model must also be versatile enough to accommodate dynamic and customizable navigation tools (knowledge maps).
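The sketch below illustrates, in its simplest possible form, the object-identification step referred to in the text-mining item above: tagging occurrences of a small controlled vocabulary in free text. It is a toy example only; the vocabulary and concept identifiers are made up, and production systems add morphosyntactic analysis, disambiguation and far larger terminologies.

    import re

    # Hypothetical controlled vocabulary: surface form -> concept identifier.
    VOCABULARY = {
        "protein kinase": "CONCEPT:0001",
        "p53": "CONCEPT:0002",
        "apoptosis": "CONCEPT:0003",
    }

    def tag_entities(text, vocabulary=VOCABULARY):
        """Return (surface form, concept id, character offset) for each match."""
        hits = []
        for term, concept in vocabulary.items():
            # Word-boundary, case-insensitive match of the exact surface form only.
            for m in re.finditer(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE):
                hits.append((m.group(0), concept, m.start()))
        return sorted(hits, key=lambda h: h[2])

    sentence = "Activation of p53 triggers apoptosis via a protein kinase cascade."
    print(tag_entities(sentence))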

Current projects in text mining are organised around the four major areas identified above, which depend on each other and must be addressed at the same time. The goals are:

  • To provide generic solutions for text mining, lexical extraction and knowledge extraction, either by in-house development or by acquisition of state-of-the-art technologies
  • To pursue the development of advanced textual information retrieval systems
  • To provide structured controlled vocabularies and vocabulary stores, used for validation, indexing, retrieval, navigation and data analysis, and tools for mapping terminology
  • To develop consistent knowledge representation models, available to all other applications in the Knowledge Space, whether numerical or textual
  • To develop reusable components for applications within the perimeter of Knowledge Engineering, e.g., text retrieval, vocabularies, validation, query interpretation, textual statistics, etc.
  • To develop specific and well focused applications with immediate business benefits, using generic KE tools.

Lecture as PDF-File (3.25MB)


 
Biozentrum, University of Basel