[BC]2 Speakers and Abstracts
Human Genome Variation and Personalized Medicine
Stylianos Antonarakis
University of Geneva, CH
A large number of human disorders (including cancer) and other phenotypic traits are caused by or are associated with germline or somatic genomic alterations. The current goal of genetic medicine is to perform the matchmaking between the genomic variability and the phenotypic variability. The completion of the sequence of the human genome, and that of the genomes of other species, provided unprecedented opportunities to determine the functional elements and the functional variability of these genomes. The elucidation of the cause of monogenic disorders was a great success of the past decade, and will certainly continue in the next several years, not only to provide precise diagnostic tools but also to help understand the molecular pathophysiology. The main challenge ahead, however, is the discovery of nucleotide variability that confers (positive or negative) susceptibility to complex, common phenotypes. To put it in simple terms: the task is to differentiate, for each genomic variant, between "neutral" and "functional" or "pathogenic" variation. The completion of recent genome-wide association studies for numerous phenotypes made it clear that common variation in the genome only accounts for a small fraction of the genetic etiology of common, multifactorial diseases. The reading of individual genomes will pose an enormous challenge in the discovery of causative genomic variants that could be used for diagnosis, prevention, or treatment. Comparative genomic analysis between species and between individuals, knowledge of the polymorphic structure of the genomes of different human populations, introduction of new tools to assess gene function, transcriptome analysis, rapid, inexpensive, and accurate DNA re-sequencing of genomes, and assessment of the quantitative variability of gene expression are all necessary requirements to meet this enormous challenge of genomic and epigenetic pathology.
In addition, the remarkable similarity of functional genomic elements in mammalian and other species provides further opportunities for animal experimentation for disease allele identification. In turn, functional analysis of the genome and characterization of the functional variability are likely to provide new therapeutic opportunities.
Challenges of Rapid Population Growth to Modeling Human Genetic Variation
Andrew Clark
Cornell University, Ithaca, NY, USA
The human global population has expanded more than 1000-fold in the last 400 generations, resulting in a pattern of genetic variation that is profoundly out of equilibrium. This recent growth produces a large excess of rare variation, which has important consequences for finding genes that underlie complex disease risk. We are exploring methods of population genetic analysis to understand the role of rapid population expansion in shaping patterns of genetic variation. From a purely theoretical standpoint, explosive growth impacts patterns of genetic variation by distorting the genealogy that connects genetic lineages. Large samples can result in multiple mergers, and rapid growth further alters branch lengths, producing distorted gene genealogies that make standard coalescent theory invalid. It is still possible to construct gene genealogies that yield appropriate sample site frequency spectra under models with both rapid growth and large samples. Once we have these genealogies, it is possible to make accurate population genetic inference. These approaches rely on simulations to demonstrate the need to accommodate the unusually rapid and recent growth, and show that the problem largely concerns rare variation and inferences of recent growth. These results also have bearing on models of complex trait variation, as a great influx of rare mutations in a complex system may erode the function of the system in unexpected ways. Natural selection combats these processes, but the efficiency of natural selection can be quite low on exceedingly rare variation. By learning how rapid growth has impacted genetic variation in humans, we hope to obtain a more accurate picture of the expected genetic architecture of disease risk, which will in turn guide methods for improved association testing.
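As an illustration of what "excess of rare variation" means relative to equilibrium, the sketch below computes the textbook expectation for the site frequency spectrum under a constant-size neutral model, where the expected number of sites with derived-allele count i is proportional to 1/i, and compares an observed singleton fraction against it. This is only the equilibrium baseline, not the method described in the abstract. The function names and the example counts are hypothetical, and the baseline itself rests on the standard coalescent, which the abstract notes breaks down for very large samples.

```python
def neutral_sfs_proportions(n):
    """Expected share of segregating sites at derived-allele counts 1..n-1
    for a sample of n chromosomes, assuming constant population size and
    neutrality (expected count at frequency class i is proportional to 1/i)."""
    weights = [1.0 / i for i in range(1, n)]
    total = sum(weights)
    return [w / total for w in weights]


def singleton_excess(observed_counts):
    """Ratio of the observed singleton fraction to the equilibrium expectation.

    observed_counts[i-1] is the number of sites with derived-allele count i,
    so a sample of n chromosomes supplies n-1 entries. Values above 1 mean
    more singletons than the constant-size model predicts.
    """
    n = len(observed_counts) + 1
    expected = neutral_sfs_proportions(n)
    observed_fraction = observed_counts[0] / sum(observed_counts)
    return observed_fraction / expected[0]


if __name__ == "__main__":
    # Hypothetical spectrum for 5 chromosomes, skewed toward singletons.
    print(singleton_excess([30, 5, 3, 2]))
```

A ratio well above 1 is the kind of skew toward rare variants that recent expansion is expected to produce, although purifying selection and sequencing error can inflate singletons as well.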
Genetic Variability and the Proteome
Ruedi Aebersold
Institute of Molecular Systems Biology, ETH Zürich, and Faculty of Science, University of Zürich, Switzerland
The question of how genetic variability is translated into phenotypes is fundamental in biology and medicine. Powerful genomic technologies now determine genetic variability at a genomic level and at unprecedented speed, accuracy and (low) cost. To date the effects of genomic variability on the expressed information of the cell have been mainly studied by transcript profiling.
In this presentation we will discuss emerging computational and quantitative proteomic technologies to relate genotypic variation to the proteome. Proteomic data to support such correlations need to be quantitatively accurate, highly reproducible across multiple measurements and samples and generable at high throughput. Data with these qualities can now be generated by the targeted proteomic methods selected reaction monitoring (SRM) and, at higher throughput, by SWATH-MS.
We will discuss the principles of these mass spectrometric methods, discuss the computational challenges they pose for data analysis and demonstrate with selected applications their ability to determine the effect of genetic variability on the proteome.
Genome-Wide Association Study of Metabolic Traits Reveals Novel Gene-Metabolite-Disease Links
Sven Bergmann
Department of Medical Genetics, Université de Lausanne & SIB, Switzerland
Metabolomic traits are important molecular phenotypes, which may play an important role in determining clinical phenotypes and disease progression. I will discuss recent results from a metabolome- and genome-wide association study on 1H-NMR metabolomic urine profiles. From our discovery cohort of 835 Caucasian individuals who participated in the CoLaus study, we identified 139 suggestively significant (P < 5x10^-8) single nucleotide polymorphism (SNP)-metabolomic feature associations. Out of these, 56 replicated in the TasteSensomics cohort comprising 601 individuals from São Paulo of vastly different genetic background. Our key novel findings are the association of two SNPs with NMR spectral signatures pointing to fucose (P = 6.9x10^-44) and lysine (P = 1.2x10^-33), respectively. Fine-mapping of the first locus pinpointed a gene that has previously been associated with Crohn’s disease. This implicates fucose as a potential prognostic disease marker, for which there is already published evidence from a mouse model. The second SNP lies within a gene for which rare mutations have been linked to severe kidney damage. Integrating phenotypic CoLaus traits, we provide further evidence pointing to a possible causative role of lysine for chronic kidney disease. The replication of previous associations and our new discoveries demonstrate the potential of untargeted metabolomics GWAS to robustly identify molecular disease markers.
Clinical Bioinformatics: a Paradigm Change in Medicine
Jacques S Beckmann
SIB Swiss Institute of Bioinformatics, Lausanne, CH
Medicine is increasingly and rapidly turning into a data-driven as well as an observational science. This tendency is driven, among other factors, by the increasing availability of DNA and RNA sequencing and other ‘omics-type data. Patients are also likely to play an immense role in this evolution, which is bound to change medical practice in a manner unforeseen thus far. This trend poses new challenges to both the clinical and bioinformatics communities, creating the necessity for a bridging culture.
Trans-eQTL Mapping in over 8,000 Samples Reveals Genetic Variants that Define Hallmarks of Disease
Lude Franke
Department of Genetics, University Medical Centre Groningen, Groningen, the Netherlands
Identifying the downstream effects of disease-associated single nucleotide polymorphisms (SNPs) is challenging: the causal gene is often unknown or it is unclear how the SNP affects the causal gene, making it difficult to design experiments that identify subsequent downstream effects. To overcome this, we performed the largest eQTL meta-analysis in non-transformed peripheral blood samples of 5,311 individuals, with replication in 2,775 individuals. This identified trans-eQTLs for 346 trait-associated SNPs. Although we did not study specific patient cohorts, some disease-associated SNPs affect multiple trans-genes that are known to be markedly altered in patients: for example, systemic lupus erythematosus (SLE) SNP rs4917014 altered C1QB and numerous type 1 interferon response genes, both hallmarks of SLE. Subsequent ChIP-seq data analysis on these trans-genes implicated transcription factor IKZF1 as the causal gene at this locus, with DeepSAGE RNA-sequencing revealing that rs4917014 strongly alters 3' UTR levels of IKZF1. Variants associated with cholesterol metabolism and type 1 diabetes showed similar phenomena, indicating that large-scale eQTL mapping provides insight into the downstream effects of many trait-associated variants.
Novel method identified SNPs in the imprinted gene KCNK9 exhibiting parent-of-origin effect on BMI
Zoltan Kutalik
Department of Medical Genetics, Université de Lausanne & SIB, Switzerland
It has been hypothesized that some genetic variants exert different effects on certain phenotypes depending on parental origin. Such markers exhibit parent-of-origin effects, principally due to the genomic imprinting phenomenon, caused by epigenetic factors such as methylation and histone modification. Only a few linkage studies have explored whether alleles may have different effects on phenotypes (e.g. BMI, alcohol intake) depending on their parent of origin. Many of these findings are contradictory and the identified regions are often too large to provide sensible interpretation. The only large-scale, genome-wide parent-of-origin study, by Kong et al. (2009), focused primarily on type 2 diabetes and identified four loci. Here we present a novel method that is able to detect parent-of-origin effects using genome-wide genotype data of unrelated individuals. We demonstrate that parent-of-origin effects can be identified by increased phenotypic variance in the heterozygous genotype group relative to that in the homozygous groups. This is because in the heterozygous group, half of the population is mat-A/pat-B and the other half is pat-A/mat-B, so a parent-of-origin effect shifts the two halves in opposite directions and increases the phenotypic variance in that group. Sixteen GWAS cohorts participated in our discovery analysis, totalling ~48K individuals. We tested all SNPs in imprinted regions to see whether BMI is influenced in a parent-of-origin fashion. One SNP survived multiple testing (rs2471083 [T/C], variance (het vs. hom): 1.058 vs. 0.963, P = 1.07x10^-6), located 100kb upstream of the gene KCNK9. Mutation in this potassium channel gene causes Birk-Barel syndrome. SNPs within 2kb have been shown to be associated with HDL cholesterol, adiponectin and creatinine. We used four family-based studies to verify whether the increased heterozygous variance is indeed the consequence of a parent-of-origin effect.
The combined analysis (of 3,016 heterozygous individuals) confirmed that those individuals who carry the C allele paternally have 0.09 (SD unit) higher BMI on average than those carrying it maternally (P = 0.0031). Gene expression experiments are currently underway to determine whether the variant influences KCNK9 expression in a parent-of-origin fashion. Our method opens new avenues to exploit GWAS data of unrelated individuals in order to identify parent-of-origin effects.
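The variance argument above can be illustrated with a small simulation. The sketch below is not the authors' test statistic, which the abstract does not specify. It assumes a pure parent-of-origin model in which the paternally inherited allele shifts the phenotype up and the maternally inherited allele shifts it down by the same amount, then compares phenotypic variance between heterozygotes and homozygotes using unphased genotypes only. The function names, allele frequency and effect size are hypothetical, and the effect is deliberately exaggerated.

```python
import random
import statistics


def simulate_cohort(n, allele_freq, poe_shift, seed=0):
    """Simulate (unphased genotype, phenotype) pairs for n unrelated people.

    Assumed model: phenotype = N(0, 1) noise + 0.5 * poe_shift * (pat - mat),
    where pat and mat are 0/1 indicators of the tested allele on the
    paternal and maternal chromosome. Homozygotes get no net shift;
    heterozygotes get +poe_shift/2 or -poe_shift/2 depending on origin.
    """
    rng = random.Random(seed)
    cohort = []
    for _ in range(n):
        mat = 1 if rng.random() < allele_freq else 0
        pat = 1 if rng.random() < allele_freq else 0
        phenotype = rng.gauss(0.0, 1.0) + 0.5 * poe_shift * (pat - mat)
        cohort.append((mat + pat, phenotype))  # parental origin is discarded
    return cohort


def het_hom_variance_ratio(cohort):
    """Phenotypic variance in heterozygotes divided by that in homozygotes."""
    het = [y for g, y in cohort if g == 1]
    hom = [y for g, y in cohort if g != 1]
    return statistics.variance(het) / statistics.variance(hom)


if __name__ == "__main__":
    with_poe = simulate_cohort(20000, 0.3, poe_shift=1.0, seed=1)
    without_poe = simulate_cohort(20000, 0.3, poe_shift=0.0, seed=1)
    print("ratio with parent-of-origin effect:", het_hom_variance_ratio(with_poe))
    print("ratio without:", het_hom_variance_ratio(without_poe))
```

Under this assumed model the heterozygote variance is 1 + poe_shift^2/4 and the homozygote variance is 1, so the ratio exceeds 1 only when a parent-of-origin effect is present. A real analysis would also need a formal variance-heterogeneity test and adjustment for covariates.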
Transcriptome Sequencing Uncovers Functional Variation in the Human Genome
Tuuli E. Lappalainen
Department of Genetics, Stanford University, Stanford, USA
Genome sequencing projects are discovering millions of genetic variants in humans, and interpreting their functional effects is essential for understanding the genetic basis of variation in human traits. One approach to address this challenge is analysis of genetic effects on cellular phenotypes, such as the transcriptome. In the Geuvadis project we sequenced mRNA and miRNA from lymphoblastoid cell lines of 462 individuals from the 1000 Genomes Project – the first uniformly processed RNA-seq data from multiple human populations with high-quality genome sequences. We discovered extremely widespread genetic variation affecting transcript structure and/or gene expression levels of the majority of genes. Our characterization of causal regulatory variation at the level of regulatory, loss-of-function, and disease-associated variants highlights the power of cellular phenotypes in mapping causal variants. Furthermore, in order to understand tissue-specificity of regulatory genetic variation, we have analyzed allele-specific expression (ASE) in 1582 samples of the GTEx pilot data from 171 individuals and 45 tissues with RNA-seq, genotype and partial exome-seq data. We show that while ~50% of cis-regulatory effects are shared between tissues of the same individual, only a fraction of regulatory effects in one tissue can be captured by studying another tissue. Finally, interpreting multi-tissue functional effects of regulatory and loss-of-function variants of individual genomes sheds light on personalized transcriptomics, which will be essential to understand functional variation at the individual level. Altogether, integration of genome and transcriptome information in these studies not only uncovers huge catalogs of regulatory variants, but also provides insight into causal functional variants and architecture of regulatory variation in human tissues.
This information improves our interpretation of genetic variants and biological mechanisms underlying phenotypic variation in human populations and individuals.
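For readers unfamiliar with allele-specific expression, a common first-pass check at a single heterozygous site is an exact binomial test of the reference and alternative read counts against an expected 50/50 ratio. The sketch below shows that generic test only. The abstract does not describe the statistical procedure used in the Geuvadis or GTEx analyses, and the function name and read counts are hypothetical.

```python
from math import comb


def ase_binomial_pvalue(ref_reads, alt_reads, expected_ref_fraction=0.5):
    """Two-sided exact binomial p-value for allelic imbalance at one site.

    Sums the probabilities of all outcomes that are no more likely than the
    observed reference-read count under the null expected_ref_fraction.
    Intended for modest read depths; very deep sites would need a
    log-space or library implementation.
    """
    n = ref_reads + alt_reads
    if n == 0:
        raise ValueError("site has no reads")
    p = expected_ref_fraction
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    observed = pmf[ref_reads]
    # Small tolerance so that ties with the observed outcome are included.
    p_value = sum(x for x in pmf if x <= observed * (1 + 1e-9))
    return min(1.0, p_value)


if __name__ == "__main__":
    # Hypothetical site: 10 reads carry the reference allele, none the alternative.
    print(ase_binomial_pvalue(10, 0))
```

In practice the null fraction is often shifted away from 0.5 to account for reference mapping bias, which is why it is exposed here as a parameter.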
Human Genome Diversity and the Personal Drug Response Profile
Urs A. Meyer
Biozentrum der Universität Basel, Switzerland
Genomic medicine can be defined as the use of information from personal genomes and their derivatives (epigenomic modifications, RNAs, proteins, metabolites) to improve health or cure disease. Major technological breakthroughs allow the low cost, high throughput sequencing of literally millions of human genomes. In combination with advances in epigenomics, transcriptomics, proteomics and metabolomics this has caused an explosion of data on human genome diversity and has shed a fresh light on how interactions between the entire genome and nongenomic factors (e.g. environment-, life-style-, clinical factors) determine health and disease and drug response. In regard to pharmacogenomics, personal genome sequences reveal the totality of known genomic variants or genomic biomarkers associated with altered drug response, the “personal drug response profile”. Genomic or “omics” medicine is an important component of personalized medicine or of similar strategies to improve health outcomes (stratified medicine, precision medicine).
The opportunities of better prediction of individual disease risk and disease prevention, of more precise definition of disease sub-phenotypes, of more efficient drug development and of individualized treatment decisions are confronted with a number of impediments or translational challenges, including data management and interpretation of clinical relevance. The ultimate challenge is to translate this new knowledge into tangible benefits for the patient.
Epigenome Mapping at Allelic Resolution to Interrogate Genomic Basis of Human Disease
Tomi Pastinen
Departments of Human and Medical Genetics, McGill University and Genome Quebec Innovation Centre, Montreal, Canada
The combination of sequence and transcriptomic data has revealed principles of population specificity and inter-individual variation in gene expression. Assignment of function to non-coding DNA can now be achieved by systematic epigenome mapping applying next-generation sequencing (NGS) based approaches. Disease-associated sequence variation, in the case of common complex traits, is enriched in functional non-coding elements, but mechanistic links are still rare.
The Epigenome Mapping Centre at McGill (EMC-McGill) is pursuing an understanding of tissue- and disease-specific epigenomic signatures in human primary cells and tissues. We adhere to the high-resolution assay standards for NGS-based epigenomics established by the International Human Epigenome Consortium (IHEC), aiming to generate parallel chromatin, methylome and transcriptome maps in primary human cells and tissues. These data can now be utilized to interpret the function of specific genetic variants associated with disease as well as to investigate the biology of recently identified genomic features, such as enhancer-associated RNAs (eRNAs). Preliminary results from human primary immune cells purified from population sources and from autoimmune disease patients indicate that eRNAs often overlap autoimmune disease-associated SNPs and follow cell-type restricted expression correlating with tissue-specific methylation. Correlated allelic activity can be observed at enhancer and putative target RNAs. We work towards systematically mapping epigenomic variation and its effects on putative target genes to interpret disease SNPs and to enable targeting of future sequence-based analyses to gene-regulatory element units in disease genetics.
New Methods for the Analysis of Human Population Genomic Data
Adam Siepel
Biological Statistics and Computational Biology, Cornell University, Ithaca, NY, USA
This will be a two-part talk, summarizing my group's recent work on methods development and large-scale data analysis in the area of human population genomics. The first part of the talk will be concerned with the question of the genome-wide impact of mutations that influence gene regulation. I will describe a recent analysis of complete genome sequences and genome-wide chromatin immunoprecipitation and sequencing data that demonstrates that natural selection has exerted a profound influence on human transcription factor binding sites since our divergence from chimpanzees 4-6 million years ago. Our analysis is based on a new method, called INSIGHT, for characterizing natural selection from collections of short interspersed noncoding elements. We find that binding sites have experienced somewhat weaker selection than protein-coding genes, on average, but that the binding sites of several transcription factors show clear evidence of adaptation. We project that regulatory elements may make larger cumulative contributions than protein-coding genes to both adaptive substitutions and deleterious polymorphisms, which has important implications for human evolution and disease.
In the second part of the talk, I will discuss recent progress on the long-standing problem of inferring an "ancestral recombination graph" (ARG) from sequence data. The ARG provides a complete characterization of the correlation structure of a collection of sequences sampled from a population, and, in principle, fast, high-quality ARG inference could enable many improvements in population genomic analysis. However, the available methods for ARG inference are either extremely computationally intensive, depend on fairly crude approximations, or are limited to very small numbers of samples, and, as a consequence, they are rarely used in applied population genomics. I will present a new method for ARG inference, called ARGweaver, that is efficient enough to be applied on the scale of dozens of complete mammalian genomes. Experiments with simulated data indicate that ARGweaver converges rapidly to the true posterior distribution and is effective in recovering various features of the ARG, for twenty or more megabase-long sequences generated under realistic parameters for human populations. We have begun to apply our methods to high-coverage individual human genome sequences from Complete Genomics, and I will show that signatures of selective sweeps, background selection, recombination hot spots, and other features are all evident from properties of the inferred ARGs.
Genomics of Regulatory Variation
Kerrin Small
King's College, London, UK
Regulation of gene expression is a highly heritable, critical component of a variety of biological processes. Studies have shown that the majority of genetic variants associated with common diseases are regulatory, highlighting the importance of understanding the genetics of gene regulation in order to interpret the wealth of data generated in genome-wide association and whole genome sequencing studies. We use transcriptomic data generated from four tissues from a set of 850 deeply phenotyped twins from the TwinsUK cohort both to explore the regulatory architecture of gene expression and to highlight the biological mechanisms underlying GWAS signals. Further, despite sharing identical genomes, monozygotic twins do not usually develop the same disease. Using the twin structure in this dataset we explore twin discordance by investigating factors influencing differential expression between twins, including aging and the effects of X-inactivation on transcription from chromosome X.
Whole-Genome Sequence Based Association Studies of Complex Traits: the UK10K Project
Nicole Soranzo
Wellcome Trust Sanger Institute, Hinxton, UK
The UK10K project is a collaboration between multiple research centres, mainly in the UK, aiming to uncover rare genetic variants contributing to disease and health status by sequencing 10,000 people. As part of the UK10K Consortium Cohorts Group, 3,621 individuals from two deeply phenotyped cohorts – TwinsUK and the Avon Longitudinal Study of Parents and Children (ALSPAC) – have been sequenced to average 6.5x coverage using next-generation sequencing technology. The data generated have been used to explore phenotypic associations in over 66 cardiometabolic and health-related traits using both single-point common variant and burden-based rare variant tests. We identified more than 30 novel single-point associations, including low-frequency associations underlying known loci, associations for novel loci (of which ~40% involve low-frequency variants) and associations in poorly imputed regions. These associations are being followed up by imputing variants discovered through WGS of the TwinsUK and ALSPAC cohorts, along with those known from 1000 Genomes, into the full genome-wide association study; direct genotyping of these variants in the full cohort and replication in external cohorts are underway. Further analyses are allowing us to address a range of questions related to the performance and utility of whole-genome sequencing. Examples of these questions are: (i) To what extent do rare variants contribute to phenotypic variance in complex traits; (ii) To what extent does the presence of population structure within phenotype and genotype data complicate the interpretation of association signals at differing allele frequencies; (iii) To what extent is there an abundance of functionally annotated genomic regions within loci harboring association signals at differing allele frequencies. Our results contribute to on-going debates about the genetic architecture of complex traits and point to avenues for the utility of whole genome sequencing studies.
Interpreting Patterns of DNA Methylation on a Genome-wide Scale
Michael Stadler
Friedrich Miescher Institute for Biomedical Research, Basel, Switzerland
Methylation of cytosines is an essential epigenetic modification in mammalian genomes and represents an important regulator of gene transcription. Decades of study have advanced our understanding of DNA methylation at CpG-rich gene promoters, but methylation patterns in the majority of the genome, which has only a low density of CpG dinucleotides, have remained poorly understood. We have used genome-wide datasets to identify and study patterns of DNA methylation, such as Low Methylated Regions (LMRs) that are formed by the binding of transcription factors and allow identification of enhancer regions, or Partially Methylated Domains (PMDs) that display a highly reproducible pattern of hypomethylation. We have developed a set of tools covering various aspects of genome-wide methylation data analysis, which is available as the QuasR and MethylSeekR R/Bioconductor packages at www.bioconductor.org.
Human Germ Line and Somatic Mutation Rates: Evolution, Biology and Statistical Genetics
Shamil R. Sunyaev
Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, USA
Sequencing technology has enabled systematic identification of de novo germ line mutations and somatic mutations in cancer. Mutation rate appears to be variable along the human genome. Two evolutionary models may potentially explain the origin of mutation rate heterogeneity. The heterogeneity of mutation rate along the human genome has important consequences for evolutionary genomics and for statistical genetics approaches based on recurrent mutations. Analysis of de novo mutations also helps find genes underlying Mendelian diseases. Replication timing, chromatin accessibility and negative selection maintaining hypermutable sequence contexts all contribute to the mutation rate heterogeneity. Mutation density in cancer genomes is highly variable at the 1Mb scale. Local mutation density can be predicted with substantial accuracy from epigenetic marks in the tissue of origin. At a smaller scale, mutation density depends on chromatin accessibility. The data implicate the Global Genome Repair (GGR) system as responsible for this dependence in melanoma. Germ-line mutation rate is less heterogeneous along the genome. However, context-specificity and epigenetic variables influence local germ-line mutation rate.
Rare Variants: Abundant and Deleterious, yet only Marginally Important for Disease
Daniel Wegmann
Department of Biology, Biochemistry, University of Fribourg, Switzerland
Despite many large-scale genome-wide association studies, the heritability of many complex diseases or traits remains largely unexplained. Several hypotheses have been put forward to explain this observation, including the idea that many complex diseases are highly polygenic with many mutations of very small effect contributing to the phenotype, or the hypothesis that many mutations may have similar effects, but are too rare to be picked up individually by current studies. We recently explored the rare variant diversity of 202 drug target genes in more than 14,000 individuals and found rare variants to be abundant (one every 17 bases) and predominantly deleterious. But despite their abundance, we estimated that rare variants, in contrast to those tagged by common variants, contribute only marginally to the disease risk for several complex diseases. In addition, we found rare variants to be geographically very localized and their abundance unevenly distributed, which greatly affects rare variant association tests when population stratification is not properly taken into account.