Overview
Intermediate level · Genomic data analysis · Single-cell · R
In biological and clinical research, identifying molecular signatures and extracting the corresponding features are two critical steps toward understanding diverse biological processes. Here, a signature is defined as a group of molecular features (e.g., genes or genomic regions) that is sufficient to identify a given genotype or phenotype. For instance, expression signatures link a phenotype to a specific pattern of gene expression [1,2], whereas enhancer signatures define subtypes based on the regulatory landscape [3].
Non-negative Matrix Factorization (NMF) has been widely used in the analysis of genomic data for feature extraction and signature identification [4,5]. However, running even a basic NMF analysis requires installing multiple tools and dependencies, involves a steep learning curve, and can take considerable computing time. To lower these barriers, we developed ButchR and ShinyButchR [6], a novel NMF toolbox that provides a complete NMF-based analysis workflow: matrix decomposition using NMF, feature extraction, interactive visualization, identification of relevant signatures, and their association with biological and clinical variables.
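To make the decomposition step concrete, here is a minimal sketch of what an NMF does. It is not ButchR's code or API: it is a plain NumPy implementation of the classic Lee and Seung multiplicative updates, applied to a small hypothetical toy matrix V (features × samples) with made-up gene names. NMF approximates V ≈ W·H with all entries non-negative; each column of W is a signature (the weight of every feature in it), and each column of H gives a sample's exposure to each signature. The helper names `nmf` and `top_features` are hypothetical, and `top_features` illustrates feature extraction simply by ranking features by their weight in each signature.

```python
import numpy as np


def nmf(V, k, n_iter=1000, eps=1e-10, seed=0):
    """Factorize a non-negative matrix V (features x samples) as V ~ W @ H.

    Classic Lee & Seung multiplicative updates for the Frobenius-norm loss
    (a generic sketch, not ButchR's implementation).
    W (features x k): weight of each feature in each signature.
    H (k x samples): exposure of each sample to each signature.
    """
    rng = np.random.default_rng(seed)
    n_features, n_samples = V.shape
    # Random non-negative initialisation; multiplicative updates keep entries >= 0
    W = rng.random((n_features, k))
    H = rng.random((k, n_samples))
    for _ in range(n_iter):
        # eps guards against division by zero in the update ratios
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def top_features(W, feature_names, n_top=2):
    """For each signature (column of W), return the n_top highest-weighted features."""
    return [
        [feature_names[i] for i in np.argsort(W[:, j])[::-1][:n_top]]
        for j in range(W.shape[1])
    ]


# Hypothetical toy data: 4 genes x 6 samples built from two planted signatures
genes = ["geneA", "geneB", "geneC", "geneD"]
W_true = np.array([[5, 0], [4, 0], [0, 3], [0, 6]], dtype=float)
H_true = np.array([[1, 2, 1, 0, 0, 0], [0, 0, 0, 1, 2, 1]], dtype=float)
V = W_true @ H_true

W, H = nmf(V, k=2, n_iter=2000)
rel_error = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative reconstruction error: {rel_error:.4f}")
print(top_features(W, genes))
```

On this toy matrix, the two recovered signatures should correspond to the planted gene groups (geneA/geneB and geneC/geneD). With real data the rank k is not known in advance; choosing it is the subject of the rank-selection session in the schedule.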
Learning objectives
This tutorial teaches participants how to use ButchR to identify signatures in different types of genomic data. To explore the results of an NMF analysis, we will provide a ready-to-use Docker image with RStudio, ButchR, pre-loaded, publicly available datasets (including bulk and single-cell RNA-seq data), and an interactive application. The tutorial will walk through an NMF-based analysis from start to finish.
Schedule
Time | Activity
Session 1 - Introduction |
09:00 – 09:30 | Ice breaker: Course expectations
09:30 – 10:15 | Introduction to Non-negative Matrix Factorization (NMF) and its usage in genomics
10:15 – 10:45 | Coffee break and discussion
Session 2 - Matrix decomposition |
10:45 – 11:15 | How to use ButchR with Docker
11:15 – 11:45 | Pre-processing data for use with NMF
11:45 – 12:15 | Matrix decomposition with ButchR
12:15 – 13:30 | Lunch break
Session 3 - Results interpretation |
13:30 – 14:00 | Selection of the optimal factorization rank
14:00 – 14:30 | Signature identification
14:30 – 15:00 | Feature extraction and enrichment analysis
15:00 – 15:30 | Interactive analysis with ShinyButchR
Session 4 - Discussion |
15:30 – 16:00 | Discussion and concluding remarks
17:00 | [BC]2 Welcome lecture
Audience and requirements
Maximum number of participants: 12
This tutorial is intended for computational biologists who work with large-scale omics datasets (e.g., RNA-seq, ATAC-seq, …) and are looking for ways to reduce the dimensionality of their data to a small set of informative signatures.
Attendees are expected to bring their own laptop with Docker pre-installed. To avoid delays when setting up the container during the practical sessions, the workshop's Docker image should be downloaded beforehand. To do so, open a command-line terminal (e.g., PowerShell or Terminal) and run the command “docker pull hdsu/butchr”. A complete guide to installing Docker is available at https://docs.docker.com/desktop/, and a detailed explanation of how to use the ButchR Docker image is available at https://hub.docker.com/r/hdsu/butchr. Basic R coding skills will be helpful, although the tutorial covers every step, from loading data to exporting results.
Upon arrival, attendees will receive an R Markdown file with a step-by-step guide to using ButchR and ShinyButchR, including an example dataset and guidance on interpreting the NMF results.
Organisers
- Carl Herrmann (Group Leader, University Clinics Heidelberg, Germany)
- Andres Quintero (PhD candidate, University Clinics Heidelberg, Germany)
References
1. Szymczak, F., Colli, M. L., Mamula, M. J., Evans-Molina, C. & Eizirik, D. L. Gene expression signatures of target tissues in type 1 diabetes, lupus erythematosus, multiple sclerosis, and rheumatoid arthritis. Sci. Adv. 7 (2021).
2. Sotiriou, C. & Pusztai, L. Gene-Expression Signatures in Breast Cancer. N. Engl. J. Med. 360 (2009).
3. Gartlgruber, M. et al. Super enhancers define regulatory subtypes and cell identity in neuroblastoma. Nat. Cancer 2 (2021).
4. Alexandrov, L. B. et al. The repertoire of mutational signatures in human cancer. Nature (2020). doi:10.1038/s41586-020-1943-3
5. Pal, S. et al. Isoform-level gene signature improves prognostic stratification and accurately classifies glioblastoma subtypes. Nucleic Acids Res. 42, e64 (2014).
6. Quintero, A. et al. ShinyButchR: Interactive NMF-based decomposition workflow of genome-scale datasets. Biol. Methods Protoc. 5 (2020).
7. Liu, L. et al. Deconvolution of single-cell multi-omics layers reveals regulatory heterogeneity. Nat. Commun. 10 (2019).