Hi-C explores genome-wide chromatin architecture and identifies long-range enhancers
DNA is traditionally analysed by bioinformaticians as a linear sequence, but it has recently been recognised that the higher-order organisation of chromatin plays a critical biological role in regulating gene expression and cell fate.
Hi-C is a method to detect DNA-DNA interactions in the mammalian nucleus at the genome-wide level. Chromatin is cross-linked, digested and re-ligated in such a way that only DNA fragments that are linked together form ligation products. The chimeric DNA ligation junctions are selectively purified and then subjected to Illumina short-read sequencing. Mapping the sequence reads to distinct genomic loci then reveals 3D chromatin organisation and potential enhancer-promoter interactions.
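As a rough illustration of the mapping step, the sketch below counts mapped read pairs by the pair of genomic bins their two ends fall into, which is one simple way to summarise Hi-C data as an interaction matrix. It is a minimal sketch in Python, not the diffHic implementation: the function name `bin_contacts`, the 100 kb bin size and the toy read pairs are all assumptions made for illustration.

```python
from collections import Counter


def bin_contacts(read_pairs, bin_size):
    """Count read pairs per pair of genomic bins.

    read_pairs: iterable of ((chrom1, pos1), (chrom2, pos2)), the mapped
    positions of the two ends of each chimeric read pair.
    Returns a Counter keyed by an ordered pair of (chrom, bin_index) tuples.
    """
    counts = Counter()
    for (chrom1, pos1), (chrom2, pos2) in read_pairs:
        a = (chrom1, pos1 // bin_size)
        b = (chrom2, pos2 // bin_size)
        # An interaction is unordered, so store each bin pair in sorted order.
        key = (a, b) if a <= b else (b, a)
        counts[key] += 1
    return counts


# Toy, made-up read pairs for illustration only (not real data).
pairs = [
    (("chr1", 1_200), ("chr1", 950_300)),
    (("chr1", 940_100), ("chr1", 4_800)),
    (("chr1", 15_000), ("chr2", 20_000)),
]
contacts = bin_contacts(pairs, bin_size=100_000)
```

Bin pairs with unusually high counts relative to their genomic distance are candidates for long-range interactions; a real analysis would also need read filtering and normalisation, which this sketch omits.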
In this talk, I will introduce the Hi-C technology and outline a bioinformatics analysis approach using R and Bioconductor software. I will illustrate Hi-C in the context of a recent study in which we showed that the transcription factor Pax5 controls changes in the global chromatin structure of B cells during activation. This study shows how a lineage-defining transcription factor can maintain cell identity via global control of genome organisation in order to activate lineage-specific genes and repress inappropriate genes.
Professor Gordon Smyth
Bioinformatics Division, Walter & Eliza Hall Institute of Medical Research
Gordon Smyth completed a BSc (Hons) in mathematics at the University of Western Australia in 1977 and a PhD in statistics at the Australian National University in 1985. He spent 16 years as a member of mathematics and statistics departments at the University of California, The University of Queensland, the United States Naval Postgraduate School and the University of Southern Denmark. In 2001 he moved to Melbourne to join the growing Bioinformatics research group at the Walter and Eliza Hall Institute, and has been there ever since, becoming a Lab Head in 2007 and Division Head in 2014.
Gordon loves to adapt statistical models and algorithms to analyse high-dimensional biomedical data in collaboration with biomedical scientists. His current research focuses on cancer and immunological diseases. He finds genomic research a rich environment that stimulates statistical research. He is well known for promoting empirical Bayes methods and linear models to analyse data from gene expression experiments.
His research group maintains some popular R software packages for genomic data analysis, especially limma and edgeR for RNA-seq and microarray data, Rsubread for aligning short sequence reads, csaw for ChIP-seq and diffHic for analysis of Hi-C data.