Talk Title: Statistical Analysis of RNA-seq Data: From Reads to Genes to Pathways
RNA-seq is the current technology of choice for gene expression experiments. One of the most common aims of RNA-seq profiling is to identify genes or molecular pathways that are differentially expressed between two or more biological conditions. A more detailed aim may be to determine whether particular gene isoforms are more abundant in one condition than in another.
Like all sequencing technologies, RNA-seq produces large data files that may be complex to analyse. Large data sets can easily consume months of computing time on a high performance computer. They can also be statistically challenging, because inferences must be made about a very large number of genomic features, often from a very small number of independent replicates.
This talk will discuss RNA-seq experiments from a statistical point of view, and will describe some statistical and computational strategies that allow RNA-seq experiments to be analysed quickly and reliably. As time permits, the talk will cover read alignment, read counting, differential expression, pathway analysis and differential splicing analysis.
Professor Gordon Smyth, Bioinformatics Division, Walter & Eliza Hall Institute of Medical Research
Gordon Smyth completed a BSc (Hons) in mathematics at the University of Western Australia in 1977 and a PhD in statistics at the Australian National University in 1985. He spent 16 years as a member of mathematics and statistics departments at the University of California, The University of Queensland, the United States Naval Postgraduate School and the University of Southern Denmark. In 2001 he moved to Melbourne to join the growing Bioinformatics research group at the Walter and Eliza Hall Institute, and has been there ever since, becoming a Lab Head in 2007 and Division Head in 2014.
Gordon loves to adapt statistical models and algorithms to analyse high dimensional biomedical data in collaboration with biomedical scientists. His current research focuses on cancer and immunological diseases. He finds genomic research a rich environment that stimulates statistical research. He is well known for promoting empirical Bayes methods and linear models to analyse data from gene expression experiments.
His research group maintains some popular R software packages for genomic data analysis, especially limma and edgeR for RNA-seq and microarray data, Rsubread for aligning short sequence reads, csaw for ChIP-seq and diffHic for analysis of Hi-C data.