“Presenting the poster allowed me to showcase my work and helped immensely with networking. Having this opportunity to share my research was incredibly powerful in terms of furthering my skills for a career in sciences.”

Tahlia Perry, The University of Adelaide

ePosters

Click on each image below to view the AMSI BioInfoSummer 2020 abstracts and full screen ePosters. Learn more at the Fast Forward ePoster talks where each presenter will have only 90 seconds to give an overview of their poster and research.

Impact of gene annotation choice on the quantification of RNA-seq data

Dr David Chisanga

Getting a foothold in Pan-Genomics

Miss Chelsea Matthews

Development of Bioinformatics tools for 16S gut microbiota community analysis and data visualisation in Acute Lymphoblastic Leukaemia

Ms Holly Martin

Molecular diagnostics of antimicrobial resistance in Neisseria gonorrhoeae: a state of play.

Miss Olivia Jessop

Integrative resource for network-based investigation of COVID-19 combinatorial drug repositioning and mechanism of action

Dr Shadma Fatima

PiMaker: Vectorized diversity statistics to measure evolutionary pressure in genetic populations at scale

Mr Joseph Lalli

Bioinformatics re-analysis of genome sequencing data increases the genetic testing diagnostic yield of inherited heart disease

Ms Yuchen Chang

Accurate detection of focal gene deletions in B-cell acute lymphoblastic leukaemia using RNA-seq data

Ms Jacqueline Rehn

Pan-cancer Analysis reveals functional similarity of three lncRNAs across multiple tumors

Mrs Abir Khazaal

RNA-seq regulatory network inference revealed an association between transcription factor SETX and neurodegenerative pathways under prolonged autophagy induction

Miss Wenjun Liu

Bayesian Copula Directional Dependence for causal inference on gene expression data

Miss Vasiliki Vamvaka

Population genomics of Cryptosporidium hominis across five continents identifies two subspecies that have diverged and recombined during 500 years of evolution

Mr Swapnil Tichkule

The Shape of Empirical Phylogenies Under Phase-Type Distributed Times to Speciation Model

Mr Albert Soewongsono

Bioinformatics re-analysis of genome sequencing data increases the genetic testing diagnostic yield of inherited heart disease

“Introduction: The Australian Genomics Cardiovascular Genetic Disorders Flagship aims to translate genomic research into clinical practice for people with inherited heart disease. After accredited clinical genome sequencing, the genetic cause of disease remains unsolved for a number of participants. We have performed research-based secondary bioinformatics analysis of genome sequencing data to identify additional causes of disease.

Methodology: We looked for single nucleotide variants, copy number variants and deep intronic splice-gain variants in 573 genes implicated in cardiac diseases. We also investigated variants in the mitochondrial genome. Identified variants were classified for pathogenicity at the Victorian Clinical Genetics Service.

Results: To date, we have analysed 235 participants with inherited heart disease. We have solved 5/123 participants who did not have a genetic cause of disease identified following clinical genetic testing. These include pathogenic nonsense variants in PRDM16, TTN and TBX20, a deep intronic splice-gain variant in MYBPC3, and a mitochondrial genome variant. Ten additional participants have candidate pathogenic variants awaiting clinical classification or functional assessment. A further two patients had incidental genetic findings related to clinically relevant cardiac phenotypes.

Conclusions: Secondary bioinformatics analysis of genome sequencing data identifies additional causes of inherited heart disease that were missed using current clinical genetic testing approaches.”

Author: Ms Yuchen Chang, Centenary Institute

Molecular diagnostics of antimicrobial resistance in Neisseria gonorrhoeae: a state of play.

Neisseria gonorrhoeae is the second most common sexually transmitted bacterium globally, known to cause morbidity and severe disease. Exacerbating its virulence is its propensity to develop resistance to antibiotics. To successfully treat an infection, the resistance profile of the infecting organism needs to be determined. Molecular diagnostic tools that can diagnose antibiotic resistance from non-cultured samples will increase the amount of data available for public health surveillance, particularly in an era of declining bacterial culturing. However, there is room to improve the predictive capabilities of current genomic tools. In this work, a literature review was conducted to identify the gaps in knowledge of molecular diagnostics. Papers were chosen based on their relevance to the evolution of molecular diagnostic tests, and to innovative methods of predicting a bacterial phenotype (i.e. its resistance profile) from its genotype. The main issues left to address are the continual identification and incorporation of new resistance mutations into a diagnostic tool, and distinguishing strains of interest from commensal strains. There have been innovative methods to predict phenotype from genotype, such as using phylogenetic associations. In summary, while molecular diagnostics hold promise, this review highlighted the need for research into some key areas.

Author: Miss Olivia Jessop

Getting a foothold in Pan-Genomics

Pan-genomic models are composed of genomic data from multiple members of a single population. Pan-genomes can model either sequence variation or genic variation within a population and there are a number of different types. These models have applications in improving read alignment accuracy, variant calling, haplotype inference, and in linking phenotype with genotype. Unfortunately, the learning curve for beginners in pan-genomics is steep. When you first start reading the literature surrounding pan-genomics, it’s hard to understand what’s going on because the word “pan-genome” is defined differently by different authors and there is no linguistic differentiation between different types of pan-genomic models. This is confusing and makes it very difficult to identify literature that is relevant to your own work. Here we summarise the different types of pan-genomic models and furnish them with descriptive names. Hopefully these names can be used in discussions around pan-genomics in the future and will help to clear up any potential confusion for people new to the field.

Author: Miss Chelsea Matthews

Accurate detection of focal gene deletions in B-cell acute lymphoblastic leukaemia using RNA-seq data

Focal deletions affecting genes involved in B-cell differentiation, cell cycle regulation and cell survival are common in B-cell acute lymphoblastic leukaemia (B-ALL) and are associated with clinical prognosis. Although detection of whole gene deletions is only possible with DNA-based assays, focal deletions that remove only some exons of a gene result in the expression of aberrantly spliced transcripts. Detection of these splice variants within RNA-seq data can assist in patient prognostication in cases where DNA is not available for copy number variant analysis. Intragenic deletions of IKZF1, ERG, PAX5, ETV6 and RB1 were detected from 268 RNA-seq samples using bedtools and compared with deletions identified by multiplex ligation-dependent probe amplification (MLPA), the current gold standard for focal gene deletion analysis. Partial gene deletions were identified with a sensitivity of 0.80 (133 of 167 deletions detected) and a specificity of 0.99 (1 false positive in 1134 observations). Undetected deletions were associated with deletion of the first transcribed exon or a gene fusion event. The developed method can assist with the concurrent detection of prognostically significant genomic alterations in transcriptionally sequenced B-ALL. Work is ongoing to expand this method to encompass additional gene deletions important in other leukaemic cell types.

Author: Ms Jacqueline Rehn, The University of Adelaide
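The accuracy figures quoted in the abstract above can be rechecked with a few lines of arithmetic. The sketch below is only an illustration: the function names are hypothetical, and it assumes that the 1134 observations are all MLPA-negative gene/sample combinations, which the abstract does not state explicitly.

```python
def sensitivity(true_positives: int, total_positives: int) -> float:
    """Fraction of MLPA-confirmed deletions that were also detected from RNA-seq."""
    return true_positives / total_positives


def specificity(false_positives: int, total_negatives: int) -> float:
    """Fraction of deletion-free observations that were correctly called negative."""
    return (total_negatives - false_positives) / total_negatives


# Counts taken from the abstract. Treating 1134 as the number of
# negative observations is an assumption made for this sketch.
sens = sensitivity(133, 167)
spec = specificity(1, 1134)
print(f"sensitivity = {sens:.3f}, specificity = {spec:.3f}")
```

Under that assumption the sensitivity rounds to the reported 0.80 and the specificity sits just above 0.99.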

Development of Bioinformatics tools for 16S gut microbiota community analysis and data visualisation in Acute Lymphoblastic Leukaemia

The gut microbiota is an emerging target for clinical intervention to improve patient outcomes, treatment responses and overall health. Data visualisation is an important consideration in communicating results in an accurate and appealing way to researchers and clinicians. In this project we built a bioinformatics workflow and visualisation tool to aid analysis of gut microbiota 16S rRNA gene sequencing data from Acute Lymphoblastic Leukaemia (ALL) patients. We developed ‘shinyMicrobiota’, an R/Shiny application for interactive visualisation and exploration of these data. shinyMicrobiota uses outputs from the workflow to produce plots displaying taxonomic composition, alpha diversity, beta diversity and Bacteroidetes to Firmicutes ratios. These plots can be customised based on variables in the uploaded metadata and include a range of features. A standardised bioinformatics and visualisation workflow was used to analyse three independent datasets: a preclinical mouse model of ALL, a clinical dataset from ALL patients and a pre-diabetes public dataset from the Human Microbiome Project. These analyses demonstrate the functionality of the workflow and Shiny app across various experimental designs (animal, clinical and public datasets) and scales (34-398 samples), and show that the workflow and shinyMicrobiota can scale to larger datasets across both preclinical and clinical data.

Author: Ms Holly Martin, The University of Adelaide
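Two of the quantities plotted by the app described above, alpha diversity and the Bacteroidetes to Firmicutes ratio, are simple to compute from a table of taxon counts. The following is a minimal Python sketch, assuming the Shannon index as the alpha diversity measure (the abstract does not say which index is used). It is not the shinyMicrobiota code, which is written in R, and the counts are made up for illustration.

```python
import math


def shannon_index(counts: list[int]) -> float:
    """Shannon alpha diversity H = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bf_ratio(counts_by_phylum: dict[str, int]) -> float:
    """Ratio of Bacteroidetes to Firmicutes read counts in one sample."""
    return counts_by_phylum["Bacteroidetes"] / counts_by_phylum["Firmicutes"]


# Hypothetical phylum-level read counts for a single sample (not real data).
sample = {"Bacteroidetes": 300, "Firmicutes": 600, "Proteobacteria": 100}
print("Shannon index:", shannon_index(list(sample.values())))
print("B/F ratio:", bf_ratio(sample))
```

A sample dominated by a single taxon gives a Shannon index near zero, while an even community of k taxa gives ln(k).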

Impact of gene annotation choice on the quantification of RNA-seq data

RNA sequencing is currently the method of choice for genome-wide profiling of gene expression. A popular approach to quantify gene expression levels from RNA-seq data is to map reads to a reference genome and then count mapped reads to each gene. Gene annotation data, including chromosomal coordinates of exons for tens of thousands of genes, is required for this quantification process. For human and mouse genomes, several major sources of gene annotations can be used for quantification, such as Ensembl, GENCODE, UCSC, and RefSeq databases. However, there is very little understanding of the effect that the choice of annotation has on the quantification of gene expression in an RNA-seq pipeline. In this talk, I will present results from our comparison of Ensembl and RefSeq human annotations on their impact on gene expression quantification using benchmark RNA-seq data generated by the SEquencing Quality Control (SEQC/MAQC III) consortium. We found that the use of RefSeq gene annotation led to better quantification accuracy, based on the correlation with ground truth such as expression data from >800 real-time PCR validated genes.

Author: Dr David Chisanga, La Trobe University

The Shape of Empirical Phylogenies Under Phase-Type Distributed Times to Speciation Model

Phylogenetic trees are widely used to understand the evolutionary history of organisms. Their branch lengths provide information about past diversification dynamics. However, existing macroevolutionary models are unreliable for inferring the true processes underlying empirical trees. Here, we propose a flexible and biologically plausible macroevolutionary model for phylogenetic trees where times to speciation are drawn from a Coxian phase-type (PH) distribution. In the process, we derive a likelihood expression for the probability of observing any tree with branch lengths under a model with speciation but no extinction. Finally, we illustrate the application of our model by performing both absolute and relative goodness-of-fit tests for two large empirical phylogenies (squamates and angiosperms) that compare models with Coxian PH distributed times to speciation with models that assume exponential or Weibull distributed waiting times. In our numerical analysis, we found that, in most cases, models assuming a Coxian PH distribution provided the best fit. In addition, this model allows us to fit the hazard rate for speciation, and we found evidence that speciation rates have changed through time in some clades of the squamate phylogeny.

Author: Mr Albert Soewongsono, University of Tasmania
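For readers unfamiliar with the Coxian phase-type distribution mentioned above, the sketch below draws waiting times from one by simulation. A lineage passes through a sequence of exponential phases and, on leaving each phase, either moves to the next phase or is absorbed (here, speciates). The parameter values and function name are hypothetical, and this is a generic sampler rather than the authors' likelihood method.

```python
import random


def sample_coxian(rates: list[float], cont_probs: list[float], rng: random.Random) -> float:
    """Draw one waiting time from a Coxian phase-type distribution.

    rates[i] is the exit rate of phase i; cont_probs[i] is the probability of
    moving on to phase i + 1 rather than being absorbed on leaving phase i.
    The final phase always ends in absorption, so cont_probs has one entry
    fewer than rates.
    """
    elapsed = 0.0
    for i, rate in enumerate(rates):
        elapsed += rng.expovariate(rate)  # time spent in phase i
        is_last = i == len(rates) - 1
        if is_last or rng.random() >= cont_probs[i]:
            break  # absorbed: the waiting time is complete
    return elapsed


rng = random.Random(0)
# Hypothetical two-phase parameters; theoretical mean = 1/2 + 0.5 * (1/1) = 1.0
draws = [sample_coxian([2.0, 1.0], [0.5], rng) for _ in range(20000)]
print("empirical mean:", sum(draws) / len(draws))
```

With a single phase the sampler reduces to an ordinary exponential waiting time, which is the special case the abstract compares against.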

Pan-cancer Analysis reveals functional similarity of three lncRNAs across multiple tumors

Long non-coding RNAs (lncRNAs) are emerging as key regulators, playing a role in chromatin modification and in transcriptional and post-transcriptional regulation in many biological processes. Dysregulation of lncRNA expression has been associated with many diseases, including cancer. Cancer is a complex disease, known to be caused by genomic alterations that mostly affect non-coding regions. Mounting evidence suggests that lncRNAs are involved in cancer initiation, progression and metastasis. Thus, understanding the functional implications of lncRNAs in tumorigenesis can aid in developing novel biomarkers and therapeutic targets. Rich cancer datasets documenting genomic and epigenomic alterations, together with advances in bioinformatics tools, have presented an opportunity to perform pan-cancer analyses across different cancer types. We aimed to conduct a pan-cancer analysis of lncRNAs by performing differential expression and functional analyses between tumor and adjacent normal samples across eight cancer types. In total, 9,616 deregulated lncRNAs were identified, of which seven were shared across all cancer types. We focused on three lncRNAs found to be consistently dysregulated among tumors. We observed that these three lncRNAs interact with a wide range of genes across different tissues, yet enrich substantially similar biological processes that are implicated in cancer progression and proliferation.

Author: Mrs Abir Khazaal, The University of New South Wales

RNA-seq regulatory network inference revealed an association between transcription factor SETX and neurodegenerative pathways under prolonged autophagy induction

Despite the global prevalence of dementia, disease pathogenesis remains unclear, and treatments continue to disappoint. Recent studies found possible associations between neurodegeneration and autophagy dysfunction. To explore this hypothesis, we induced autophagy in three human cell lines using two treatments targeting different stages of the autophagic pathway, collecting RNA-seq data at three time-points up to 30 hours. Analysis revealed changes in transcriptional activities and biological processes in response to autophagy activation. Although successful activation of autophagy was verified by measuring autophagic flux, the autophagy pathway (KEGG) appeared to be consistently inhibited, rather than activated. This finding suggests that when cells are under prolonged autophagy induction, classical autophagy-related genes may be regulated to maintain autophagic homeostasis. Conversely, the Alzheimer’s Disease (AD) pathway was consistently activated in response to the extended activation of autophagy. During this work, an existing topology-based pathway enrichment testing method, SPIA, was improved by incorporating a more robust significance testing strategy. In addition, a novel regulatory network inference method was developed to identify the regulatory influences of specific transcription factors on relevant biological pathways. This approach revealed a potential role for the transcription factor SETX in several neurodegenerative disease pathways, including AD, under prolonged autophagy activation.

Author: Miss Wenjun Liu, The University of Adelaide

PiMaker: Vectorized diversity statistics to measure evolutionary pressure in genetic populations at scale

“From heterogeneous tumors to microbiomes to viromes, researchers are increasingly appreciating the role of diversity and evolution in human health. Classical evolutionary statistics such as FST, πN/πS, and Tajima’s D are used to measure exactly how diversity changes between populations, and how individual genes are responding to evolutionary pressure. Supported by a rich body of literature, these statistics summarize terabases of data in statistically interpretable ways. Unfortunately, the tools used to measure these statistics are often slow and domain-specific.

Taking advantage of memoization, multithreading, and vectorization, PiMaker is a tool designed to measure evolutionary statistics at scale. Given one or more reference sequences, a VCF, and a GTF of genetic regions, it is able to calculate common diversity statistics both globally and for individual regions of interest. It is capable of working with overlapping regions and multiple contigs, and is able to calculate rolling windows. PiMaker calculated within-host influenza diversity statistics for 328 influenza genomes ~900 times faster than SNPGenie and Popoolation2, with identical results. PiMaker works with many genome types, successfully replicating population diversity analysis in SARS-CoV-2, TB, and Drosophila population studies. This tool allows researchers to describe the strength and direction of natural selection in a wide variety of contexts.”

Author: Mr Joseph Lalli, University of Wisconsin-Madison
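The simplest of the diversity statistics named above is nucleotide diversity (π), the average number of pairwise differences per site. The sketch below computes it from per-site allele counts in plain Python. It is an illustration of the statistic only, not PiMaker's implementation: it uses no vectorization or memoization, and the function names and input layout are assumptions.

```python
def site_pi(allele_counts: list[int]) -> float:
    """Unbiased nucleotide diversity at one site, from counts of each allele.

    Equivalent to the fraction of sequence pairs that differ at this site.
    """
    n = sum(allele_counts)
    if n < 2:
        return 0.0  # diversity is undefined with fewer than two sequences
    homozygosity = sum((c / n) ** 2 for c in allele_counts)
    return n / (n - 1) * (1.0 - homozygosity)


def region_pi(variable_sites: list[list[int]], region_length: int) -> float:
    """Mean per-base diversity over a region; invariant sites contribute zero."""
    return sum(site_pi(counts) for counts in variable_sites) / region_length


# Hypothetical example: four sequences, two variable sites in a 10-base region.
print(region_pi([[2, 2], [3, 1]], region_length=10))
```

For the first site, 4 of the 6 possible sequence pairs differ, which matches the 2/3 returned by `site_pi([2, 2])`.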

Integrative resource for network-based investigation of COVID-19 combinatorial drug repositioning and mechanism of action

Finding an effective monotherapy for the complex and multifactorial pathology of SARS-CoV-2 infection is a challenge for drug repositioning, one that combination therapy may help to address. We developed an online network pharmacology-based drug repositioning platform, COVID-CDR (http://vafaeelab.com/COVID19repositioning.html), that enables a visual and quantitative investigation of the interplay between the primary drug targets and the SARS-CoV-2-host interactome in the human protein-protein interaction network. COVID-CDR prioritizes drug combinations with potential to act synergistically through different, yet potentially complementary, pathways. It provides options for exploring multi-evidence drug-pair similarity scores, along with other relevant information on individual drugs or drug pairs. Overall, COVID-CDR is a first-of-its-kind online platform that puts a systematic approach for pre-clinical in silico investigation of combination therapies for treating COVID-19 at the fingertips of clinicians and researchers.

Author: Dr Shadma Fatima, The University of New South Wales

Bayesian Copula Directional Dependence for causal inference on gene expression data

Modelling and understanding gene networks is a major challenge in biology as they play an important role in the architecture and function of genetic systems. Several methods have been proposed for the reconstruction of gene regulatory systems from their gene expression, such as Boolean networks, Bayesian networks, linear models, and Copula Directional Dependence. Among these methods, Copula Directional Dependence (CDD) can measure the directed connectivity among genes without strict distributional or linearity assumptions. Copulas achieve this by isolating the dependence structure of a joint distribution. In this work, a novel Bayesian CDD method is introduced as an extension to the frequentist method. The method was tested on both scRNA-seq and bulk sequencing data. The results illustrate that the proposed Bayesian CDD method identified 60% of true interactions on bulk sequencing data. This demonstrates the potential of the method to model gene regulatory systems and to remain robust across different biological settings.

Author: Miss Vasiliki Vamvaka, The University of New South Wales
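The idea that copulas isolate the dependence structure from the marginal distributions can be seen in the usual first step of a copula analysis, the rank transform to pseudo-observations. The sketch below shows only that preprocessing step, with hypothetical values and a hypothetical function name. It does not implement CDD or the Bayesian extension described above.

```python
import math


def pseudo_observations(values: list[float]) -> list[float]:
    """Map a sample onto (0, 1) using ranks / (n + 1); assumes no tied values."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    u = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        u[idx] = rank / (n + 1)
    return u


# Hypothetical expression values for one gene across four samples.
expr = [5.0, 1.0, 3.0, 8.0]
raw_u = pseudo_observations(expr)
log_u = pseudo_observations([math.log(v) for v in expr])

# A monotone change of scale (here, a log transform) leaves the ranks,
# and hence the pseudo-observations, unchanged.
print(raw_u == log_u)
```

Because the pseudo-observations depend only on ranks, any dependence measure built on them is unaffected by the marginal distribution of each gene.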

Systematic evaluation for metrics of gene expression variability in single-cell RNA sequencing data

During ageing, transcriptional noise has been shown to increase in multiple organs and tissues. Transcriptional noise is defined as the variability of gene expression, and this property reflects the heterogeneity that results from stochastic cell to cell variation. Although the concept of transcriptional noise is not new, different metrics are being used to measure this and it is unclear what the optimal approach is. With the advent of single-cell sequencing techniques, it is becoming possible to quantify how noise is distributed through the genome. The project focuses on understanding how to accurately model transcriptional noise as a regulatory property of the genome and its contribution to the fundamental feature of ageing.

To conduct a systematic evaluation, we selected 12 different metrics that are commonly used in scRNA-seq studies. The performance of these metrics was tested with simulated and experimentally derived datasets. We investigated the performance of these metrics against different data structures, stably expressed genes and other properties. Using a publicly available scRNA-seq dataset with multiple tissues and age groups for mice, we intend to investigate how transcriptional noise changes during ageing. Through the analysis, the goal is to understand how transcriptional noise impacts the regulatory processes that underlie ageing.

Author: Miss Huiwen Zheng, The University of Queensland
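The abstract above does not list the 12 metrics that were evaluated. As an illustration of what a variability metric looks like, the sketch below computes two widely used ones, the coefficient of variation and the Fano factor, for a single gene across cells. Their inclusion in the study is an assumption, and the expression values are made up.

```python
import statistics


def coefficient_of_variation(expression: list[float]) -> float:
    """Sample standard deviation divided by the mean expression."""
    return statistics.stdev(expression) / statistics.fmean(expression)


def fano_factor(expression: list[float]) -> float:
    """Sample variance divided by the mean expression."""
    return statistics.variance(expression) / statistics.fmean(expression)


# Hypothetical counts for one gene in four cells (not real data).
gene = [2.0, 4.0, 4.0, 6.0]
print("CV:", coefficient_of_variation(gene))
print("Fano factor:", fano_factor(gene))
```

Both metrics depend on the mean expression level, which is one reason a systematic comparison across data structures is useful.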

Population genomics of Cryptosporidium hominis across five continents identifies two subspecies that have diverged and recombined during 500 years of evolution

Cryptosporidium is a significant public health problem and one of the primary causes of diarrhoea in humans, particularly in very young children living in low- and middle-income countries. Here, we present a comprehensive whole-genome study of C. hominis, comprising 114 isolates from 16 countries within five continents. We detect two highly diverged lineages with distinct biology and demography that diverged circa 500 years ago. We consider these lineages as two subspecies, and provisionally propose the names C. hominis hominis (clade 1) and C. hominis aquapotentis (clade 2 or gp60 subtype IbA10G2). C. h. hominis is mostly found in low-income countries in Africa and Asia, and it appears to have recently undergone population contraction. In marked contrast, C. h. aquapotentis was found in high-income countries, mainly in Europe, North America, and Oceania, and we reveal a signature of population expansion. Moreover, we detected genomic regions of introgression representing gene flow after secondary contact between the subspecies from low- and high-income countries. We demonstrate that this gene flow resulted in a genomic island of high diversity and divergence and that this diversity at potential virulence genes is maintained by balancing selection, suggesting that they are involved in a coevolutionary arms race.

Author: Mr Swapnil Tichkule, The University of Melbourne