Dr Nathan Watson-Haigh

WORKSHOP: From monolithic bash script to Snakemake workflow

Recent years have seen a groundswell of support in the bioscience community for improved reproducibility of data analyses. Large analysis workflows are fragile ecosystems of software tools, scripts and dependencies, which makes them hard to reproduce. One solution is to use a workflow management system such as Snakemake, whose workflows can be executed across different computing environments, from a laptop or desktop to High Performance Computing (HPC).

Nathan will cover the core concepts of Snakemake, with a focus on reimplementing a bash script that performs quality control, trimming and alignment of Illumina data against a reference sequence. After the workshop, attendees should be able to start implementing their own Snakemake workflows.

Keywords: Snakemake, bash, HPC, pipeline, workflow

Requirements: 

  • Internet-enabled laptop with an SSH client installed
  • Experience with the Linux command line
  • Experience with at least one scripting language

Relevance: For those who regularly write bioinformatics analyses as shell scripts and wish to move towards a less fragile, more reproducible and more efficient approach.

Dr Nathan Watson-Haigh

Deputy Head of Bioinformatics, South Australian Genomics Centre (SAGC)

Nathan has over 20 years’ experience in the field of bioinformatics, with expertise in genomics, transcriptomics, systems biology, phylogenetics, bioinformatics training, Linux systems administration, pipeline development, and high-performance and cloud computing. Nathan has worked predominantly with non-model organisms, particularly within the agrigenomics field. He has extensive experience with genome assembly, long-read sequencing and genome diversity analysis.