This workshop will familiarise you with the basics of R through the RStudio interface and the tidyverse suite of R packages. You will be introduced to modern approaches to data analysis and visualisation. The focus is on mastering basic skills and showing you where to go for help so you can undertake future analyses independently. By the end of this workshop you will know how to create and organise new “projects” in RStudio; read in data files; visualise data using the popular ggplot2 package; perform various data manipulation, summarisation and modelling tasks; and create reproducible reports for bioinformatics analysis pipelines.
In Part A of this workshop, we will first familiarise ourselves with the basics of R, such as loading an Excel dataset and recognising variable types. We will be using the R Markdown documentation system, which allows us to execute code, visualise output and write a report in one document. Time permitting, we will also start to learn the basics of data manipulation, such as filtering observations and selecting columns.
Some of the packages to be covered: rmarkdown, readr, readxl, voom, janitor and dplyr.
In Part B of this workshop, we will focus on data cleaning and data visualisation. These are the tasks where the tidyverse framework becomes one of the most powerful tools in data science. We will learn how to summarise data, convert between "wide" and "tall" data frames, and integrate different datasets. Using these techniques, we will massage the data into a suitable format and perform some statistical modelling. We will also introduce some powerful wrapper functions that can help us write better, cleaner code.
Some of the packages to be covered: tibble, broom, purrr, dplyr, tidyr and ggplot2.
Key words: statistical computing; R; tidyverse; data manipulation; data visualisation
Requirements: You will need to bring your own laptop. Please make sure it has the latest versions of R and RStudio Desktop installed. Participants do not need any existing knowledge of R or RStudio.
Relevance: This workshop is relevant to anyone who is interested in learning more about R and how it can help streamline their data processing and analysis workflow. For example, if you currently spend a lot of time on repetitive manual data manipulation tasks in Excel, you will benefit greatly from learning a statistical computing language such as R and from writing code for reproducible analyses. This workshop is also for people who learnt R a few years ago and are interested in upskilling in recent advances, such as the RStudio interface and the tidyverse suite of packages (ggplot2, dplyr, readr, etc.).
Statistics Tutor and PhD Candidate at the School of Mathematics and Statistics, The University of Sydney
Lecturer in Statistics and Data Science, The University of Sydney