scMerge: Integration of multiple single-cell transcriptomics datasets leveraging stable expression and pseudo-replication

Concerted examination of multiple collections of single cell RNA-Seq (scRNA-Seq) data promises further biological insights that cannot be uncovered with individual datasets. However, such integrative analyses are challenging and require sophisticated methodologies. To enable effective interrogation of multiple scRNA-Seq datasets, we have developed a novel algorithm, named scMerge, that removes unwanted variation by combining stably expressed genes and utilizing pseudo-replicates across datasets. Analysis of large collections of publicly available datasets demonstrates that scMerge performs well in multiple scenarios and enhances biological discovery, including inferring cell developmental trajectories.

Workshop: scMerge

The workshop will cover the functionality of the scMerge R package for integrating multiple datasets across batches and experiments. Emphasis will be placed on diagnostic plots and the evaluation of batch effects in high-dimensional, complex data. This will be done using a few case-study datasets with distinct characteristics, such as a time-course experimental design and strong differences between technological platforms (e.g. substantial differences in read depth).

Each participant is required to bring a laptop with R and the scMerge package installed.

Follow the instructions in the Installation section on the package website:

The workshop tutorial is available at and specifically for participants to follow

Dr Shila Ghazanfar

Postdoctoral Research Associate
The Judith and David Coffey Life Lab, Charles Perkins Centre, University of Sydney
School of Mathematics and Statistics, University of Sydney

Shila Ghazanfar completed her undergraduate studies with majors in Mathematics and Statistics and Honours in Statistics as well as her PhD in Statistical Bioinformatics at The University of Sydney. Her PhD studies focused on the integrative analysis of transcriptomics data with other types of omics data and external information such as protein interaction networks.

Shila is currently a Research Associate in the Judith and David Coffey Life Lab and the School of Mathematics and Statistics at The University of Sydney. She is interested in the statistical analysis of high-throughput sequencing data and how it can be used to answer important questions in biological and medical research. Her research experience includes developing statistical and analytical approaches for sequence data such as RNA-Seq, network analysis, and building interactive R/Shiny applications.
