Pei Wang

iProFun: An integrative analysis tool to screen for Proteogenomic Functional traits

In this talk, I will introduce iProFun, an integrative analysis tool to screen for Proteogenomic Functional traits perturbed by DNA copy number alterations (CNAs) and DNA methylation. The goal is to characterize the functional consequences of DNA copy number and methylation alterations in tumors and to facilitate screening for cancer drivers that contribute to tumor initiation and progression. Specifically, we consider three functional molecular quantitative traits: mRNA expression levels, global protein abundances, and phosphoprotein abundances. We aim to identify genes whose CNAs and/or DNA methylation have cis-associations with some or all of the three types of molecular traits. Compared with analyzing each molecular trait separately, joint modeling of multi-omics data offers several benefits: iProFun achieves greater power for detecting significant cis-associations shared across different omics data types, and better accuracy in inferring cis-associations unique to certain type(s) of molecular trait(s). For example, associations of CNAs/methylation that are unique to global or phosphoprotein abundances may imply post-translational regulation. I will show an application of iProFun to ovarian high-grade serous carcinoma tumor data from TCGA and CPTAC. The results suggest potential drug targets for ovarian cancer.
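
To make the notion of a cis-association concrete, the following is a minimal sketch of a per-gene regression of each molecular trait on the same gene's CNA and methylation values. It is not the iProFun procedure itself; the function name `cis_regression`, the simulated data, and the effect sizes are all assumptions chosen for illustration.

```python
import numpy as np


def cis_regression(trait, cna, methylation):
    """Ordinary least squares of one trait on the same gene's CNA and
    methylation. Returns (coefficients, t statistics) in the order
    intercept, CNA, methylation. Hypothetical helper for illustration."""
    n = trait.shape[0]
    X = np.column_stack([np.ones(n), cna, methylation])
    beta, _, _, _ = np.linalg.lstsq(X, trait, rcond=None)
    resid = trait - X @ beta
    dof = n - X.shape[1]
    sigma2 = resid @ resid / dof              # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)     # covariance of the coefficients
    t_stats = beta / np.sqrt(np.diag(cov))
    return beta, t_stats


# Simulated data for a single gene across 200 tumors (assumed values).
rng = np.random.default_rng(1)
n = 200
cna = rng.normal(size=n)
meth = rng.normal(size=n)
traits = {
    # mRNA responds to both CNA and methylation
    "mrna": 0.8 * cna - 0.5 * meth + rng.normal(scale=0.5, size=n),
    # global protein responds to CNA only
    "protein": 0.6 * cna + rng.normal(scale=0.5, size=n),
    # phosphoprotein has no cis effect in this simulation
    "phospho": rng.normal(scale=0.5, size=n),
}

results = {name: cis_regression(y, cna, meth) for name, y in traits.items()}
for name, (beta, t_stats) in results.items():
    print(f"{name}: CNA t = {t_stats[1]:.2f}, methylation t = {t_stats[2]:.2f}")
```

Running the three regressions separately, as here, corresponds to the "analyzing each molecular trait separately" baseline; a joint model would then combine the per-trait evidence to decide which association patterns are shared and which are trait-specific.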

Workshop: Methods/tools for preprocessing and imputation of mass spectrometry-based proteomics data

Due to the dynamic nature of mass spectrometry (MS) instruments, analyzing MS-based proteomics data requires customized tools for routine preprocessing such as normalization, outlier detection/filtering, and batch correction. Moreover, proteomics data often contain substantial missing values. Together, these pose great challenges for data analysis. Specifically, many tools and methods, especially those for high-dimensional data, cannot deal with missing values directly. Furthermore, missing values in proteomics data are not missing at random, so simply ignoring them or imputing them with constants will lead to biased results. In this talk, I will share a suite of preprocessing and imputation methods/tools for handling proteomics data. A specific focus will be given to an imputation method, DreamAI, which resulted from an NCI-CPTAC Proteomics Dream Challenge carried out to develop effective imputation algorithms for proteomics data through crowd learning. DreamAI is based on an ensemble of six different imputation methods. The favorable performance of DreamAI over existing tools was demonstrated on both simulated and real data sets. Follow-up analysis based on the data imputed by DreamAI revealed new biological insights, suggesting that this new tool could enhance current data analysis capabilities in proteomics research.
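
The bias caused by ignoring or constant-filling values that are not missing at random can be illustrated with a small simulation. This is only a sketch under assumed settings: log-abundances are drawn from a normal distribution, and the chance of a value being missing is assumed to rise as abundance falls. It does not reproduce DreamAI or any CPTAC data; the function names and parameters are hypothetical.

```python
import numpy as np


def simulate_mnar(n=10000, seed=0):
    """Simulate log-abundances whose probability of being missing
    increases as abundance decreases (an assumed logistic mechanism)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(loc=20.0, scale=2.0, size=n)
    p_missing = 1.0 / (1.0 + np.exp(x - 19.0))  # low abundance -> more missing
    missing = rng.random(n) < p_missing
    observed = np.where(missing, np.nan, x)
    return x, observed


def summarize(x, observed):
    """Compare the true mean with two naive strategies."""
    floor = np.nanmin(observed)                  # constant used for filling
    min_imputed = np.where(np.isnan(observed), floor, observed)
    return {
        "true_mean": float(x.mean()),
        "complete_case_mean": float(np.nanmean(observed)),  # ignore missing
        "min_imputed_mean": float(min_imputed.mean()),      # constant fill
        "missing_fraction": float(np.isnan(observed).mean()),
    }


x, observed = simulate_mnar()
summary = summarize(x, observed)
for key, value in summary.items():
    print(f"{key}: {value:.3f}")
```

In this setup, dropping missing values overestimates the mean because low abundances are preferentially lost, while filling with the observed minimum underestimates it. This is the kind of distortion that motivates model-based imputation.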

Professor Pei Wang

Professor of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York

Pei Wang is a Professor in the Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai. Dr. Wang obtained her B.S. in Mathematics from Peking University, China, in 2000. She then pursued graduate study in the U.S. and received a Ph.D. in Statistics from Stanford University in 2004. From 2004 to 2013, Dr. Wang served as a faculty member at Fred Hutchinson Cancer Research Center and the University of Washington, Seattle, WA. In October 2013, she joined the Icahn School of Medicine at Mount Sinai, New York. Dr. Wang's research focuses on developing statistical and computational methods to address scientific questions based on data from high-throughput biology/genetics experiments as well as modern digital health studies. Dr. Wang and her team have developed numerous novel statistical methods for analyzing and integrating various genetic/genomic/proteomic data. In the past decade, Dr. Wang has been actively involved in the NCI-funded CPTAC (Clinical Proteomic Tumor Analysis Consortium). Currently, Dr. Wang is the MPI of the national Proteomics and Genomics data analysis center of CPTAC.