RESEARCH OVERVIEW

Individuals in a population have varying degrees of disease risk, driven by a unique combination of rare and common variants that interact with each other and with the environment in complex biological networks. Our passion is to understand the underlying genetic mechanisms that would help us identify high-risk patients from Electronic Health Records (EHR) before they develop disease. The focus of our lab is to develop novel machine learning methods and workflows that integrate high-dimensional information (e.g., imaging data, EHR, and multi-“omic” data) across ancestral groups, sexes, and environments, to better understand the genetic etiology of complex human diseases and their interconnections across the “phenome”.

Join Our Team

Contact yzv101@psu.edu to join our team and help us advance the frontiers of human knowledge!

INTEGRATING MULTI-MODAL DATA WITH ELECTRONIC HEALTH RECORDS

We seek to integrate multiple layers of high-dimensional, multi-modal data, including genomics, transcriptomics, copy number variation, neuroimaging, lifestyle-related factors, and social determinants of health, with electronic health records to understand the interplay between these layers and their role in disease. While gene expression data are typically measured on smaller samples of individuals, rapid advances in genotyping technologies have made genetic and phenotypic data available on much larger samples, typically from large-scale medical biobanks or other prospective cohorts. Our lab has expertise in applying statistical and computational methods that integrate multi-omic data from non-overlapping samples to (a) better characterize the genetic variants (common, rare, and structural) underlying these complex phenotypes and (b) understand the shared genetic relationships among these phenotypes. We are also developing statistical methods that integrate neuroimaging data with genomics to predict the risk of cognitive decline and associated comorbidities.

UNDERSTANDING DIFFERENCES IN DISEASE RISK BETWEEN POPULATION SUBGROUPS

We are interested in studying factors that increase the risk of complex human diseases. One condition of current interest to us is late-age cognitive decline, a precursor to dementia. Several multimorbidities (e.g., heart disease, type II diabetes) increase the risk of cognitive decline, and their incidence rates tend to differ across population subgroups (e.g., by ethnic background, sex, and rural-urban status). We have developed Bayesian genotype-by-subgroup interaction models that can quantify, for any given trait, the extent of heterogeneity between these subgroups that is attributable to genetic variation. We are interested in leveraging patients' longitudinal EHR to assess the role of other risk factors (e.g., social determinants of health, lifestyle-related factors such as chronic stress, and multimorbidities) in driving differences in patterns of cognitive decline between these subgroups.

PREDICTING RISK OF DISEASE FOR PRECISION MEDICINE

Our ultimate goal is to devise scalable models that predict disease risk for a specific individual given their genetic profile, sex, lifestyle, medical history, and environmental conditions. We have previously developed Bayesian generalized additive models to predict the risk of breast cancer that incorporate interactions between multiple omic layers, such as gene expression and copy number variation. We now seek to develop novel models that incorporate multiple layers of data to predict the risk of cognitive decline.
