Our research aims to develop computational approaches to gain biological or clinical insights into human disease. Using genetic variation as a foundation, in particular “high impact” variants from large-scale sequencing studies, we hope to shed light on genes, disease processes, and experimental manipulations that could one day lead to improved treatments or patient care.

Association studies for complex disease

To identify novel loci or genes associated with type 2 diabetes and other complex diseases, we participate in the analysis of genetic association studies, particularly those that use next-generation sequencing. Via new methods for genetic architecture simulation, we have shown that the genetic basis of type 2 diabetes is due predominantly to common variants.

Genetic data sharing and collaborative research

To make data from large-scale genetic studies broadly accessible, we develop means to integrate, analyze, and publish information within human genetic datasets. Our main effort is a public knowledge portal to query these data, with additional work on secure and privacy-preserving data sharing among globally distributed data providers.

High impact variants

To identify molecular “handles” that might provide insight into molecular, cellular, or physiological disease processes, we analyze “high impact” coding variants from large-scale exome sequencing studies. This work has identified a series of protective loss-of-function variants in SLC30A8, suggesting that inhibition of its protein product, ZnT8, may prevent type 2 diabetes.

Links between common and rare forms of disease

To gain insights into common disease from our comparatively better understanding of monogenic disease, we analyze mutations in monogenic disease genes for their effects in large populations. This work has demonstrated the limits of diabetes risk prediction from next-generation sequencing data and has provided evidence for a continuum of effects between Maturity Onset Diabetes of the Young (MODY) and type 2 diabetes.

Models of disease processes, perturbations, and model systems

To speed biological or clinical translation of data and knowledge currently spread across different biomedical research communities, we research new means to model and link heterogeneous experimental approaches. This work includes several ongoing efforts to use high-throughput genomic datasets to inform analysis of “high impact” coding mutations, as well as a more ambitious project to pilot a biomedical translator.

Algorithms for data integration

To provide the core algorithmic machinery for computational approaches to understand human disease, we research mathematical approaches to data integration. This work has included methods for integration of sequencing and genotyping assays, as well as prior work on machine learning approaches for protein interaction network alignment.