Research Overview

I am interested in developing statistical methods and software for technologies that characterize diseases and complex traits from DNA and RNA measured on a genomic scale. My current research focuses on two technologies that provide high-throughput measurements of genomic information: single nucleotide polymorphism (SNP) arrays and gene expression arrays. For SNP arrays, I am developing statistical methods to estimate copy number at a single locus and to infer regions of copy number variants (CNVs) spanning multiple loci. In addition, I am developing methods to assess the association of CNVs with phenotypes in genome-wide association studies (GWAS). For gene expression arrays, I am developing methods for the cross-study analysis of expression data to identify concordant and discordant patterns of differential gene expression.

The software developed for each of the methods below facilitates reproducible research, unifies the steps in complex analyses of genomic data, and is freely available under the GNU General Public License of the Free Software Foundation.

  • crlmm: Locus-level estimation of copy number for Affymetrix whole-genome genotyping arrays.
  • VanillaICE: A hidden Markov model for identifying alterations in chromosomal copy number and/or loss of heterozygosity.
  • A joint hidden Markov model for identifying de novo and inherited chromosomal alterations in trios.
  • SNPchip: Visualization tools for high-throughput estimates of copy number and genotype.
  • XDE: A Bayesian hierarchical model for identifying concordant and discordant differential gene expression in multiple studies.
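To make the hidden Markov model idea above concrete, the sketch below shows generic Viterbi decoding of copy-number states along a chromosome. It is a minimal illustration under stated assumptions, not the VanillaICE implementation: the three hidden states (copy number 1, 2, 3), the Gaussian emission model on locus-level copy number estimates, the shared standard deviation `sd`, and the single "stay in the same state" probability `stay_prob` are all hypothetical choices made for this example, and the function name `viterbi` is illustrative.

```python
import math

# Hypothetical hidden states: copy number 1 (loss), 2 (normal), 3 (gain).
STATES = [1, 2, 3]


def gaussian_logpdf(x, mean, sd):
    """Log density of a normal distribution (assumed emission model)."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def viterbi(observations, means, sd=0.3, stay_prob=0.99):
    """Return the most probable copy-number path for locus-level estimates.

    means[j] is the assumed mean copy number estimate in STATES[j];
    stay_prob is the assumed probability of remaining in the same state
    between adjacent loci, with the remainder split evenly among the others.
    """
    n_states = len(STATES)
    switch = (1.0 - stay_prob) / (n_states - 1)
    log_trans = [[math.log(stay_prob if i == j else switch)
                  for j in range(n_states)] for i in range(n_states)]
    log_init = [math.log(1.0 / n_states)] * n_states

    # delta[t][j]: best log-probability of any state path ending in j at locus t.
    delta = [[log_init[j] + gaussian_logpdf(observations[0], means[j], sd)
              for j in range(n_states)]]
    backptr = []  # backptr[t-1][j]: best predecessor of state j at locus t
    for x in observations[1:]:
        prev = delta[-1]
        row, ptr = [], []
        for j in range(n_states):
            best_i = max(range(n_states), key=lambda i: prev[i] + log_trans[i][j])
            row.append(prev[best_i] + log_trans[best_i][j]
                       + gaussian_logpdf(x, means[j], sd))
            ptr.append(best_i)
        delta.append(row)
        backptr.append(ptr)

    # Trace back from the best final state to recover the full path.
    state = max(range(n_states), key=lambda j: delta[-1][j])
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return [STATES[j] for j in path]


if __name__ == "__main__":
    # Made-up estimates with a low segment followed later by a high segment.
    estimates = [2.1, 1.9, 2.0, 1.1, 0.9, 1.0, 2.0, 2.2, 3.1, 2.9, 3.0, 2.0, 2.1, 1.9]
    print(viterbi(estimates, means=[1.0, 2.0, 3.0]))
```

Because a change of state is penalized by the transition probabilities, a single noisy locus is usually absorbed into its neighbors, while a run of consistently low or high estimates is called as a loss or gain segment. A model for loss of heterozygosity or for trios would need additional states and emission terms beyond this sketch.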