Hongkai’s Research Group

Research

2. Statistical and Computational Methods and Tools for ChIP-seq and ChIP-chip Data Analysis

ChIP-seq and ChIP-chip are high-throughput technologies for mapping transcription factor binding sites and histone modifications genome-wide. They are widely used to decode gene regulatory programs in development and disease. We develop analytical methods and software tools for analyzing ChIP-seq and ChIP-chip data. Examples of our tools include CisGenome, dPCA, TileMap, TileProbe, and JAMIE. We have also developed a database, hmChIP, to help scientists explore publicly available ChIP data.

6. Decoding Gene Regulation in Stem Cells, Development and Diseases

We are interested in decoding gene regulatory programs in development, stem cells and diseases. We have contributed to the understanding of gene regulation in (1) human and mouse embryonic stem cells [1,2], (2) the sonic hedgehog signaling pathway in embryonic development [3,4,5], and (3) B cell lymphoma [1], leukemia [6], and various other cancers [7]. Most recently, we have started to develop systematic methods to predict genome-wide regulatory programs in development and diseases.

7. Methods and Tools for New High-throughput Technologies

We also develop methods and tools for analyzing data from novel high-throughput technologies. One example is our work on analyzing TIP-chip data to identify active retrotransposon elements in human genomes.

3. Methods and Tools for Analyzing Large-Scale Gene Expression Data

We develop tools for analyzing large-scale gene expression data. One example is the correlation motif approach, CorMotif, for integrative analysis of multiple gene expression experiments. Most recently, we have become interested in developing tools for mining the large amounts of gene expression data in the Gene Expression Omnibus (GEO). This involves addressing challenges in the analysis of ultra-high-dimensional heterogeneous datasets, text mining, and the integration of text, gene expression and other types of ’omics data.

5. Statistical Methods for Data Integration and Data Mining

Integrative analysis of multiple heterogeneous ’omics datasets can lead to new discoveries. Often, it can also significantly increase the statistical power for detecting weak signals in each individual dataset. Data integration and data mining are not trivial, however. In addition to the high dimensionality and inherent heterogeneity of the data, the number of possible combinatorial patterns in the data grows exponentially with the number of datasets. We develop tools for data integration and mining that tackle these challenges. Examples include iASeq for integrative analysis of allele-specificity, JAMIE for joint analysis of multiple ChIP-chip datasets, CorMotif for joint analysis of multiple gene expression datasets, and ChIP-PED for joint analysis of ChIP and public gene expression data.
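To illustrate the exponential growth in combinatorial patterns mentioned in this section, here is a minimal sketch. It assumes, purely for illustration, that each genomic feature (for example, a gene) is summarized as a binary call (signal or no signal) in each dataset; the function names are hypothetical and are not taken from any of our tools.

```python
from itertools import product


def count_patterns(num_datasets, states_per_dataset=2):
    """Number of distinct on/off patterns a single feature can show
    across `num_datasets` datasets (assumed binary calls by default)."""
    return states_per_dataset ** num_datasets


def enumerate_patterns(num_datasets):
    """Explicitly list every binary pattern; only practical for small inputs."""
    return list(product((0, 1), repeat=num_datasets))


if __name__ == "__main__":
    # The pattern count doubles with every additional dataset.
    for d in (2, 4, 8, 16):
        print(f"{d} datasets: {count_patterns(d)} possible patterns")
```

Under this assumption, listing every pattern quickly becomes impractical as datasets are added, which is one reason joint analyses usually need a more compact way to describe patterns shared across datasets.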

4. Methods and Tools for DNA Sequence Motif Analysis

We develop tools for finding novel DNA sequence motifs, mapping known motifs to genome sequences [1,2], and combining motif information with various chromatin signals to predict transcription factor binding sites [3].

1. Analytical Methods and Tools for Single Cell Genomics

Single-cell genomic technologies such as single-cell RNA-seq, single-cell ATAC-seq and single-cell ChIP-seq provide unprecedented power for examining the functional genomic landscape of a heterogeneous cell population. We develop statistical and computational methods and tools for designing single-cell genomic experiments and analyzing single-cell genomic data. Examples of our tools include TSCAN and SCRAT.