Hongkai’s Computational Biology Group

Welcome to Hongkai Ji’s Research Group

We are interested in developing statistical and computational methods for analyzing big and complex data, particularly high-throughput genomic data. We apply these tools to study gene regulatory programs in development and diseases.

News:

1. BIRD: Our new paper on using gene expression data to predict regulatory element activities has been accepted and will appear in Nature Communications. Check out BIRD (big data regression for predicting DNase I hypersensitivity), PDDB and the bioRxiv preprint.

2. SCRAT: Check out our new software SCRAT and its web server for analyzing single-cell ATAC-seq data. Here is the SCRAT paper in Bioinformatics.

 

3. TSCAN: We are glad to release TSCAN, a computational method and software tool to construct pseudo-temporal paths using single-cell RNA-seq data.

Main Projects, Resources and Tools:

1. Analytical methods for single-cell genomics:

(1) TSCAN: pseudo-time analysis of single-cell RNA-seq data.

(2) SCRAT: a toolbox for analyzing single-cell regulome data.

2. Statistical and computational tools for ChIP-seq, ChIP-chip, DNase-seq, ATAC-seq:

(1) CisGenome: integrated software for peak calling, annotation, motif analysis, etc.

(2) BIRD: genome-wide prediction of chromatin accessibility using RNA-seq or exon array data.

(3) dPCA: a software tool for analyzing differential binding. It compares the quantitative ChIP-seq signals in multiple ChIP-seq datasets between two biological conditions and considers the variability in replicate samples.

(4) hmChIP: a database of public human and mouse ChIP-seq/ChIP-chip data.

(5) iASeq: an R/Bioconductor package for detecting allele-specific binding by jointly analyzing multiple ChIP-seq datasets.

(6) PDDB: a database of predicted regulatory element activities based on BIRD.

(7) PolyaPeak: a tool for improving ChIP-seq peak calling using peak shape information.

(8) TileMap: a software tool for ChIP-chip peak calling.

(9) TileProbe: a software tool for removing probe effects in Affymetrix tiling array data.

(10) JAMIE: joint analysis of multiple ChIP-chip datasets for improving peak calling.

(11) ChIPXpress: improved target gene ranking using gene expression data in GEO.

3. Methods and tools for gene expression data analysis:

(1) BIRD: genome-wide prediction of chromatin accessibility using RNA-seq or exon array data.

(2) GSCA: a software tool with graphical user interface for mining publicly available gene expression data. It allows one to systematically identify biological contexts associated with user-specified gene set activity patterns.

(3) CorMotif: an R/Bioconductor package for jointly analyzing multiple gene expression datasets to simultaneously detect differentially expressed genes and patterns.

(4) ChIP-PED: an R package for discovering regulatory pathway activities in a large compendium of gene expression data from GEO.

(5) PowerExpress: a tool for finding genes with a user-specified pattern of interest from multiple gene expression experiments.

4. Tools for sequence motif analysis:

(1) CisGenome: de novo motif discovery, known motif mapping, motif enrichment analysis based on matched genomic control regions.

5. Statistical methods for ‘omics data integration and data mining:

(1) BIRD: genome-wide prediction of chromatin accessibility using gene expression.

(2) ChIP-PED: increasing the value of ChIP-seq/ChIP-chip experiments by expanding discoveries to other cell types using large compendiums of publicly available gene expression data in GEO.

(3) CorMotif: integrative analysis of multiple gene expression experiments.

(4) dPCA: integrative analysis of quantitative ChIP-seq signals in multiple datasets for detecting binding differences between different biological conditions.

(5) GSCA: a software tool with graphical user interface for mining publicly available gene expression data. It allows one to systematically identify biological contexts associated with user-specified gene set activity patterns.

(6) iASeq: integrative analysis of multiple ChIP-seq studies to improve inference of allele specificity.

(7) JAMIE: joint analysis of multiple ChIP-chip datasets for improving peak calling.

(8) TileProbe: using publicly available ChIP-chip data in GEO to improve the probe effect model for tiling array data.

6. Data analysis methods and tools for new high-throughput genomic technologies:

(1) Analysis tool for TIP-chip: detecting active transposon elements in the human genome.

7. Gene regulatory programs in development and diseases:

(1) Stem cells: roles of MYC [1], Sox17 [2], Gata6, etc. in embryonic stem cells.

(2) Early development: sonic hedgehog signaling pathway in limb bud and neural tube development [3,4,5]

(3) Cancers: B cell lymphoma [1], medulloblastoma [5], leukemia [6], liver cancer

(4) Other diseases: schizophrenia [7], Lyme disease

(5) Transcription factors: MYC [1], GLI [3,4,5], Sox17 [2], FoxO [8], Oct4/Sox2 [9], Gata6, KLF9, TCF4

(6) Epigenetics and epigenomics: histone modifications and DNase hypersensitivity [10]

(7) Yeast metabolic cycle

Openings:

Postdoctoral and graduate student research assistant positions are open until filled. If you are interested, please email your CV and recommendation letters to hji@jhu.edu.

HONGKAI JI, Ph.D.

Associate Professor

Department of Biostatistics

Johns Hopkins Bloomberg School of Public Health

615 North Wolfe Street, Room E3638

Baltimore, MD 21205, USA

Phone: (410) 955-3517

Fax: (410) 955-0958

Email: hji@jhu.edu