Data Coverage

For ChIP-chip, we have downloaded and processed 76% (48 of 63) of the GEO data series for transcription factors and histone modifications that were generated using Affymetrix tiling arrays. A single GEO data series can contain experiments for several different proteins and can therefore yield multiple peak lists. In some cases, two GEO series correspond to one study (e.g., one series is a subset of another); the redundant data were then removed and only one copy was retained. In other cases, the peak list at the 10% FDR level was empty, and the experiment was removed from the final hmChIP database. For these reasons, fewer than 48 unique GSE accession numbers appear in hmChIP. Our data nevertheless represent a relatively comprehensive collection of Affymetrix ChIP-chip data. Coverage of the public data is not yet 100% because data collection and processing take time and new data sets are continually submitted to GEO.
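The removal of redundant series and empty peak lists described above can be sketched as follows. This is only an illustration under stated assumptions, not the actual hmChIP pipeline: the accession labels, experiment names, and peak counts are made up, and the function name `filter_series` is hypothetical.

```python
def filter_series(series_experiments, peak_counts):
    """Illustrative filter (hypothetical, not the hmChIP code).

    series_experiments: dict mapping a series label to a set of experiment names
    peak_counts: dict mapping an experiment name to its number of peaks
                 at the 10% FDR level
    """
    # Step 1: drop a series if its experiments are a proper subset of another
    # series; for exact duplicates keep the alphabetically first label.
    kept = {}
    for gse, experiments in series_experiments.items():
        redundant = any(
            other != gse
            and (experiments < other_experiments
                 or (experiments == other_experiments and other < gse))
            for other, other_experiments in series_experiments.items()
        )
        if not redundant:
            kept[gse] = experiments

    # Step 2: drop experiments with empty peak lists, and drop a series
    # entirely if none of its experiments remain.
    result = {}
    for gse, experiments in kept.items():
        nonempty = {e for e in experiments if peak_counts.get(e, 0) > 0}
        if nonempty:
            result[gse] = nonempty
    return result


# Made-up example data (labels are placeholders, not real GSE accessions).
series_experiments = {
    "GSE_A": {"exp1", "exp2", "exp3"},
    "GSE_B": {"exp1", "exp2"},   # subset of GSE_A, so redundant
    "GSE_C": {"exp4"},           # only experiment has an empty peak list
    "GSE_D": {"exp5", "exp6"},
}
peak_counts = {"exp1": 120, "exp2": 0, "exp3": 45,
               "exp4": 0, "exp5": 10, "exp6": 300}

remaining = filter_series(series_experiments, peak_counts)
print(sorted(remaining))
```

In this toy example four series are processed but only two remain, which mirrors why the number of unique GSE accession numbers in hmChIP is smaller than the number of series processed.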
For ChIP-seq, we have analyzed data from ENCODE and SRA/GEO (most GEO ChIP-seq data sets have corresponding SRA accession numbers, and vice versa). hmChIP currently covers more than 70% of the transcription factor (TF) and histone data released by ENCODE for public use. For SRA and GEO, only data that satisfy both of the following criteria are currently included in hmChIP: (1) a published paper supports the data, so that we can understand the experimental design and, if necessary, the biology behind the data; and (2) the data include control samples. There are about 250 human and mouse ChIP-seq studies in SRA and GEO. At the time of writing, we have processed 70 of them, and about 60% of the processed studies satisfied the criteria above. Among the studies that satisfied the criteria, some experiments produced empty peak lists and were also removed. The data available in hmChIP are those that remain after this processing. We are still analyzing more data from SRA/GEO, and hmChIP is expected to grow steadily as more data are collected.
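As a rough sketch of the SRA/GEO inclusion rules above, the selection could be expressed as a simple filter. The `Study` record, its field names, the accession labels, and the peak counts are all assumptions made for illustration; they do not describe the actual hmChIP data model.

```python
from dataclasses import dataclass, field


@dataclass
class Study:
    """Hypothetical record for one SRA/GEO ChIP-seq study."""
    accession: str
    has_publication: bool   # criterion (1): a published paper supports the data
    has_control: bool       # criterion (2): control samples are available
    peak_counts: dict = field(default_factory=dict)  # experiment -> number of peaks


def select_experiments(studies):
    """Return (accession, experiment) pairs that pass both criteria
    and have a non-empty peak list."""
    selected = []
    for study in studies:
        if not (study.has_publication and study.has_control):
            continue  # the whole study is excluded
        for experiment, n_peaks in study.peak_counts.items():
            if n_peaks > 0:  # experiments with empty peak lists are removed
                selected.append((study.accession, experiment))
    return selected


# Made-up studies (labels are placeholders, not real accessions).
studies = [
    Study("STUDY_1", True, True, {"TF_x": 1500, "H3K4me3": 0}),
    Study("STUDY_2", True, False, {"TF_y": 900}),   # no control samples
    Study("STUDY_3", False, True, {"TF_z": 400}),   # no published paper
]

included = select_experiments(studies)
print(included)
```

Only the `TF_x` experiment of the first study survives here: the other two studies fail one criterion each, and the `H3K4me3` experiment has an empty peak list.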

Content:

How To Use:



    Li Chen

    Dept. of Biostatistics

    Johns Hopkins Univ