ChIPXpress

ChIPXpress is an R package designed to improve ChIP-seq and ChIP-chip target gene ranking using publicly available gene expression data.

This page will help you install ChIPXpress and run a simple example. Note that ChIPXpress has been accepted by Bioconductor and will be released in version 2.11 of Bioconductor.

WARNING: Users are required to use the 64-bit version of R to run ChIPXpress if working with the pre-built human GPL570 compendium. This is because the bigmemory package requires 64-bit R for large databases (>2GB). We are working to see if there is a possible alternative for 32-bit R users. If you only wish to work with the pre-built mouse GPL1261 compendium, you can still use 32-bit R to run ChIPXpress.
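
If you are unsure which build of R you are running, the pointer size reported by R is a quick indicator. The following is a minimal sketch using only base R; the variable name is_64bit is ours and is not part of ChIPXpress.

```r
# Pointer size is 8 bytes on a 64-bit build of R and 4 bytes on a 32-bit build.
is_64bit <- .Machine$sizeof.pointer == 8

if (is_64bit) {
  message("64-bit R: the GPL570 and GPL1261 compendia can both be used.")
} else {
  message("32-bit R: only the mouse GPL1261 compendium can be used.")
}
```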

Questions can be emailed to:   gewu AT jhsph POINT edu

INSTALLATION

The instructions below will install ChIPXpress, provided that you have installed the latest release of R. The current release of R is 2.15 and the current release of Bioconductor is 2.10.

[Preliminary Step] If you do not have the latest release of R (v2.15) installed on your computer, please first download and install R from here.
Choose the version of R appropriate for your operating system.
If you have R version 2.14 or lower, ChIPXpress may not install properly.
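
The version requirement above can be checked from within R before installing anything. This is a small sketch using base R only; the variable name r_ok is our own.

```r
# Compare the running R version against the minimum stated above (2.15).
r_ok <- getRversion() >= "2.15.0"

if (!r_ok) {
  warning("R ", as.character(getRversion()),
          " is older than 2.15; ChIPXpress may not install properly.")
}
print(r_ok)
```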

If Bioconductor 2.11 is not yet released:

[Step 1] Open R.

[Step 2] Change biocLite to download from the devel version of Bioconductor.
      source("http://bioconductor.org/biocLite.R")
      useDevel(TRUE)
[Step 3] Install the ChIPXpress and ChIPXpressData packages by typing in R:
      biocLite("ChIPXpress")
  Please be patient. The installation may take a while because the ChIPXpressData package contains two large compendia of gene expression data.

If Bioconductor 2.11 is released:

[Step 1] Open R.

[Step 2] Install the ChIPXpress and ChIPXpressData packages by typing in R:
      source("http://bioconductor.org/biocLite.R")
      biocLite("ChIPXpress")

Congratulations, you have successfully installed ChIPXpress!
If the instructions above did not work, and you would prefer to install ChIPXpress manually, scroll down to the bottom for manual installation instructions.
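
To confirm that both packages were actually installed, you can ask R for their versions. The sketch below relies only on base R and utils; it reports NA for a package that cannot be found rather than stopping with an error.

```r
# Look up the installed version of each package; NA means "not installed".
pkgs <- c("ChIPXpress", "ChIPXpressData")
versions <- sapply(pkgs, function(pkg) {
  tryCatch(as.character(packageVersion(pkg)),
           error = function(e) NA_character_)
})
print(versions)
```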

Now you are ready for an introduction to ChIPXpress and a quick example.



EXAMPLES

[ChIPXpress Example 1] The following instructions are taken directly from Section 3 (ChIPXpress Example) of the introduction PDF.

Here, we illustrate how to use the ChIPXpress function to produce functional TF target gene rankings. Suppose we are interested in studying Oct4 regulation in mouse embryonic stem cells (ESCs). First, we process the ChIP-seq data using CisGenome (or another method) to obtain a list of predicted Oct4-bound target genes in ESCs. This has already been done and the result is stored as a dataset in the package, ready for input into the ChIPXpress function.
Type the following commands in R:
      library(ChIPXpress)
      data(Oct4ESC_ChIPgenes)

Next, we need to load the pre-built mouse database of gene expression profiles from the GPL1261 platform by loading the ChIPXpressData package. Because the database is stored in big.matrix format, we need to use functions designed to work with big.matrix objects. This requires installing and loading the bigmemory package.
      library(ChIPXpressData)
      library(bigmemory)
      path <- system.file("extdata", package="ChIPXpressData")
      DB_GPL1261 <- attach.big.matrix("DB_GPL1261.bigmemory.desc", path=path)

To clarify what we just did: we first located the path in which the DB_GPL1261 database is stored (the extdata folder of the installed ChIPXpressData package), and then specified the file name and the path to load the DB_GPL1261 database. To load the DB_GPL570 database for human data, we would simply replace DB_GPL1261 with DB_GPL570.
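
The path lookup described above can be wrapped in a small check that runs before attaching the database. This is only a sketch: find_descriptor is a hypothetical helper of ours, not a ChIPXpress function, and it assumes the descriptor files follow the DB_<platform>.bigmemory.desc naming used in this example.

```r
# Hypothetical helper: build the descriptor file name for a platform and
# report whether it exists in the installed ChIPXpressData package.
find_descriptor <- function(platform = "GPL1261") {
  # system.file() returns "" when the package or folder cannot be found.
  path <- system.file("extdata", package = "ChIPXpressData")
  desc <- paste0("DB_", platform, ".bigmemory.desc")
  list(path  = path,
       desc  = desc,
       found = nzchar(path) && file.exists(file.path(path, desc)))
}

str(find_descriptor("GPL1261"))  # mouse compendium
str(find_descriptor("GPL570"))   # human compendium
```

If found is TRUE, the path and desc elements can be passed to attach.big.matrix as shown above.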

We are now ready to run the ChIPXpress function. We specify the Entrez GeneID of the TF-of-interest (18999 is the Entrez GeneID of Oct4), the vector of Oct4 bound genes, and the database:
      Output <- ChIPXpress(TFID="18999",ChIP=Oct4ESC_ChIPgenes$EntrezID,DB=DB_GPL1261)
      head(Output[[1]])
        18999 17865 381591 22702 22271 99377
         5.3   6.1   15.8   20.8   22.0   26.0

      head(Output[[2]])
        [1] "338369" "238555" "257963" "242860" "212569" "243881"
The output is a list of length two. The first item is the Oct4 target gene ranking: a vector whose names are the Entrez GeneIDs of the genes and whose values are the ChIPXpress scores of those genes. The second item reports the TF-bound genes that were not found in the database (i.e., not measured by the microarray platform).
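
To make the structure of the output concrete, the sketch below builds a mock list with the same shape and pulls out the top-ranked gene IDs. The values are copied from the head() output shown above and are not computed here; the variable names scores, not_in_db and top3 are ours.

```r
# Mock object mirroring the structure described above.
Output <- list(
  c("18999" = 5.3, "17865" = 6.1, "381591" = 15.8,
    "22702" = 20.8, "22271" = 22.0, "99377" = 26.0),
  c("338369", "238555", "257963", "242860", "212569", "243881")
)

scores    <- Output[[1]]   # named numeric vector of ChIPXpress scores, in ranked order
not_in_db <- Output[[2]]   # Entrez GeneIDs of bound genes absent from the database

# Entrez GeneIDs of the three top-ranked genes.
top3 <- names(scores)[1:3]
print(top3)
```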

For the final step, you can convert the Output into a clean table with gene names or any other preferred gene identifier by using your favorite annotation package (e.g., biomaRt). Here, we can use the original Oct4ESC_ChIPgenes data frame to do so directly.
      GeneNames <- Oct4ESC_ChIPgenes$Annotation[match(names(Output[[1]]),Oct4ESC_ChIPgenes$EntrezID)]
      Result <- data.frame(1:length(Output[[1]]),GeneNames,names(Output[[1]]),Output[[1]])
      colnames(Result) <- c("Rank","GeneNames","EntrezID","ChIPXpressScore")
      head(Result)

Good job! You have just completed a ChIPXpress analysis of real Oct4 ChIP-seq data.
Result contains the final ChIPXpress rankings.


[ChIPXpress Example 2]

For users who are less familiar with R, this example shows how to use read.delim to read in a tab-delimited file containing the list of predicted TF-bound genes. This file contains the peak detection output from CisGenome, where each peak is assigned to a corresponding gene by Entrez GeneID and sorted from the largest to smallest peak signal. Also, only the highest-ranked peak for each gene is retained in this input file.
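
If your own peak list still contains several peaks per gene, the filtering described above can be reproduced in base R. The sketch below uses a toy table; the PeakSignal column name and all values are hypothetical and will differ from the columns in actual CisGenome output.

```r
# Toy peak table; column PeakSignal and all values are hypothetical.
peaks <- data.frame(
  EntrezID   = c("101", "102", "101", "103", "102"),
  PeakSignal = c(50, 80, 95, 20, 30),
  stringsAsFactors = FALSE
)

# Sort from the largest to the smallest peak signal.
peaks <- peaks[order(peaks$PeakSignal, decreasing = TRUE), ]

# Keep only the first (highest-ranked) peak for each gene.
top_peaks <- peaks[!duplicated(peaks$EntrezID), ]
print(top_peaks)
```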

First, download the tab-delimited file containing the analyzed Oct4 ChIPx data results. Next, to read in the tab-delimited file in R, we type in R:
      Oct4ESC_ChIPgenes <- read.delim(".../Oct4ESC_ChIPgenesEX.txt")
Here, "..." stands for the path to the directory where you saved the Oct4 example file.

Now, you are once again ready to follow the ChIPXpress analysis steps from Example 1. Load in the ChIPXpress package by typing in R:
      library(ChIPXpress)

Next, we need to load the pre-built mouse database of gene expression profiles from the GPL1261 platform by loading the ChIPXpressData package. Because the database is stored in big.matrix format, we need to use functions designed to work with big.matrix objects. This requires installing and loading the bigmemory package.
      library(ChIPXpressData)
      library(bigmemory)
      path <- system.file("extdata", package="ChIPXpressData")
      DB_GPL1261 <- attach.big.matrix("DB_GPL1261.bigmemory.desc", path=path)

To clarify what we just did: we first located the path in which the DB_GPL1261 database is stored (the extdata folder of the installed ChIPXpressData package), and then specified the file name and the path to load the DB_GPL1261 database. To load the DB_GPL570 database for human data, we would simply replace DB_GPL1261 with DB_GPL570.

We are now ready to run the ChIPXpress function. We specify the Entrez GeneID of the TF-of-interest (18999 is the Entrez GeneID of Oct4), the vector of Oct4 bound genes, and the database:
      Output <- ChIPXpress(TFID="18999",ChIP=Oct4ESC_ChIPgenes$EntrezID,DB=DB_GPL1261)
      head(Output[[1]])
        18999 17865 381591 22702 22271 99377
         5.3   6.1   15.8   20.8   22.0   26.0

      head(Output[[2]])
        [1] "338369" "238555" "257963" "242860" "212569" "243881"
The output is a list of length two. The first item is the Oct4 target gene ranking: a vector whose names are the Entrez GeneIDs of the genes and whose values are the ChIPXpress scores of those genes. The second item reports the TF-bound genes that were not found in the database (i.e., not measured by the microarray platform).

For the final step, you can convert the Output into a clean table with gene names or any other preferred gene identifier by using your favorite annotation package (e.g., biomaRt). Here, we can use the original Oct4ESC_ChIPgenes data frame to do so directly.
      GeneNames <- Oct4ESC_ChIPgenes$Annotation[match(names(Output[[1]]),Oct4ESC_ChIPgenes$EntrezID)]
      Result <- data.frame(1:length(Output[[1]]),GeneNames,names(Output[[1]]),Output[[1]])
      colnames(Result) <- c("Rank","GeneNames","EntrezID","ChIPXpressScore")
      head(Result)

You can also save the output into a tab-delimited file by typing in R:
      write.table(Result,file=".../Oct4_Output.txt",row.names=FALSE,sep="\t",quote=FALSE)
  where ... is the path to the directory in which you want to save the file.
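
If you want to be sure the file was written as intended, you can read it back in. The sketch below does this with a toy Result table whose values are hypothetical; tempdir() stands in for your own output directory.

```r
# Toy Result table with the same columns as above; the values are hypothetical.
Result <- data.frame(
  Rank            = 1:2,
  GeneNames       = c("GeneA", "GeneB"),
  EntrezID        = c("101", "102"),
  ChIPXpressScore = c(5.3, 6.1),
  stringsAsFactors = FALSE
)

# file.path() joins the directory and the file name with the correct separator.
out_file <- file.path(tempdir(), "Oct4_Output.txt")
write.table(Result, file = out_file, row.names = FALSE, sep = "\t", quote = FALSE)

# Read the file back to confirm that the columns and rows survived.
check <- read.delim(out_file, stringsAsFactors = FALSE)
print(colnames(check))
```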

Good job! You have just completed a ChIPXpress analysis of real Oct4 ChIP-seq data.
Oct4_Output.txt contains the final ChIPXpress rankings.



MANUAL INSTALLATION

As a last resort, IF the above instructions DO NOT work, you can install ChIPXpress manually:

[Step 1] Download the ChIPXpress package from here

[Step 2] Download the ChIPXpressData package from here

[Step 3a] Open R.

[Step 3b] If you do not have the following Bioconductor packages that ChIPXpress imports from, please install them by typing in R:
      source("http://bioconductor.org/biocLite.R")
      biocLite(c("affy","frma","GEOquery"))

[Step 3c] If you do not have the following CRAN packages that ChIPXpress imports from, please install them by typing in R:
      install.packages(c("bigmemory","biganalytics"))
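
To find out which of these dependencies are still missing before running the install commands in Steps 3b and 3c, you can compare the package names against what is already installed. This is a minimal base R sketch; the variable names are ours.

```r
# Dependencies listed in Steps 3b and 3c.
bioc_deps <- c("affy", "frma", "GEOquery")
cran_deps <- c("bigmemory", "biganalytics")

# Names of all packages currently installed in the library paths.
have <- rownames(installed.packages())

# Packages that still need to be installed from each source.
missing_bioc <- setdiff(bioc_deps, have)
missing_cran <- setdiff(cran_deps, have)

print(missing_bioc)
print(missing_cran)
```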

[Step 4] Install the ChIPXpressData package by typing in R:
      install.packages(".../ChIPXpressData_0.99.0.tar.gz",repos=NULL,type="source")
      where ... is the path to the location of the package (where you downloaded the package to).
      On Linux, you can instead install from the command line by typing: R CMD INSTALL .../ChIPXpressData_0.99.0.tar.gz
*Note, the ChIPXpressData package is rather large and may take a long time to download and install since it contains thousands of gene expression profiles.

[Step 5] Install the ChIPXpress package by typing in R:
      install.packages(".../ChIPXpress_0.99.5.tar.gz",repos=NULL,type="source")
      where ... is the path to the location of the package. Use the file name of the version you actually downloaded.
      On Linux, you can instead install from the command line by typing: R CMD INSTALL .../ChIPXpress_0.99.6.tar.gz


The Bioconductor webpage for ChIPXpress can be found here and the webpage for ChIPXpressData can be found here.