
| 
   dPCA  | 
 
| 
   dPCA: differential principal component analysis of ChIP-seq  | 
 
| 
   [Introduction] 
 We propose Differential Principal Component Analysis (dPCA) for analyzing multiple ChIP-seq datasets to identify differential protein-DNA interactions between two biological conditions. dPCA integrates unsupervised pattern discovery, dimension reduction, and statistical inference into a single statistical framework. It uses a small number of principal components to concisely summarize the major multi-protein differential patterns between the two conditions. For each pattern, it detects and prioritizes differential genomic loci by comparing the between-condition differences with the within-condition variation among replicate samples. dPCA provides a new tool for efficiently analyzing large amounts of ChIP-seq data to study dynamic changes of gene regulation across different biological conditions. 
 dPCA is part of the CisGenome project. Currently, it can be run as a command-line program. We will consider incorporating it into the CisGenome GUI in the future. 
 [Supporting information for the dPCA paper] 
 Supporting Figures: FigureWeb1.pdf 
 Additional Simulations: dPCA_TechReport_2.pdf 
 
 [News] 
 Several new functions have been released: 
 (1) dpca_peakcalls: a program that uses the CisGenome peak-calling function to eliminate input genomic loci not bound by any protein in any dataset. 
 (2) -r option of dpca: computes the R^B statistic for identifying differential loci with significant absolute binding. 
 (3) -z option of dpca: uses dPCA-Z to filter out differences without significant binding activity before dPCA. 
 See the readme for details. 
 [Download] 
 Software: Windows, Linux, Mac OS 
 Example Data: The data below are the normalized read count data used in the dPCA paper. You can run dpca directly on these data without first running dpca_importdata. 
 MYC analysis (Example I): Ebox_data; Ebox_peakprob (peak calls for R^B, dPCA-Z) 
 Promoter analysis (Example II): Prom_data; Prom_peakprob (for R^B, dPCA-Z) 
 ASB analysis (Example III): ASB_data; ASB_peakprob (for R^B, dPCA-Z) 
 Example commands for analyzing these data are: 
 (1) Promoter analysis (dPCA-P) 
 > dpca -i Prom_data.txt -d /home/ -o Prom_output -t 1 
 (2) Promoter analysis (dPCA-Z) 
 > dpca -i Prom_data.txt -d /home/ -o PromZ_output -t 1 -z 1 -r Prom_peakprob.txt 
 (3) Promoter analysis (dPCA-P and compute R^B) 
 > dpca -i Prom_data.txt -d /home/ -o PromP_output -t 1 -r Prom_peakprob.txt 
 (4) MYC analysis (dPCA-P) 
 > dpca -i Ebox_data.txt -d /home/ -o Ebox_output -t 1 -cm 0 
 OR 
 > dpca -i Ebox_data.txt -d /home/ -o Ebox_output -t 1 -cm 0 -r Ebox_peakprob.txt 
 (5) ASB analysis (dPCA-P, paired sample) 
 > dpca -i ASB_data.txt -d /home/ -o ASB_output -t 1 -sm 1 -cm 0 
 OR 
 > dpca -i ASB_data.txt -d /home/ -o ASB_output -t 1 -sm 1 -cm 0 -r ASB_peakprob.txt 
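 The dPCA-P commands with R^B above differ only in the data set name, the output name, and a few flags, so they can be generated from one small helper. The sketch below is a hypothetical wrapper (the function name dpca_example_cmd is not part of dPCA). It only prints the command lines so they can be reviewed before being run, and it assumes the same /home/ working directory used in the examples.

```shell
#!/bin/sh
# Hypothetical helper (not part of dPCA): print the "dPCA-P + R^B" command
# for one of the three example data sets. All flags and output names are
# copied from the example commands above; nothing is executed here.
dpca_example_cmd() {
    ds=$1
    case "$ds" in
        Prom) out="PromP_output"; extra="" ;;              # promoter analysis
        Ebox) out="Ebox_output";  extra=" -cm 0" ;;        # MYC analysis
        ASB)  out="ASB_output";   extra=" -sm 1 -cm 0" ;;  # ASB analysis, paired samples
        *)    echo "unknown data set: $ds" >&2; return 1 ;;
    esac
    echo "dpca -i ${ds}_data.txt -d /home/ -o ${out} -t 1${extra} -r ${ds}_peakprob.txt"
}

# Print the command for each example data set.
for example in Prom Ebox ASB; do
    dpca_example_cmd "$example"
done
```

 Each printed line can be pasted at the prompt once the dpca executable and the example data files are in place.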
 [Installation] 
 For Windows: An executable program is provided. To run dPCA, open the Start menu of your Windows system (typically in the bottom-left corner of your screen). Choose ‘Accessories > Run’, type ‘cmd’ and then press Enter. A command window will appear. In this window, change to the folder that contains dPCA, for example, by typing: 
 > cd D:\Users\dPCA\ 
 Now type: 
 > dpca_importdata 
 > dpca 
 > dpca_peakcalls 
 Each command should print some usage information, which indicates that dPCA is ready to use. 
 For Linux, Mac OS: (dPCA is bundled with CisGenome. You can follow the CisGenome installation procedure to install dPCA. dPCA is written in C. Before installation, you need to have a compiler such as gcc or g++ installed on your computer.) 
 1. Unzip using ‘gzip -d *.gz’ (here * is the name of the file you have downloaded). 
 2. Untar using ‘tar xvf *.tar’. 
 3. Enter the cisgenome folder. 
 4. Compile by typing ‘./makefile’. 
 5. Enter the subfolder named ‘bin’ by typing ‘cd bin’. 
 6. Type ‘ls’; you will find three files named ‘dpca_importdata’, ‘dpca’, and ‘dpca_peakcalls’. 
 7. Now type 
 > dpca_importdata 
 > dpca 
 > dpca_peakcalls 
 If you installed cisgenome correctly, you will see some usage information after each of these three commands. 
 8. You can now start to use dPCA. 
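 Steps 6 and 7 can also be checked with a short script. The sketch below is a hypothetical convenience (the function name check_dpca_bin is not part of dPCA or CisGenome). It only tests that the three programs named above exist as executable files in a given folder, and it assumes you are still in the ‘bin’ folder from step 5.

```shell
#!/bin/sh
# Hypothetical check (not part of dPCA or CisGenome): report which of the
# three dPCA programs are present as executable files in a given folder.
# The return status is the number of missing programs (0 means all found).
check_dpca_bin() {
    bin_dir=$1
    missing=0
    for prog in dpca_importdata dpca dpca_peakcalls; do
        if [ -f "$bin_dir/$prog" ] && [ -x "$bin_dir/$prog" ]; then
            echo "found:   $bin_dir/$prog"
        else
            echo "missing: $bin_dir/$prog"
            missing=$((missing + 1))
        fi
    done
    return "$missing"
}

# Assumes the current directory is the 'bin' folder entered in step 5.
check_dpca_bin . || echo "some programs are missing; compilation may not have completed"
```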
 [Readme] 
 To learn how to use dPCA, please read the following readme file. 
 
 Examples and sample parameter files: 
 (1) Basic dPCA 
 (Note: the test data for STEP 1 are toy examples that illustrate the data formats and the dpca_importdata function. We keep them small to avoid overloading our web server. The test data for STEP 2 are a different dataset: the data used in our paper. In real applications, you should use the data generated by STEP 1 as input for STEP 2.) 
 STEP 1: run dpca_importdata 
 Download the test data and regions here and run the command 
 > dpca_importdata sample_importdata_arg.txt 
 (A more complicated sample file: sample_importdata2_arg.txt) 
 STEP 2: run dpca 
 Download the data below and run the following command: 
 > dpca -i Ebox_data.txt -d /home/ -o Ebox_output -t 1 
 
 (2) dPCA-P + R^B 
 (Note: the test data for STEP 1 and STEP 2 are toy examples that illustrate the data formats. The test data for STEP 3 are a different dataset: the data used in our paper. In real applications, you should use the data generated by STEP 2 as input for STEP 3.) 
 STEP 1: run dpca_importdata 
 Download the test data and regions here and run the command 
 > dpca_importdata sample_importdata_arg.txt 
 STEP 2: run peak calling 
 > dpca_peakcalls -i sample_peakcall_sampledescription.txt -p sample_peakcall_experimentdesign.txt -d /user/cisgenome/bin 
 Here, -d specifies the folder that contains the cisgenome and dpca executable files. 
 (Two more complicated sample parameter files: sample_peakcall_sampledescription2.txt and sample_peakcall_experimentdesign2.txt) 
 STEP 3: run dpca 
 Download the data below and run the following command: 
 > dpca -i Ebox_data.txt -d /home/ -o Ebox_output -t 1 -cm 0 -r Ebox_peakprob.txt 
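 The three steps of this example can be collected into one script for review. The sketch below is a hypothetical dry-run wrapper (run_step, dpca_p_rb_steps and the DRY_RUN variable are not part of dPCA). By default it only prints the commands, with arguments copied from the steps above. As the note explains, the STEP 3 test data are a separate dataset, so the script sequences the commands without passing files from one step to the next.

```shell
#!/bin/sh
# Hypothetical dry-run wrapper (not part of dPCA): print each step, and
# execute it only when DRY_RUN is set to 0.
DRY_RUN=${DRY_RUN:-1}

run_step() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# The three steps of the dPCA-P + R^B example; a failing step stops the chain.
dpca_p_rb_steps() {
    run_step dpca_importdata sample_importdata_arg.txt &&
    run_step dpca_peakcalls -i sample_peakcall_sampledescription.txt \
        -p sample_peakcall_experimentdesign.txt -d /user/cisgenome/bin &&
    run_step dpca -i Ebox_data.txt -d /home/ -o Ebox_output -t 1 -cm 0 \
        -r Ebox_peakprob.txt
}

dpca_p_rb_steps
```

 Setting DRY_RUN=0 in the environment before running the script executes the steps, provided the programs are on your PATH and the input files are in the current folder.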
 
 (3) dPCA-Z 
 (Note: the test data for STEP 1 and STEP 2 are toy examples that illustrate the data formats. The test data for STEP 3 are a different dataset: the data used in our paper. In real applications, you should use the data generated by STEP 2 as input for STEP 3.) 
 STEP 1: run dpca_importdata 
 Download the test data and regions here and run the command 
 > dpca_importdata sample_importdata_arg.txt 
 STEP 2: run peak calling 
 > dpca_peakcalls -i sample_peakcall_sampledescription.txt -p sample_peakcall_experimentdesign.txt -d /user/cisgenome/bin 
 Here, -d specifies the folder that contains the cisgenome and dpca executable files. 
 (Two more complicated sample parameter files: sample_peakcall_sampledescription2.txt and sample_peakcall_experimentdesign2.txt) 
 STEP 3: run dpca 
 Download the data below and run the following command: 
 > dpca -i Ebox_data.txt -d /home/ -o Ebox_output -t 1 -cm 0 -z 1 -r Ebox_peakprob.txt 
 
 [Contact] 
 Hongkai Ji [hji@jhsph.edu]  |