Simina Maria Boca · Capstone Project for the MHS in Bioinformatics · Adviser: Jeffrey T. Leek

Gene-set analysis
Gene-set analysis refers to methods that integrate data across gene-level measurements to infer associations between biological processes or pathways (gene-sets) and outcomes or phenotypes. Sets are defined by previously verified or postulated relationships between genes, and can often be found in large biological databases such as GO or KEGG. The inference is thus performed at the level of sets, not individual genes, and aims to provide more interpretable and biologically relevant results.
A decision-theoretic method for gene-set analysis
We introduced a decision-theoretic method for gene-set analysis. Our approach focuses on estimating the fraction of non-null genes in a gene-set and is easier to interpret than traditional methods that rely on hypothesis testing and p-values. We also developed an R package that implements our method. The R package for Linux and the relevant methods paper are:
Set [download .tar.gz] | A decision-theory approach to interpretable set analysis for high-dimensional data.

The relevant vignette for the package is:
Set vignette [pdf]
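To illustrate the quantity that the method targets, the fraction of non-null genes in a gene-set, here is a minimal Python sketch. It is not the Set package's interface and not the estimator from the paper. It assumes that per-gene posterior probabilities of being non-null are already available from some upstream model (the dictionary `posterior_non_null`, the gene names, and the function name are all hypothetical), and it averages them over the set, which gives the posterior expected fraction of non-null genes.

```python
from typing import Dict, Iterable


def estimated_non_null_fraction(
    posterior_non_null: Dict[str, float], gene_set: Iterable[str]
) -> float:
    """Average the per-gene posterior probabilities over the genes in the set.

    Assumption: posterior_non_null maps gene name -> P(gene is non-null | data),
    computed elsewhere. Genes in the set without a measurement are skipped.
    """
    measured = [g for g in gene_set if g in posterior_non_null]
    if not measured:
        raise ValueError("no gene in the set has a posterior probability")
    return sum(posterior_non_null[g] for g in measured) / len(measured)


# Hypothetical inputs, for illustration only.
posterior_non_null = {"GENE_A": 0.9, "GENE_B": 0.1, "GENE_C": 0.5}
fraction = estimated_non_null_fraction(posterior_non_null, ["GENE_A", "GENE_B"])
print(f"Estimated fraction of non-null genes: {fraction:.2f}")
```

The result is a proportion between 0 and 1 for each set, which can be read directly as "about this share of the genes in the set appear to be associated with the outcome", rather than as a p-value.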
Somatic mutation studies of cancer
Somatic mutations present in the coding exons of a variety of solid human tumors have been discovered. We created a database of the mutations reported in the genomic studies of Sjoblom et al. (2006), Wood et al. (2007), Jones et al. (2008), Parsons et al. (2008), and Parsons et al. (2010), which cover breast tumors, colorectal tumors, pancreatic tumors, glioblastomas multiforme, and medulloblastomas. A form that allows users to query the database of somatic cancer mutations by gene name, tumor type, or sample name is available at:
Cancer mutation information
Analysis of somatic cancer mutation data
We developed software for the analysis of somatic cancer mutation data at the gene level and at the gene-set level. The gene-set analysis method is patient-oriented: it initially scores each gene-set at the patient level rather than the gene level. Most gene-set analysis approaches are gene-oriented, initially obtaining scores for each gene across all patient samples, and are therefore not suitable for somatic cancer mutation studies. The R package for Linux for the gene-level analysis and the relevant methods paper are:
CancerMutationAnalysis [download .tar.gz] | Design and analysis issues in genome-wide somatic mutation studies of cancer

The R package for the gene-set analysis (including a vignette) and the relevant methods paper are:
PatientGeneSets | Patient-oriented gene set analysis for cancer mutation data
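The difference between patient-oriented and gene-oriented scoring described above can be sketched in Python. This is a toy illustration under stated assumptions, not the PatientGeneSets implementation: the two scoring rules (fraction of patients with at least one mutated gene in the set, versus the mean per-gene mutation frequency), the function names, and the example data are all hypothetical.

```python
from typing import Dict, Set


def patient_oriented_score(mutations: Dict[str, Set[str]], gene_set: Set[str]) -> float:
    """Score the set within each patient first, then combine across patients.

    Assumed rule: a patient counts as a hit if any gene in the set is mutated.
    """
    if not mutations:
        raise ValueError("no patients provided")
    hits = [bool(mutated & gene_set) for mutated in mutations.values()]
    return sum(hits) / len(hits)


def gene_oriented_score(mutations: Dict[str, Set[str]], gene_set: Set[str]) -> float:
    """Score each gene across all patients first, then combine across genes.

    Assumed rule: per-gene score is the fraction of patients with that gene mutated.
    """
    if not mutations or not gene_set:
        raise ValueError("need at least one patient and one gene")
    n_patients = len(mutations)
    per_gene = [
        sum(gene in mutated for mutated in mutations.values()) / n_patients
        for gene in sorted(gene_set)
    ]
    return sum(per_gene) / len(per_gene)


# Hypothetical data: each patient maps to the set of genes mutated in that tumor.
mutations = {
    "patient_1": {"GENE_A"},
    "patient_2": {"GENE_B"},
    "patient_3": {"GENE_C"},
    "patient_4": set(),
}
pathway = {"GENE_A", "GENE_B", "GENE_C"}

print(patient_oriented_score(mutations, pathway))  # 0.75
print(gene_oriented_score(mutations, pathway))     # 0.25
```

In this toy example each gene is mutated in only one of four patients, so the gene-oriented score is low, yet three of the four patients carry a mutation somewhere in the set. The patient-oriented score captures that pattern, in which different patients alter the same set through different genes.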