Course description:  An algorithmic approach to modern biological sequence analysis. Provides an overview of the core algorithms and statistical principles of bioinformatics; topics covered will include detailed probability and molecular biology background, sequence alignment (local, global, pairwise, and multiple), hidden Markov models (as powerful tools for sequence analysis), gene finding, and phylogenetic trees. Topics will be covered from an algorithmic perspective, although no prior programming experience is required. Basic probability and molecular biology will be covered in enough detail that no prior probability or advanced biology classes are required.

Grading:  Homework 70%, presentation plus written critique 30%. Late homework will be docked 25% if turned in by the following Friday, and will not be accepted after that. The presentation may be based on a publication or a project (see here for details) and will include a short oral component as well as a written component.

Course learning objectives:  The general goal of the course is to provide students with an in-depth understanding of the algorithms and modeling ideas behind common tools used in genomic sequence research. Students are expected to develop the ability to independently construct models to address specific biological questions, and to independently carry out analyses and interpret the results. No prior programming experience is necessary; students may construct models and algorithms in pseudocode (a methodical description, in words, of the series of steps that a program would follow). The specific goals are:
1) Understand concepts in basic molecular biology and probability;
2) Be familiar with classic and modern pairwise alignment algorithms, including BLAST;
3) Understand the statistical significance of alignment scores and the interpretation of alignment algorithm output;
4) Understand the mechanism and the use of dynamic programming;
5) Be familiar with multiple alignment;
6) Understand the different assumptions about evolution made by different models and algorithms;
7) Understand the likelihood approach to phylogenetic reconstruction, and multiple alignment as applied to phylogenetic tree construction;
8) Understand Markov models and hidden Markov models (HMMs) in the genomic context, and essential algorithms for analyzing HMMs;
9) Understand HMMs as applied to gene finding, and be familiar with other gene-finding algorithms;
10) Identify from the literature important algorithmic/statistical advances in bioinformatics, and prepare an oral presentation of a recent bioinformatics publication that is important from either a biological or a mathematical perspective.

Prerequisites:  None. Contact me (swheelan@jhmi.edu) if you are unsure about taking this course; otherwise, all are welcome!