Welcome to 140.638.01 Analysis of Biological Sequences!

Why take this course?

Rapidly advancing technology lets us determine the sequences of biological molecules such as DNA, RNA, and protein, and produces tremendous amounts of data. Understanding these sequence data requires a combination of biological, mathematical, and computational expertise. This course presents algorithms and methods for working with and thinking about biological sequences, providing the first steps toward proficiency in this growing field.

Course description:

Presents a variety of methods for assigning function to biological sequences, emphasizing biologically informed algorithm design. Topics include the history and methods of low- and high-throughput sequencing; multiple classes of sequence alignment problems (one-to-one, multiple alignment, alignment of a few sequences to a database, and alignment to a reference genome); interpreting sequence alignments; discovery of patterns in sequences; and visualizing data.

Learning objectives:

Upon successfully completing this course, students will be able to describe the algorithms used in assigning function to biological sequences, determine which methods are appropriate for analyzing sequences derived from different experiments, and design analysis pipelines that are biologically meaningful and mathematically rigorous.

Detailed course outline

Homework links and information

Course times: Tuesday and Thursday, 8:30-9:50 AM

Classroom: The course will be taught over Zoom this year. Please contact me at swheelan@jhmi.edu for the link.

Instructor: Sarah Wheelan


Grading: homework (3-4 assignments), 80%; final project (written critique of a publication), 20%. Late homework is not accepted except by prior agreement.


The course is taught primarily from notes, and no single textbook is required, but some textbooks are helpful as references. Feel free to consult me before buying any books.

Pavel Pevzner's books provide a clever biological impetus and readable descriptions of the relevant algorithms
Biological Sequence Analysis by Durbin et al. is a classic, though mathematically sophisticated
Bioinformatics and Functional Genomics by Jonathan Pevsner gives a strong biological motivation for the tools used in computational biology and a good description of the algorithmic basis of those tools
Statistical Methods in Bioinformatics by Ewens and Grant is a good start for those with biological backgrounds
Bioinformatics: Sequence and Genome Analysis by Mount is also fairly basic, though solid, in its descriptions of algorithms

The class will also be taught in part from current literature.