Beta Sheet Motifs - Introduction

Rosetta, an ab initio protein folding algorithm, uses simulated annealing to search the conformation space of three-dimensional structures. Fragment libraries for three- and nine-residue segments of the chain are generated from the protein structure database using a sequence profile comparison method. The move set of the annealing algorithm consists of substituting local segments of the chain with fragments from these libraries. One part of the scoring function used in the annealing procedure favors the assembly of strands into sheets. However, this term only governs how many sheets are formed given the number of strands; it does not influence how the strands are arranged within the sheets. After generating many decoys with Rosetta, we found that the folding algorithm predominantly generates very local sheets, that is, sheets whose paired strands lie close together in the sequence. To correct this bias towards local structures in Rosetta populations, we analyzed the three-dimensional structures in the currently available database of non-homologous proteins and estimated the distributions of sheet motifs with two edge strands (open sheets) of various sizes in native structures.
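The fragment-substitution move set described above can be illustrated with a small toy example. The sketch below is not Rosetta code: the conformation is reduced to a list of secondary-structure letters, the fragment library and the mismatch score are invented for illustration, and the function name, cooling schedule, and parameters are all assumptions. It only shows the general shape of simulated annealing with a Metropolis acceptance rule and fragment-insertion moves.

```python
import math
import random


def fragment_annealing(conformation, fragment_library, score,
                       n_steps=2000, t_start=2.0, t_end=0.1, seed=0):
    """Toy simulated annealing with fragment-insertion moves.

    conformation     -- list of per-residue states (hypothetical encoding)
    fragment_library -- dict: start position -> list of equal-length fragments
    score            -- function mapping a conformation to a number (lower is better)
    """
    rng = random.Random(seed)
    starts = sorted(fragment_library)
    current = list(conformation)
    current_score = score(current)
    best, best_score = list(current), current_score

    for step in range(n_steps):
        # Geometric cooling from t_start down to t_end (an assumed schedule).
        fraction = step / max(n_steps - 1, 1)
        temperature = t_start * (t_end / t_start) ** fraction

        # Move: overwrite a local segment with a fragment from the library.
        start = rng.choice(starts)
        fragment = rng.choice(fragment_library[start])
        candidate = list(current)
        candidate[start:start + len(fragment)] = fragment

        # Metropolis criterion: always accept improvements, sometimes accept
        # worse candidates, less often as the temperature drops.
        delta = score(candidate) - current_score
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_score = candidate, current_score + delta
            if current_score < best_score:
                best, best_score = list(current), current_score

    return best, best_score


# Toy usage: the "score" counts mismatches against an invented target string.
target = list("EEEELLHHHHHH")


def mismatch_score(conf):
    return sum(a != b for a, b in zip(conf, target))


initial = ["L"] * len(target)
library = {i: [list("HHH"), list("EEE"), list("LLL")]
           for i in range(len(target) - 2)}
best, best_score = fragment_annealing(initial, library, mismatch_score)
print("".join(best), best_score)
```

In this toy setting the score says nothing about how strands pair with each other, which mirrors the limitation noted above: a scoring term can reward forming strands and sheets without constraining their arrangement.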

There is an extensive literature on beta-sheet architecture. However, we carried out our own analysis because we wanted to model the distribution of sheet motifs explicitly, conditioning on other known variables such as the loop lengths between strands in the sheets and the proportion of helix residues in the structures. In addition, our analysis uses the several hundred structures that have been added to the database in recent years. In our manuscript we describe the derivation and use of a new scoring function for beta-sheet structures, which incorporates many of the insights from previous studies in a manner appropriate for evaluating Rosetta models. This webpage also lets you search the database for beta-sheet motifs of interest and score sheets according to our model, which assigns a likelihood to each motif given the loop lengths between strands in the sheets and the proportion of helix residues in the structures.
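To make the idea of a likelihood for a motif given other variables concrete, here is a minimal sketch of one simple way such a conditional distribution could be estimated from counts. It is not the model from the manuscript: the motif labels (strand orders in a three-stranded sheet), the coarse binning of loop length and helix proportion, the pseudocount smoothing, the function name, and the records themselves are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def fit_conditional_motif_model(records, motifs, pseudocount=1.0):
    """Estimate P(motif | condition) from (motif, condition) observations.

    records     -- iterable of (motif, condition) pairs; a condition here is
                   a hypothetical (loop-length bin, helix-proportion bin) tuple
    motifs      -- list of all motif labels considered
    pseudocount -- additive smoothing so unseen motifs keep nonzero likelihood
    """
    counts = defaultdict(Counter)
    for motif, condition in records:
        counts[condition][motif] += 1

    def likelihood(motif, condition):
        seen = counts.get(condition, Counter())
        total = sum(seen.values()) + pseudocount * len(motifs)
        return (seen[motif] + pseudocount) / total

    return likelihood


# Invented toy observations, NOT taken from any structure database.
motifs = ["1-2-3", "2-1-3", "1-3-2"]
records = [
    ("1-2-3", ("short", "low")),
    ("1-2-3", ("short", "low")),
    ("2-1-3", ("short", "low")),
    ("2-1-3", ("long", "high")),
]

likelihood = fit_conditional_motif_model(records, motifs)
p_local = likelihood("1-2-3", ("short", "low"))
p_total = sum(likelihood(m, ("short", "low")) for m in motifs)
print(p_local, p_total)
```

A score built this way rewards a decoy's sheet motif in proportion to how often that motif occurs in native structures with comparable loop lengths and helix content, which is the kind of information the strand-to-sheet term described earlier does not capture.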

This project was carried out in the Baker Lab in the Department of Biochemistry at the University of Washington, in collaboration with Charles Kooperberg, Richard Bonneau, and David Baker. Many thanks also to Jerry Tsai and Brian Kuhlman.

For questions, please contact Ingo Ruczinski.