Rosetta, an ab initio protein folding algorithm, uses simulated annealing to search the conformation space of three-dimensional structures. Fragment libraries for three- and nine-residue segments of the chain are generated from the protein structure database using a sequence profile comparison method. The move set of the annealing algorithm consists of substituting local segments of the chain with fragments from these libraries. One part of the scoring function used in the annealing procedure favors the assembly of strands into sheets. However, it governs only how many sheets are formed from a given number of strands; it does not influence how those strands are arranged within the sheets. After generating many decoys with Rosetta, we found that the folding algorithm predominantly generates very local sheets. To correct this bias towards local structures in Rosetta populations, we analyzed the three-dimensional structures in the currently available database of non-homologous proteins and estimated the distributions of sheet motifs with two edge strands (open sheets) of various sizes in native structures.
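The fragment-substitution move set described above can be illustrated with a small toy sketch. This is not Rosetta's implementation: the function names (fragment_anneal, toy_score), the representation of a chain as a list of (phi, psi) angle pairs, the geometric cooling schedule, the Metropolis acceptance rule, and the stand-in score are all assumptions made for illustration only.

```python
import math
import random

# Hypothetical backbone angle pairs (phi, psi) in degrees, for illustration only.
HELIX = (-60.0, -45.0)
EXTENDED = (-120.0, 130.0)


def toy_score(conformation):
    """Stand-in score (lower is better): squared deviation from EXTENDED.

    The real scoring function is far richer; this is only a placeholder.
    """
    return sum(
        (phi - EXTENDED[0]) ** 2 + (psi - EXTENDED[1]) ** 2
        for phi, psi in conformation
    ) / 1e4


def fragment_anneal(start, library, score, n_steps=1000,
                    t_start=2.0, t_end=0.1, seed=0):
    """Toy simulated annealing whose moves are fragment substitutions.

    start:   list of (phi, psi) tuples, one per residue
    library: dict mapping a start position to a list of fragments,
             each fragment being a list of (phi, psi) tuples
    score:   callable returning a float for a conformation
    """
    # Every fragment must fit inside the chain at its position.
    for pos, frags in library.items():
        for frag in frags:
            if pos < 0 or pos + len(frag) > len(start):
                raise ValueError("fragment does not fit at position %d" % pos)

    rng = random.Random(seed)
    positions = sorted(library)
    current = list(start)
    current_score = score(current)
    best, best_score = list(current), current_score

    for step in range(n_steps):
        # Geometric cooling from t_start down to t_end (an assumed schedule).
        frac = step / max(1, n_steps - 1)
        temperature = t_start * (t_end / t_start) ** frac

        # Move: replace a local segment with a fragment from the library.
        pos = rng.choice(positions)
        frag = rng.choice(library[pos])
        trial = list(current)
        trial[pos:pos + len(frag)] = frag

        # Metropolis criterion: always accept improvements, and accept
        # worse trials with probability exp(-delta / temperature).
        trial_score = score(trial)
        delta = trial_score - current_score
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_score = trial, trial_score
            if current_score < best_score:
                best, best_score = list(current), current_score

    return best, best_score
```

For example, with a library of three-residue fragments at every position of a ten-residue chain, fragment_anneal([HELIX] * 10, {i: [[HELIX] * 3, [EXTENDED] * 3] for i in range(8)}, toy_score) returns the lowest-scoring conformation visited together with its score. Note that the score here sees only local angles; like the sheet term described above, it says nothing about how strands are arranged within a sheet.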
There is an extensive literature on
beta-sheet architecture. However, we carried out our own analysis
because we were interested in modeling the distribution of sheet motifs
explicitly, conditioning on other known variables such as loop lengths
between strands in the sheets and the proportion of helix residues in
the structures. In addition, our analysis uses the several hundred
structures that have been added to the database in recent years.
In our manuscript we describe the derivation and use of a new
scoring function for beta-sheet structures, which incorporates many of
the insights from previous studies in a manner appropriate for
evaluating Rosetta models. This webpage also allows you to search the
database for beta-sheet motifs of interest and to score the
sheets according to our model, which assigns a likelihood to
each motif, given the loop lengths between strands in the sheets and
the proportion of helix residues in the structures.
This project was carried out in the Baker Lab in the Department of
Biochemistry at the University of Washington, as a collaboration with
Charles Kooperberg, Richard Bonneau, and David Baker. Many thanks also
to Jerry Tsai and Brian Kuhlman.