Documentation by Vikram K. Mulligan (vmullig@uw.edu), Baker Laboratory. Last modified 4 November 2017.

Description of score term

The repeat stretch score term (aa_repeat) is a wholebody energy that assigns a penalty to a pose based on its sequence. Specifically, it penalizes long stretches in which the same residue type repeats over and over (e.g. poly-Q sequences). This is currently only useful for filtering a design after a packing run, but the ultimate goal is to get it working with a modified form of the packer that permits packing with non-pairwise-decomposable scoring terms that can be computed quickly.

Typical usage

The aa_repeat_energy scoring term can be added to any existing scoring function (e.g. ref2015.wts), and should work "out of the box" with both canonical and noncanonical amino acids. It imposes a penalty for each stretch of repeating amino acids, with the penalty value depending nonlinearly on the length of the repeating stretch. By default, 1- or 2-residue stretches incur no penalty, 3-residue stretches incur a penalty of +1, 4-residue stretches incur a penalty of +10, and 5-residue stretches or longer incur a penalty of +100. Since the term is sequence-based, it is really only useful for design -- that is, it will impose an identical penalty for a fixed-sequence pose, regardless its conformation. This also means that the term has no conformational derivatives: the minimizer ignores it completely. The term is not pairwise-decomposible, but has been made packer-compatible, so it can direct the sequence composition during a packer run.

Controlling the penalty values

The penalty assigned to a stretch of N repeating residues is determined based on a database file. By default, the file used is:

database/scoring/score_functions/aa_repeat_energy/default_repeat_penalty_table.rpt_pen
This is what this file looks like:
# The series of numbers below indicates the penalty for having 1, 2, 3, etc. of the same residue in a row.
# Zero residues (empty poses) are not penalized.  More than the number of residues listed results in the 
# last penalty being imposed.  (For example, in this file, a repeat stretch of more than five residues will
# be given a penalty of 100).
0.0 0.0 1.0 10.0 100.0
The lines starting with a pound sign (#) are ignored. The relevant line is the row of numbers, which represent the penalty for a 1-, 2-, 3-, 4-, or 5-residue stretch. Stretches longer than 5 residues are assigned the 5-residue stretch penalty. The user may provide a custom .rpt_pen file using the -aa_repeat_energy_penalty_file <filename> flag. Custom .rpt_pen files may have as many penalty values as the user wishes (i.e. 5-residue stretches are not the limit). Stretches longer than the longest specified will be assigned the penalty given to the longest specified.

Organization of the code

  • The scoring method lives in core/scoring/methods/AARepeatEnergy.cc and core/scoring/methods/AARepeatEnergy.hh.
  • The whole-body energy is evaluated by the AARepeatEnergy::finalize_total_energy() function, which takes a pose as input.
  • This function calls AARepeatEnergy::calculate_aa_repeat_energy(), which takes a vector of const owning pointers to Residue objects as input and returns a whole-pose energy value. This function can be called by external code.
  • A unit test is located in source/test/core/scoring/methods/AARepeatEnergy.cxxtest.hh. This test first scores the trp cage miniprotein, which has a three-proline repeat sequence. It then adds polyalanine repeat sequences to the end of the trp cage and repeats the scoring, confirming that the expected score value results each time.

See also