Documentation created by Vikram K. Mulligan (firstname.lastname@example.org), Baker laboratory. Last edited 27 December 2017.
This scoring term is intended for use during design, to penalize deviations from a desired residue type composition. For example, a user could specify that the protein was to have no more than 5% alanines, no more than 3 glycines, at least 4 prolines, and be 40% to 50% hydrophobic with 5% aromatics, and that its surface have no more than 50% charged residues. Calculating a score based on residue type composition is easy and fast, but is inherently not pairwise-decomposable. This scoring term is intended to work with Alex Ford's changes to the packer that permit fast-to-calculate but non-pairwise-decomposable scoring terms to be used during packing or design.
The basic idea is that the algorithm does the following:
.comp files that define the behaviour of this score term (see below).
This scoring term is controlled by .comp files, which define the desired residue type composition of a protein. The .comp file(s) to use can be provided to Rosetta in three ways:
The user can provide .comp files as input at the command line with the -aa_composition_setup_file <filename1> <filename2> <filename3> ... flag.
The user can provide .comp files when setting up a particular scorefunction in RosettaScripts, using the <Set> tag to modify the scorefunction. For example:
<SCOREFXNS>
    <ScoreFunction name="tala" weights="talaris2014.wts" >
        <Reweight scoretype="aa_composition" weight="1.0" />
        <Set aa_composition_setup_file="inputs/disfavour_polyala.comp" />
    </ScoreFunction>
</SCOREFXNS>
The user can attach .comp files to a Pose with the AddCompositionConstraintMover. These remain attached to the pose, like any other constraint, until all constraints are cleared with the ClearConstraintsMover, or until only sequence composition constraints are cleared with the ClearCompositionConstraintsMover. Note that the composition constraints added with the AddCompositionConstraintMover can have a ResidueSelector attached to them as well. This allows the user to define sub-regions of the pose (e.g. a single helix, the protein core, an inter-subunit binding interface) to which an amino acid composition constraint will be applied. The ResidueSelector is evaluated once prior to scoring or packing, and is not re-evaluated during packing. Here is an example RosettaScript in which a LayerSelector is used to select the core of the protein, and the AddCompositionConstraintMover is used to impose a sequence composition requirement on core residues only:
<ROSETTASCRIPTS>
    <SCOREFXNS>
        <ScoreFunction name="tala" weights="talaris2014.wts" >
            <Reweight scoretype="aa_composition" weight="1.0" />
        </ScoreFunction>
    </SCOREFXNS>
    <RESIDUE_SELECTORS>
        <Layer name="corelayer" select_core="true" core_cutoff="0.5" surface_cutoff="0.25" />
    </RESIDUE_SELECTORS>
    <TASKOPERATIONS>
    </TASKOPERATIONS>
    <FILTERS>
    </FILTERS>
    <MOVERS>
        <AddCompositionConstraintMover name="addcomp1" filename="desired_core_composition.comp" selector="corelayer" />
        <FastDesign name="fdes1" scorefxn="tala" repeats="3" >
            <MoveMap name="fdes1_mm">
                <Span begin="1" end="30" chi="1" bb="0" />
            </MoveMap>
        </FastDesign>
    </MOVERS>
    <APPLY_TO_POSE>
    </APPLY_TO_POSE>
    <PROTOCOLS>
        <Add mover="addcomp1" />
        <Add mover="fdes1" />
    </PROTOCOLS>
</ROSETTASCRIPTS>
If the user uses more than one of the methods described above, all of the .comp files provided will be used in scoring, provided the aa_composition scoreterm is on with a nonzero weight.
A .comp file consists of one or more PENALTY_DEFINITION blocks. Lines that can be present in a PENALTY_DEFINITION block include:
PENALTY_DEFINITION Starts the block.
TYPE <restype1> <restype2> <restype3> ... Indicates that a residue should be counted if its three-letter code matches ANY of the names provided.
NOT_TYPE <restype1> <restype2> <restype3> ... Indicates that a residue should NOT be counted if its three-letter code matches ANY of the names provided. The residue is not counted even if it matches properties listed in PROPERTIES or OR_PROPERTIES lines.
PROPERTIES <property1> <property2> <property3> ... Indicates that a residue should be counted if it has ALL of the properties listed. Note that a list of currently valid residue properties is autogenerated here.
OR_PROPERTIES <property1> <property2> <property3> ... Indicates that a residue should be counted if it has ANY of the properties listed.
NOT_PROPERTIES <property1> <property2> <property3> ... Indicates that a residue should NOT be counted if it has ANY of the properties listed.
DELTA_START <integer> This indicates how far below the desired number of residues our penalties table extends. For example, a value of '-5' means that we will be providing penalty values for up to five residues fewer than the desired number.
DELTA_END <integer> This indicates how far beyond the desired number of residues our penalties table extends. For example, a value of '7' means that we will be providing penalty values for up to seven residues more than the desired number.
FRACT_DELTA_START <float> (New 28-Apr-2016) This is an alternative to DELTA_START that indicates how far below the desired fractional composition our penalties table extends. For example, a value of '-0.05' means that we will be providing penalty values for up to 5% fewer than the desired number. (So if we specified a desired fractional composition of 0.07 (7%) for alanine, then FRACT_DELTA_START -0.05 indicates that the penalties table provides penalties down to 2% alanine.) Either DELTA_START or FRACT_DELTA_START must be used.
FRACT_DELTA_END <float> (New 28-Apr-2016) This is an alternative to DELTA_END that indicates how far above the desired fractional composition our penalties table extends. For example, a value of '0.08' means that we will be providing penalty values for up to 8% more than the desired number. (So if we specified a desired fractional composition of 0.03 (3%) for valine, then FRACT_DELTA_END 0.08 indicates that the penalties table provides penalties up to 11% valine.) Either DELTA_END or FRACT_DELTA_END must be used.
PENALTIES <float1> <float2> <float3> ... The actual penalties table. If DELTA_START and DELTA_END were used, then entries must be provided for every integer value from DELTA_START to DELTA_END. These values represent the energetic penalty for having N residues too few, N-1 residues too few, N-2 residues too few ... M-1 residues too many, M residues too many. If FRACT_DELTA_START and FRACT_DELTA_END are used, then any number of penalty values may be specified; they will be linearly interpolated within the range [FRACT_DELTA_START, FRACT_DELTA_END]. In either case, the end functions are applied if residue type counts fall outside of the range.
FRACTION <float> This indicates that this residue type, or residues with the defined properties, are ideally this fraction of the total. For example, a value of 0.25 would mean that, ideally, a quarter of residues in the protein were those defined by this PENALTY_DEFINITION. If a ResidueSelector was used when applying a composition constraint to a pose, the fraction represents the portion of selected residues (e.g. 50% of core residues, 10% of residues in helix 3, 40% of residues in the binding interface). Otherwise, it represents the fraction of total residues in the pose.
ABSOLUTE <integer> An alternative to FRACTION, this indicates the absolute number of residues of the given type or properties desired in the structure. For example, a value of 3 would mean that we want 3 residues of the given type or properties.
BEFORE_FUNCTION <string> and AFTER_FUNCTION <string> These define the behaviour of the penalty function outside of the user-defined range (below it and above it, respectively). Allowed values are CONSTANT (first or last value repeats), LINEAR (linearly-ramping penalty based on the slope of the first two or last two penalty values), or QUADRATIC (parabolic penalty centred on zero and passing through the first two or last two penalty values).
END_PENALTY_DEFINITION Ends the block.
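The integer-delta table lookup and the three end-function behaviours described in the list above can be illustrated with a small Python sketch. This is not Rosetta's implementation: the function names `penalty_for_delta` and `_extrapolate` are hypothetical, and the QUADRATIC branch assumes that "centred on zero" means a parabola of the form a*x^2 + c with its vertex at a delta of zero.

```python
def _extrapolate(x, x_edge, y_edge, x_inner, y_inner, mode):
    """Extend the table beyond its edge using the two outermost entries."""
    if mode == "CONSTANT":
        # The first or last value simply repeats.
        return float(y_edge)
    if mode == "LINEAR":
        # Ramp with the slope defined by the two outermost entries.
        slope = (y_edge - y_inner) / (x_edge - x_inner)
        return y_edge + slope * (x - x_edge)
    if mode == "QUADRATIC":
        # ASSUMPTION: parabola y = a*x^2 + c (vertex at delta = 0)
        # passing through the two outermost table entries.
        a = (y_edge - y_inner) / (x_edge ** 2 - x_inner ** 2)
        c = y_edge - a * x_edge ** 2
        return a * x ** 2 + c
    raise ValueError(f"unknown end function: {mode}")


def penalty_for_delta(delta, delta_start, delta_end, penalties,
                      before="QUADRATIC", after="QUADRATIC"):
    """Penalty for being `delta` residues away from the desired count.

    `penalties` holds one value per integer from DELTA_START to DELTA_END.
    """
    if len(penalties) != delta_end - delta_start + 1:
        raise ValueError("need one penalty per integer in [DELTA_START, DELTA_END]")
    if len(penalties) < 2:
        raise ValueError("this sketch needs at least two penalty values")
    if delta_start <= delta <= delta_end:
        return float(penalties[delta - delta_start])
    if delta < delta_start:
        return _extrapolate(delta, delta_start, penalties[0],
                            delta_start + 1, penalties[1], before)
    return _extrapolate(delta, delta_end, penalties[-1],
                        delta_end - 1, penalties[-2], after)
```

For a table of `100 0 100` spanning deltas -1 to 1, a delta of 3 evaluates to 100 with CONSTANT, 300 with LINEAR, and 900 with QUADRATIC under the assumptions stated above.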
The PENALTY_DEFINITION and END_PENALTY_DEFINITION lines are always required. The BEFORE_FUNCTION and AFTER_FUNCTION lines are optional, and default to QUADRATIC if not specified. One FRACTION or one ABSOLUTE line must also be present (but not both). The PENALTIES, DELTA_START or FRACT_DELTA_START, and DELTA_END or FRACT_DELTA_END lines are required. The TYPE, NOT_TYPE, PROPERTIES, OR_PROPERTIES, and NOT_PROPERTIES lines are all optional, and can be used in conjunction with one another. The logic for deciding whether to count a residue or not is as follows:
Count if ( any TYPE matches ) OR ( ( no NOT_TYPE matches ) AND ( ( no NOT_PROPERTIES property is present) AND ( (no PROPERTIES or OR_PROPERTIES are defined) OR ( all PROPERTIES are present) OR ( any OR_PROPERTIES are present ) ) ) ).
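The counting rule above can be written out as a short Python function. This is a sketch that mirrors the sentence as written and has not been checked against the Rosetta source. The function name `is_counted` and its argument names are hypothetical, and it assumes that the "all PROPERTIES are present" clause only applies when at least one PROPERTIES entry is defined.

```python
def is_counted(res_type, res_props, types=(), not_types=(),
               properties=(), or_properties=(), not_properties=()):
    """Decide whether a residue is counted by one PENALTY_DEFINITION block.

    res_type  -- three-letter code of the residue, e.g. "LEU"
    res_props -- set of property names the residue has, e.g. {"ALIPHATIC"}
    The remaining arguments are the entries of the corresponding .comp lines.
    """
    # ( any TYPE matches ) wins outright.
    if res_type in types:
        return True
    # ( no NOT_TYPE matches )
    if res_type in not_types:
        return False
    # ( no NOT_PROPERTIES property is present )
    if any(p in res_props for p in not_properties):
        return False
    # ( no PROPERTIES or OR_PROPERTIES are defined )
    nothing_defined = not properties and not or_properties
    # ( all PROPERTIES are present )
    # ASSUMPTION: an empty PROPERTIES list does not satisfy this clause.
    all_present = bool(properties) and all(p in res_props for p in properties)
    # ( any OR_PROPERTIES are present )
    any_present = any(p in res_props for p in or_properties)
    return nothing_defined or all_present or any_present
```

With `or_properties=("AROMATIC", "ALIPHATIC")` and `not_types=("LEU",)`, a phenylalanine with the AROMATIC property is counted, while a leucine is excluded despite being aliphatic.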
Here's an example .comp file that penalizes deviations from having 10% aromatic residues in a protein (note that the pound sign can be used to comment one of these files):
# This is a .comp file for requiring that a structure be ten percent aromatic.
# File created 21 July 2015 by Vikram K. Mulligan (email@example.com), Baker laboratory.

# This penalty definition block specifies that, for aromatics, there will be a 100-point penalty for
# having ANY fewer or ANY more than the desired number of aromatic residues.
PENALTY_DEFINITION

# Define residue types to control
PROPERTIES AROMATIC
NOT_PROPERTIES POLAR CHARGED

# Declare desired quantity of these residues
FRACTION 0.1

# Set the penalty for having too few, at the desired number, and too many of the specified residues
PENALTIES 100 0 100

# Set how many residues you can be below the desired quantity before a penalty is applied. Since
# this value is a delta, the desired number of residues is "0", or zero residues away from the target.
# Therefore, "-1" indicates that the penalty will be applied once there is one fewer than the
# desired quantity
DELTA_START -1

# Set how many residues you can be above the desired quantity before a penalty is applied. Since
# this value is a delta, the desired number of residues is "0", or zero residues away from the target.
# Therefore, "1" indicates that the penalty will be applied once there is one more than the
# desired quantity
DELTA_END 1

# Set how the penalties are applied
BEFORE_FUNCTION CONSTANT
AFTER_FUNCTION CONSTANT

END_PENALTY_DEFINITION
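To make the block structure and comment syntax of a .comp file concrete, here is a minimal Python sketch of a reader for this format. It is a hypothetical helper (`parse_comp`), not the parser Rosetta uses. It assumes one keyword per line, treats everything after a pound sign as a comment, keeps all values as strings, and does not validate which keywords are required.

```python
EXAMPLE = """\
# Ten percent aromatic (abridged from the example above).
PENALTY_DEFINITION
PROPERTIES AROMATIC
NOT_PROPERTIES POLAR CHARGED
FRACTION 0.1
PENALTIES 100 0 100
DELTA_START -1
DELTA_END 1
BEFORE_FUNCTION CONSTANT
AFTER_FUNCTION CONSTANT
END_PENALTY_DEFINITION
"""


def parse_comp(text):
    """Return a list of dicts, one per PENALTY_DEFINITION block.

    Each dict maps a keyword (e.g. "PENALTIES") to its list of string values.
    """
    blocks = []
    current = None  # dict for the block being read, or None outside a block
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # strip comments and whitespace
        if not line:
            continue
        key, *values = line.split()
        if key == "PENALTY_DEFINITION":
            if current is not None:
                raise ValueError("PENALTY_DEFINITION inside an open block")
            current = {}
        elif key == "END_PENALTY_DEFINITION":
            if current is None:
                raise ValueError("END_PENALTY_DEFINITION without an open block")
            blocks.append(current)
            current = None
        elif current is None:
            raise ValueError(f"{key} found outside a PENALTY_DEFINITION block")
        else:
            current.setdefault(key, []).extend(values)
    if current is not None:
        raise ValueError("missing END_PENALTY_DEFINITION")
    return blocks
```

Calling `parse_comp(EXAMPLE)` yields a single block whose `"PENALTIES"` entry is `["100", "0", "100"]`; converting values to numbers is left to the caller.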
Here's a more complicated .comp file that imposes the requirement that the protein have 40% aliphatic or aromatic residues other than leucine (i.e. ALA, PHE, ILE, MET, PRO, VAL, TRP, or TYR), and 5% leucines:
# This is a .comp file for requiring that a structure be forty percent aliphatic or aromatic (excluding leucine) and five percent leucine.
# File created 21 July 2015 by Vikram K. Mulligan (firstname.lastname@example.org), Baker laboratory.

PENALTY_DEFINITION
OR_PROPERTIES AROMATIC ALIPHATIC
NOT_TYPE LEU
FRACT_DELTA_START -0.05
FRACT_DELTA_END 0.05
PENALTIES 100 0 100 # The above two lines mean that if we're 5% below or 5% above the desired content, we get a 100-point penalty.
FRACTION 0.4 # Forty percent aromatic or aliphatic, but not leucine
BEFORE_FUNCTION CONSTANT
AFTER_FUNCTION CONSTANT
END_PENALTY_DEFINITION

PENALTY_DEFINITION
TYPE LEU
DELTA_START -1
DELTA_END 1
PENALTIES 100 0 100
FRACTION 0.05 # Five percent leucine
BEFORE_FUNCTION CONSTANT
AFTER_FUNCTION CONSTANT
END_PENALTY_DEFINITION
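The first block of this file uses fractional deltas, so its three penalty values are interpolated across the range from -0.05 to 0.05. The Python sketch below illustrates one way to read that. The function name `fractional_penalty` is hypothetical, the penalty values are assumed to be evenly spaced across [FRACT_DELTA_START, FRACT_DELTA_END], and only the CONSTANT end behaviour used in this file is modelled (deltas outside the range are clamped to the nearest edge).

```python
def fractional_penalty(delta, fract_delta_start, fract_delta_end, penalties):
    """Linearly interpolate `penalties` over [fract_delta_start, fract_delta_end].

    delta -- observed fraction minus desired fraction (e.g. 0.37 - 0.40)
    ASSUMPTION: the penalty values are evenly spaced across the range.
    """
    if len(penalties) < 2:
        raise ValueError("this sketch needs at least two penalty values")
    # CONSTANT end functions: outside the range, the edge value repeats.
    delta = max(fract_delta_start, min(fract_delta_end, delta))
    # Width of one table interval, in units of fractional composition.
    step = (fract_delta_end - fract_delta_start) / (len(penalties) - 1)
    # Position of delta measured in table intervals from the first entry.
    pos = (delta - fract_delta_start) / step
    lower = min(int(pos), len(penalties) - 2)  # index of the left neighbour
    weight = pos - lower                       # 0 at left entry, 1 at right
    return penalties[lower] * (1.0 - weight) + penalties[lower + 1] * weight
```

Under these assumptions, being 2.5% below the 40% target gives a penalty of about 50 (halfway between 100 and 0), and anything at or beyond 5% away gives 100.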
As of 6 March 2016, the aa_composition score term should be fully compatible with symmetry, including mirror symmetry. Note that it counts all residues in the pose or selection, not only those in the asymmetric unit. In poses with mirror symmetry, it is properly aware of inverted types in mirrored subunits.
The finalize_total_energy() function takes a pose and calculates the score. Internally, it calls calculate_aa_composition_energy(), which takes a vector of owning pointers to Residues and can be called directly during packing.
core/scoring/aa_composition_energy/AACompositionEnergySetup.hh. AACompositionEnergySetup objects can also be stored in AACompositionConstraints associated with a Pose. At scoring or packing time, the AACompositionEnergy constructs a vector of owning pointers to its internal AACompositionEnergySetup objects and to all those stored in the pose, and uses all of these for scoring.
.comp files are located in