Re: Multigraft – Rosetta Commons

This topic has 35 replies, 3 voices, and was last updated 12 years ago by Anonymous.

- December 8, 2011 at 1:24 am #1112
  Anonymous
  Is it possible to graft a continuous site such as a P-loop onto a known scaffold, instead of searching for a new one, using the Multigraft approach? The P-loop occurs between a helix and a beta-sheet. I want to graft it onto the surface loops of my protein while maintaining the geometry of the residues that occur in the P-loop. I hope my question is clear.
- December 8, 2011 at 2:01 am #6373
  Anonymous
  http://www.rosettacommons.org/manuals/archive/rosetta3.3_user_guide/app_AnchoredDesign.html
  
  http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0020872
- December 18, 2011 at 8:12 pm #6436
  Anonymous
  NaCo stands for “native compare”, and is enabled by the (otherwise undocumented) -enzdes:compare_native flag. It’s basically the associated value corrected for the native value. It’s most useful when doing design on multiple scaffolds, where the total energy, number of buried unsatisfied hydrogen bonds, etc. will vary widely for different scaffolds. Instead of filtering on a hard cutoff, this allows filtering based on the difference from the starting structure. (e.g. “Don’t add more than 5 buried unsatisfied hydrogen bonds above what was already there to begin with”) If you’re only matching with a single template it’s not as useful, as you can set the cutoffs based on the absolute value.
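As a rough illustration of the native-compare idea described above, here is a minimal Python sketch. It assumes the corrected value is simply "design minus native"; the function names, metric names, and numbers are made up for illustration and are not Rosetta output.

```python
def native_compare(design_scores, native_scores):
    """Return design-minus-native differences for metrics present in both dicts."""
    return {name: design_scores[name] - native_scores[name]
            for name in design_scores if name in native_scores}


def passes_delta_filter(deltas, max_increase):
    """True if no listed metric rose above the native value by more than its allowance."""
    return all(deltas[name] <= limit for name, limit in max_increase.items())


# Hypothetical values for one scaffold and one design built on it.
native = {"total_score": -310.0, "burunsat": 12}
design = {"total_score": -318.5, "burunsat": 16}

deltas = native_compare(design, native)
# "Don't add more than 5 buried unsatisfied hydrogen bonds" and
# "don't make the total energy worse than the starting structure".
ok = passes_delta_filter(deltas, {"total_score": 0.0, "burunsat": 5})
print(deltas, ok)
```

Filtering on the difference rather than the absolute value is what lets the same cutoff be reused across scaffolds whose starting scores differ widely.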
  
  If you think the ligand and constrained sidechains should be able to fit, the most likely cause of not finding any matches is overly stringent constraints. For example, with your LEU constraint, the torsion_A is only being sampled at -70, -100, and -130 degrees. It’s possible that if the best value is something like -115 degrees, the ligand might not fit into the correct hash bin, and therefore not count as a match. You can either increase the sampling, or increase the size of the hash bins (using the flags -match:euclid_bin_size and -match:euler_bin_size), or both. I’d recommend doing so in small steps, as it’s easy to greatly increase the runtime as the combinatorial sampling goes way up.
  
  I also notice that you’re using primary matching for all constraints. Especially if you’re not too constrained on some of the parameters, it’s better to go with secondary matching. (For technical reasons, you have to keep primary matching on your first constraint, though the others can all be secondary.) As mentioned in the documentation for the constraint file format (http://www.rosettacommons.org/manuals/archive/rosetta3.3_user_guide/app_match_enzdes_cstfile.html), simply add the appropriate ALGORITHM_INFO block to each constraint block. Because it tests existing positions instead of building and hashing, you don’t have the -100/-130/-115 problem you’d have above. (Note, though, that secondary matching can slow things way down if there are too many hits from the first match.)
  
  Another possibility is that you’ve messed up your pos file. The most likely reason is that you’ve used PDB numbering (e.g. the structure starts numbering residues from 25, with missing numbers) where you should have used pose numbering (always starting at 1 and increasing sequentially).
  
  Output format option is case sensitive. Try -match:output_format PDB. Be prepared for a massive increase in the number of output structures when going from CloudPDB output to PDB output, though.
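To make the PDB-numbering versus pose-numbering point above concrete, here is a small Python sketch that renumbers residues sequentially from 1 in file order. It is only an approximation under stated assumptions: it reads standard fixed-column ATOM records, ignores HETATM records, and assumes Rosetta keeps every residue it reads (it may drop some on loading). The helper `atom_line` and the coordinates are hypothetical.

```python
def pdb_to_pose_numbering(pdb_lines):
    """Map (chain, PDB residue number, insertion code) -> 1-based sequential index."""
    mapping = {}
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        # Fixed PDB columns: chain = col 22, residue number = cols 23-26, icode = col 27.
        key = (line[21], int(line[22:26]), line[26].strip())
        if key not in mapping:
            mapping[key] = len(mapping) + 1
    return mapping


def atom_line(serial, name, resname, chain, resnum):
    """Build a minimal ATOM record (hypothetical helper, dummy coordinates)."""
    return (f"ATOM  {serial:5d} {name:<4s} {resname:3s} {chain}{resnum:4d} "
            f"   {0.0:8.3f}{0.0:8.3f}{0.0:8.3f}")


# A structure that starts at residue 25 and has a gap between 26 and 30.
example = [
    atom_line(1, " N  ", "GLY", "A", 25),
    atom_line(2, " CA ", "GLY", "A", 25),
    atom_line(3, " CA ", "SER", "A", 26),
    atom_line(4, " CA ", "THR", "A", 30),
]
numbering = pdb_to_pose_numbering(example)
print(numbering)
```

With a mapping like this, each PDB residue number intended for the pos file can be translated to its sequential index before the file is written.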
- February 17, 2012 at 12:53 am #6660
  Anonymous
  Dear Sir,
  
  After testing one of the mutants that showed a promising in silico result according to Rosetta and a molecular dynamics study, the construct was not able to detect phosphate ion concentrations in the micromolar and millimolar range. This means there is a wide gap between the in silico prediction and what is realized in the wet lab.
  
  Last time I did the modeling after doing blind docking with AutoDock. I then used that region for Rosetta modeling.
  
  I want to know how I can improve the design to increase the binding affinity. I am thinking that I should look for other crystal structures that bind phosphate and then take those coordinates for further modeling studies. This time I would use the Rosetta match application to look for potential sites for mutation.
  
  Apart from this, your valuable advice would be very helpful.
- July 2, 2012 at 7:05 am #7367
  Anonymous
  Dear Sir,
  
  I came across a paper published by Prof. Baker’s group, “Computational redesign of a mononuclear zinc metalloenzyme for organophosphate hydrolysis”. I need to know whether the transition state parameters used in the paper could be used in my case or not.
- December 8, 2011 at 7:31 am #6374
  Anonymous
  I have read the first paper. But in my protein I don’t have a loop flanked by a helix and a sheet. Is it possible to model the binding geometry of the P-loop onto 2-3 loops that are adjacent to one another? What the P-loop forms for binding phosphate is a small cup-shaped structure, which I want to mimic with 2-3 adjacent loops in my protein. Can you suggest how this could be done?
  
  Also, I have one question regarding the anchor in AnchoredDesign. Does it always have to be a single residue?
- December 8, 2011 at 4:26 pm #6377
  Anonymous
  For the anchor: nope, any number of contiguous residues will work.
  
  For your problem: well, I guess what you should try is enzyme design, treating the phosphate binding as the desired “transition state”, or possibly the hotspot hashing code that led to Fleishman’s influenza-binding protein. Both of those are complex procedures designed to search many scaffolds for a place to put a non-contiguous multi-residue site. I guess if you have only the one scaffold you can search over just it. They will try to fit the inserted residues to the pre-existing backbone, so maybe you need to diversify the loops in your protein (with loop modeling) before trying to fit your new site in? Does that make sense?
- December 9, 2011 at 1:19 am #6384
  Anonymous
  I have another query regarding AnchoredDesign. Since it can work with contiguous residues, could it be used to anchor a small loop of about 5 residues (which occurs between a helix and a sheet) onto a bigger loop on the protein surface that connects two beta-strands located opposite each other? Does that make any sense?
- December 9, 2011 at 4:28 pm #6391
  Anonymous
  AnchoredDesign will let you clip a 5-residue loop out of one protein and insert it into a separate protein. It does need a few free loop residues on each side of the insert to ensure the loop will close. It will leave the core secondary structure elements in the new protein unmodified.
  
  You seem concerned about the fact that the loop used to be alpha-to-beta and now it will be beta-to-beta. The protocol itself is ignorant of that sort of distinction, it just puts things into loops. Hopefully the loop remodeling steps will ensure that the predicted conformation for the loop with the insert is physically plausible…
- December 10, 2011 at 1:52 am #6394
  Anonymous
  Thanks for your reply, I will start with the modeling work and will get back to you soon.
  
  I have another doubt regarding your suggestion to use the enzyme design protocol for designing the phosphate binding site. Is it always necessary to provide the dihedral constraints for enzyme design? I am not able to get the dihedral angles; only bond lengths and bond angles are available from the published data. Also, you said that I need to diversify the loops in order to use the hashing code method. Does that mean large-scale loop sampling with different sequences that favour binding to phosphate?
- December 10, 2011 at 2:41 am #6395
  Anonymous
  My understanding of how enzyme design works is that it searches through a set of scaffolds for pre-existing sites onto which your desired set of residues can be grafted. I don’t think it samples possible backbone orientations of the scaffolds, it just takes them as-is. In your case, you want to graft into a certain protein, so you have only one scaffold. If there’s only one scaffold, then you probably need to do something to make more possible sites available, or it will just tell you there’s no good grafting sites at all. I am proposing that loop sampling (with or without sequence changes) may give you enough backbone diversity that matches for your needed site will show up somewhere in the ensemble.
  
  I sent this along to one of the enzyme guys for further comment.
- December 11, 2011 at 11:33 am #6397
  Anonymous
  I will wait for their response. I would like to know whether AnchoredDesign requires a protein-protein complex. Can we use a protein-ligand complex (such as a peptide ligand)?
- December 12, 2011 at 4:20 am #6398
  Anonymous
  AnchoredDesign will work with peptides instead of proteins. It is untested with small molecule ligands, but replacing the target protein with a peptide will work.
- December 13, 2011 at 7:20 am #6402
  Anonymous
  Hi, did you get any reply from the enzyme design group about using only bond angles and lengths as constraints? Also, I was thinking of modeling the HIV Nef-SH3 domain interaction in my protein. Here the SH3 RT loop interacts with the two alpha-helical regions of HIV Nef. Can this kind of system be designed using AnchoredDesign or not?
- December 13, 2011 at 11:02 pm #6407
  Anonymous
  The enzyme_design application itself (http://www.rosettacommons.org/manuals/archive/rosetta3.3_user_guide/app_enzyme_design.html) can work perfectly well with only bond lengths and angles as constraints, but I think you’re referring more to using the match application. (http://www.rosettacommons.org/manuals/archive/rosetta3.3_user_guide/app_match.html)
  
  While the match application can take constraints which don’t have the full six parameters specified (this is known as “secondary matching” in Rosetta parlance – see http://www.rosettacommons.org/manuals/archive/rosetta3.3_user_guide/app_match_enzdes_cstfile.html), the matcher requires the full six parameters on the *first* constraint block to initially place the ligand. There are tricks you can use, though, if you don’t want to constrain all six parameters.
  
  For example you can adjust the periodicity or the tolerance and number of samples to do a more complete sampling:
  
  CONSTRAINT:: torsion_AB: 0.00 5.00 25.00 10.00 0
  
  Will sample the AB torsion around the circle every ten degrees. You could also go finer (e.g. 5 or even 1 degree), with the understanding that finer sampling will be much slower and likely use more memory.
  
  Or you can do something like
  
  CONSTRAINT:: torsion_B: 180.00 60.00 25.00 360.00 6
  
  Which will make a sample every 10 degrees (2*6+1 = 13 total) across 180 +/- 60, i.e. from +120 through 180 to -120 (equivalently 120 to 240 degrees).
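The two sampling examples in this post can be checked with a short Python sketch. It assumes the five numbers on a CONSTRAINT line are, in order, the ideal value, the tolerance, the force constant, the periodicity, and the number of samples on each side of the ideal value, as the post describes; the force constant does not affect which angles are sampled, so it is left out. The function name is hypothetical, and this is not the matcher's actual code.

```python
def matcher_torsion_samples(x0, tolerance, periodicity, n_samples):
    """Enumerate the sampled angles in degrees, wrapped into (-180, 180]."""
    step = tolerance / n_samples if n_samples > 0 else 0.0
    offsets = [i * step for i in range(-n_samples, n_samples + 1)]
    n_periods = max(1, round(360.0 / periodicity))
    angles = set()
    for k in range(n_periods):
        for off in offsets:
            a = (x0 + k * periodicity + off) % 360.0
            if a > 180.0:
                a -= 360.0
            angles.add(round(a, 6))
    return sorted(angles)


# torsion_AB: 0.00 5.00 25.00 10.00 0  -> one sample per 10-degree period
ab = matcher_torsion_samples(0.0, 5.0, 10.0, 0)
# torsion_B: 180.00 60.00 25.00 360.00 6 -> 13 samples across 180 +/- 60
b = matcher_torsion_samples(180.0, 60.0, 360.0, 6)
print(len(ab), len(b), b)
```

Counting the samples this way for every sampled parameter, and multiplying the counts together, gives a quick feel for how fast the combinatorial cost grows when the sampling is made finer.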
- December 14, 2011 at 7:41 am #6408
  Anonymous
  Thanks for your reply. Actually I am working with a single scaffold only, so I don’t need the match application. Since I have only bond lengths and angles, I was wondering whether enzyme design would work with these or not. I have prepared the constraint file after looking at the interactions in the crystal structure. I have attached both files (constraint and crystal data file). Can you please tell me whether the format is correct or not? I was not able to understand how to define the three atoms in the phosphate ligand. In addition, I want to ask about the ligand and generating its conformers. Since the phosphate ion does not have any rotatable bonds, can we use it as is, without any conformers?
- December 14, 2011 at 7:18 pm #6416
  Anonymous
  The match application is useful not just for selecting a scaffold from a set of scaffolds, but also selecting which positions on a particular scaffold to use, or where to position the ligand. For example, if you know you want an arginine binding to the ligand like *this* and some threonine binding to the ligand like *that*, but you don’t know where on the backbone the threonine is coming off, or where the ligand is positioned, you can use the match application to place the threonine on the backbone, and to place the ligand and arginine sidechains in the correct orientation so that all the constraints are satisfied. The enzyme_design application assumes that all of the constrained amino acids are in their final sequence position, and that the constrained sidechains and ligands are roughly in the correct orientations.
  
  Regarding the constraint files, the atom names and atom types used have to match up with the ones specified in the params files, either the ligand one specified with -extra_res_fa, or the standard amino acid ones in rosetta_database/chemical/residue_type_sets/fa_standard/residue_types/l-caa/. Pay particular attention to the ATOM lines – the first entry is the atom name, and the second is the atom type. So for specifying the three atoms in the phosphate ligand, you have to look at what they are called in the params file for the ligand you are using.
  
  It looks like the figure you include uses a slightly different hydrogen naming convention than Rosetta. Assuming that the hydrogens are present in the input PDB, and they are named like in the figure, Rosetta will just strip them out with an “unrecognized atom” warning, and then rebuild the hydrogens from ideal geometries using its naming convention. There is no HH1 atom on arginine – it would be either 1HH1 or 2HH1. (As an aside, because of the rebuilding issue and the fact that they’re not typically present in crystal structures, we typically don’t use hydrogens in constraints, instead basing hydrogen bond geometries off of the heavy atom positions – but if you have geometries specified with the hydrogens, it should be okay to use them.)
  
  Also pay attention to whether you’re using an atom type or an atom name specification. For the amino acids you use “atom_type”, but then specify an atom name. Especially for something like arginine protons, it might be useful to keep the atom_type, but change the atom name HH1 to the actual type of Hpol. This would specify an ambiguous constraint, and Rosetta would automatically pick the atom of the given type which best satisfied the constraint, regardless of what its name was. You have to be a little careful with atom_type specifications, as the angle and dihedral atoms are based on Rosetta’s internal atom tree, but for hydrogens the angle atom should always be the bound atom, so you should be safe.
  
  Also pay attention to which side is A and which is B. For your first block, angle A would be the P–O–H angle, and angle B would be the O–H–N angle.
  
  If your ligand doesn’t have any rotatable bonds, you only have one conformer, and you don’t need an additional file (e.g. PDB_ROTAMERS) to specify them. The default conformer/rotamer specified in the params file is sufficient.
- December 15, 2011 at 4:44 am #6417
  Anonymous
  OK, I will try the match application. Also, would it make sense to mutate some residues of my protein to the ones that are present in the constraint file and then apply enzyme design?
  
  About the naming convention: when writing the atom_type, do we have to write the atoms in the correct order? Then, for the first block in the param file, which atoms do I have to write? P and O constitute only two atoms, and I need to specify three. I cannot include H because that comes from the arginine. Please clear up this confusion as well.
- December 15, 2011 at 7:57 pm #6421
  Anonymous
  The match application would work well for placing sidechains and ligands – if you already have them placed, you can skip it. (In that case you will need to correctly specify the constraint REMARK line at the top of the PDB file, though. See http://www.rosettacommons.org/manuals/archive/rosetta3.3_user_guide/app_enzyme_design.html section “setup”.)
  
  The order of the atoms is important, as the order determines which atoms are used for the distance, angle and dihedral. If you’re not specifying the related dihedral constraint, then which atom you use for the dihedral doesn’t matter (although you still need to specify an atom as a placeholder). You’re correct that since the hydrogen comes from the arginine, you wouldn’t specify it as part of the ligand. For example, if your phosphate residue has atoms with Rosetta atom names of P, O1, O2, O3, and O4, you could do something like
  
  TEMPLATE:: ATOM_MAP: 1 atom_name: O1 P O2
  TEMPLATE:: ATOM_MAP: 1 residue3: LIG
  
  With the appropriate definition on the amino acid side, this would mean that your distanceAB would be measured from H–O1, the angle_A from H–O1–P, and the torsion_A, were you to specify it, would be measured H–O1–P–O2
  
  If you’re specifying atom types (which may be useful if you don’t know which of the four oxygens will be participating in the interaction), you would only specify the one atom type (as opposed to the three atom names, if you are using atom_name). For example (assuming the phosphate oxygens are typed as OOC in the ligand params file):
  
  TEMPLATE:: ATOM_MAP: 1 atom_type: OOC
  TEMPLATE:: ATOM_MAP: 1 residue3: LIG
  
  The atom type specifies only the distance atom (whichever single atom best satisfies the constraints). The angle and the dihedral atom are picked based on the internal Rosetta atom tree. (This information can be gleaned from the ICOOR lines in the respective params file.) In the case of the phosphate, the angle atom for the oxygens is almost certainly the phosphorus (the atom tree is typically built through bonds). If you were using the dihedral constraint, deciding which of the other oxygens was the torsion atom might be a little tricky, but correctly specifying the periodicity of the constraint would likely simplify things. (All of this should be covered in the manual page on the constraint file format: http://www.rosettacommons.org/manuals/archive/rosetta3.3_user_guide/app_match_enzdes_cstfile.html)
- December 16, 2011 at 8:30 am #6425
  Anonymous
  Thanks for your help. I have prepared the .cst file as per your guidance. After executing the enzdes protocol, I am getting the following error:
  
  Error: residue ARG131found in pdb header is not allowed by data in cstfile.
  ERROR:: Exit from: src/protocols/toolbox/match_enzdes_util/EnzCstTemplateRes.cc line: 272
  protocols.toolbox.match_enzdes_util.EnzConstraintIO: checking cst data consistency for block 1…
  protocols.toolbox.match_enzdes_util.EnzConstraintIO: WARNING: Message(s) above was printed in the end instead of proper place because this Tracer object has some contents left in inner buffer when destructor was called. Explicit call Tracer::flush() or end your IO with std::endl to disable this warning.
  
  It’s due to the PDB file that I am using, which does not have hydrogens and Rosetta atom types. I am using a docked complex obtained from AutoDock. How can I rectify this error? Please find the attached PDB file and param file.
- December 16, 2011 at 6:27 pm #6427
  Anonymous
  The issue is that you have the same block number for all of the constraint REMARK lines in the header. The second to last column (13) declares which block in the cstfile the line is associated with. Since the arginine is in the fourth constraint block, it needs to be 4.
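A quick way to spot this kind of header problem is to list the block number on each constraint REMARK line. The Python sketch below assumes only what the post states, namely that the block index is the 13th of 14 whitespace-separated columns, and additionally assumes that the lines contain the words TEMPLATE and MOTIF with the second residue's name and number in columns 11 and 12. The sample header lines and residue numbers other than ARG 131 are hypothetical.

```python
from collections import Counter


def cst_block_assignments(pdb_lines):
    """Return (residue label, block number) for each constraint REMARK line."""
    out = []
    for line in pdb_lines:
        tok = line.split()
        if (len(tok) == 14 and tok[0] == "REMARK"
                and "TEMPLATE" in tok and "MOTIF" in tok):
            out.append((f"{tok[10]}{tok[11]}", int(tok[12])))
    return out


def duplicated_blocks(assignments):
    """Block numbers that are claimed by more than one REMARK line."""
    counts = Counter(block for _, block in assignments)
    return sorted(b for b, n in counts.items() if n > 1)


# Hypothetical header in which every line was left at block 1.
header = [
    "REMARK 666 MATCH TEMPLATE X LIG 0 MATCH MOTIF A THR 62 1 1",
    "REMARK 666 MATCH TEMPLATE X LIG 0 MATCH MOTIF A SER 97 1 1",
    "REMARK 666 MATCH TEMPLATE X LIG 0 MATCH MOTIF A ARG 131 1 1",
]
assignments = cst_block_assignments(header)
print(assignments, duplicated_blocks(assignments))
```

If each constraint block in the cstfile describes one residue pair, any block number reported as duplicated points to a line that still needs its own index.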
  
  Rosetta is able to rebuild hydrogens (and does so automatically), so having an input PDB without hydrogens is not a problem. The naming convention for protein heavy atoms is pretty uniform, so I doubt Rosetta would have a problem appropriately recognizing heavy atoms. If it did, it would print a warning message on protein loading, telling you how many atoms were discarded from each residue, and if there were any heavy atoms it expected to find but didn’t.
- December 18, 2011 at 10:49 am #6434
  Anonymous
  Thanks for your reply. Finally, I was able to run the design protocol successfully. I generated 1000 structures for my first design. Now the issue is analysis. In the 2011 Con tutorial files, there’s a file describing important parameters for extracting the best scoring models. I was not able to understand some terms in that file, namely:
  NaCo_dTotE value < 0.00
  NaCo_burunsat_pm value < 5.00
  NaCo_NLconts_pm value > -2.00
  NaCo_pstat_pm value > -0.05
  
  What is this NaCo? I don’t have any parameter with this name in my scoring file. Also, all the structures have a positive total_score in my scoring file. So how should I analyze structures with positive energy? Please find the score file attached to this message.
  
  I also tried the match application with the param file attached to this message. I was not able to get a single hit. How can I increase the chances of getting hits? In addition, how do we have to use the output_format parameter in the matcher? I used it in the following way:
  
  -match:output_format Pdb
  
  and ended up with an error message. Please give your critical comments on the above stated queries
- December 19, 2011 at 12:04 am #6440
  Anonymous
  Thanks for your comments. I will try the secondary matching option. But you forgot to answer one question regarding the positive total_score values for all the structures generated by the enzdes run. Since the energy value has to be negative, does it mean that I need to generate a larger number of structures?
  
  The best scoring scheme that I found was: total_score, ligand binding energy (SR_interface_E_1_2), total constraint of catalytic residues (all_cst), packstat, and buried unsat polars of the ligand.
- December 19, 2011 at 12:11 am #6442
  Anonymous
  Rosetta scores are all relative. In a literal sense, a positive score means that Rosetta thinks all your structures will unfold instead of staying stable. But, more meaningfully, it means that there is something wrong with all of them, probably an unrelaxed clash in your input structures. If there’s a bad clash in part of the protein you aren’t remodeling, then it will just keep clashing in your final structure, and Rosetta will penalize the whole structure accordingly. You can still sort the structures by energy and use the lowest one.
  
  If you think the input structures are all of good quality, with scores of approximately -2 per residue on average, then positive-score outputs are an indicator that something is very wrong. Otherwise, it means the input has errors that are probably constant (and irrelevant) in your output.
- December 19, 2011 at 1:15 am #6443
  Anonymous
  The structure that I used was obtained from the PDB; it’s not a modeled structure. The docking was done using AutoDock.
  As per your comment, do I need to minimize the structure before docking, redock it, and then use it for enzdes?
  
  Here are the lowest and highest values of some parameters:
  Parameter: lowest, highest
  total_score: 525.37, 658.63
  tot_burunsat_pm: 69, 92 (what should be an ideal cutoff for this value?)
  all_cst: 42.05, 322.31 (what should be an ideal cutoff for this value?)
  
  The rest of the parameters have acceptable values and could be used for analysis.
- December 19, 2011 at 4:16 am #6444
  Anonymous
  You only need to fix the bad scores if they’re near something you care about. If it’s clashes on the far side of the protein, ignore it. If it affects your site of interest, get it fixed. Does your output have per-residue scores so you can find the highly positive ones? Also, constraint scores are almost always positive, for most constraints they are zero at best. Constraint failures may be pushing you into positive territory.
- December 19, 2011 at 6:06 am #6445
  Anonymous
  The generated PDB files include per-residue scores. I have attached the file with the lowest total_score amongst the 1000 generated structures. The all_cst value, which is the total constraint energy, is positive in all cases, with scores above 10.
  
  In addition, I want to mention that earlier we carried out a fixed-backbone design experiment with the same protein (GFP) that I am using now. Out of the 10,000 generated structures, the lowest and highest scores were +1263 and +1328.
- December 20, 2011 at 10:17 pm #6455
  Anonymous
  Regarding your positive total score, you’re getting high fa_rep, for example with GLY_67, TYR_74, HIS_199 and others. This is not unexpected in direct-from PDB structures. Sometimes even a small movement (less than the error in crystallography) can greatly change the Rosetta energy. If it’s away from the region of interest, don’t worry about it, or you can pre-relax/minimize your input structure (http://www.rosettacommons.org/manuals/archive/rosetta3.3_user_guide/preparing_structures.html)
  
  Regarding constraint values, they really should be zero, although very low (< 5) values are seen frequently. Enzdes uses flat-bottom constraints where the value is zero until you go outside of the range. If you’re getting high values for constraints, either the values you specified for the constraints are off (wrong center value or too narrow a range), or they are being significantly violated. In your example structure, the atom_pair_constraint is the major violated value. (The ligand is participating in most/all of the constraints, so look at its value: 28.2196) Most of this is due to arginine 131, which is not surprising, as it’s sticking straight out, away from the phosphate. I’m not sure why this would be the case. Were the constraints violated for the input structure? (Try running the protocol with -enz_score and without any of the design/repacking flags.) The enzdes protocol may have a hard time locating a conformation with good constraint energy if it doesn’t start with something close. You can also try doing some test runs with a more minimal constraint file. (E.g. perhaps there’s something screwy with the arginine constraint that’s messing everything else up, so you can try removing it and see if the runs become more sensible.) Another possibility is that some energy term just doesn’t like the constrained conformation, resulting in Rosetta trading off unfavorable constraint energy for a more favorable other energy. (Possibly fa_sol?) You can try doing short test runs with the various energy terms set to zero in the weights file, but I’d hold off on that, as it’s more likely to be an issue with your input file or with the constraint specification.
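For locating the residues behind a high fa_rep or constraint score, the per-residue table at the end of an output PDB can be scanned with a few lines of Python. This sketch assumes the table sits between #BEGIN_POSE_ENERGIES_TABLE and #END_POSE_ENERGIES_TABLE, with a "label" header row, "weights" and "pose" summary rows, and one row per residue. The function name and the numbers in the sample table are invented for illustration.

```python
def worst_residues(pdb_lines, term="fa_rep", top=3):
    """Return the `top` (residue label, value) pairs with the highest `term`."""
    in_table, columns, rows = False, None, []
    for line in pdb_lines:
        if line.startswith("#BEGIN_POSE_ENERGIES_TABLE"):
            in_table = True
            continue
        if line.startswith("#END_POSE_ENERGIES_TABLE"):
            break
        fields = line.split()
        if not in_table or not fields:
            continue
        if fields[0] == "label":
            columns = fields[1:]          # score-term names, in column order
        elif fields[0] not in ("weights", "pose") and columns and term in columns:
            rows.append((fields[0], float(fields[1 + columns.index(term)])))
    return sorted(rows, key=lambda r: r[1], reverse=True)[:top]


# Made-up table with only three score columns, for illustration.
example = """#BEGIN_POSE_ENERGIES_TABLE example.pdb
label fa_atr fa_rep total
weights 0.8 0.44 NA
pose -900.0 700.0 -200.0
ALA_66 -3.1 0.4 -2.7
GLY_67 -2.0 45.2 43.2
TYR_74 -6.5 38.9 32.4
HIS_199 -5.0 20.1 15.1
#END_POSE_ENERGIES_TABLE example.pdb""".splitlines()

worst = worst_residues(example)
print(worst)
```

Passing a different `term` (for instance a constraint column, if the table has one) gives the same ranking for that score, which helps decide whether the offenders are near the site of interest.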
- December 23, 2011 at 7:10 am #6467
  Anonymous
  Thanks for your reply. I tried the design again with the following modifications:
  
  1. I used the relaxed structure after minimizing it using Rosetta. Now I am getting a negative total_score value.
  2. I mutated arginine back to lysine, which is the native residue, and ran the design to generate 1000 structures.
  3. I used the following criteria for filtering the best scoring models:
  
  req total_score value < 0.00
  req nlr_totrms value < 0.5
  req nlr_SR1_rms value < 0.5
  req nlr_SR2_rms value < 0.5
  req nlr_SR3_rms value < 0.5
  req SR_1_all_cst value < 1.2
  req SR_2_all_cst value < 1.0
  req SR_3_all_cst value < 2.3
  req all_cst value < 6.5
  output sortmin all_cst
  
  Finally I ended up with 15 models. But I didn’t get any best scoring models when I included the following scores that were used in the tutorial:
  
  req SR_4_interf_E_1_2 value < -8.5 . The smallest value in my case is -4.5 . So, what should be the ideal cutoff for this parameter,
  as it’s ligand binding energy which I think needs to be considered for filtering models.
  req tot_burunsat_pm value < 5.00 . Here the smallest value in my case is 57 . What should be the cutoff for this parameter??
  req tot_NLconts_pm value > -2.00 . Here the smallest value is 54 … ??
  req tot_pstat_pm value > -0.05 . Here the smallest value is 0.6
  
  Is sampling with 1000 models enough, or do I need to go with a higher number of models?
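The requirement lines quoted in this post can be applied to a score table with a few lines of Python, which makes it easy to see how many models survive each added cutoff. This is only a sketch of the "req NAME value < X" and "output sortmin NAME" syntax as it appears above, not the tutorial's own selection script; the function names and the score rows are hypothetical.

```python
import operator

_OPS = {"<": operator.lt, ">": operator.gt}


def parse_requirements(text):
    """Parse 'req NAME value < X' lines and an optional 'output sortmin NAME' line."""
    reqs, sort_key = [], None
    for line in text.splitlines():
        tok = line.split()
        if len(tok) >= 5 and tok[0] == "req" and tok[2] == "value" and tok[3] in _OPS:
            reqs.append((tok[1], _OPS[tok[3]], float(tok[4])))
        elif len(tok) == 3 and tok[:2] == ["output", "sortmin"]:
            sort_key = tok[2]
    return reqs, sort_key


def apply_requirements(rows, reqs, sort_key=None):
    """Keep rows passing every requirement, optionally sorted ascending by sort_key."""
    kept = [r for r in rows if all(op(r[name], cutoff) for name, op, cutoff in reqs)]
    if sort_key is not None:
        kept.sort(key=lambda r: r[sort_key])
    return kept


req_text = """req total_score value < 0.00
req all_cst value < 6.5
output sortmin all_cst"""

# Hypothetical score rows, one dict per model.
models = [
    {"name": "m1", "total_score": -120.0, "all_cst": 5.0},
    {"name": "m2", "total_score": -150.0, "all_cst": 1.5},
    {"name": "m3", "total_score": 12.0, "all_cst": 0.5},
    {"name": "m4", "total_score": -90.0, "all_cst": 9.0},
]
reqs, sort_key = parse_requirements(req_text)
selected = apply_requirements(models, reqs, sort_key)
print([m["name"] for m in selected])
```

Running the filter once per added requirement shows which single cutoff removes all remaining models, which is usually the one that needs to be rescaled for the system at hand.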
- December 23, 2011 at 8:29 pm #6469
  Anonymous
  req SR_4_interf_E_1_2 value < -8.5
  
  First off, the name of this column/score changes based on the number of constraints. For the tutorial it’s SR_4 (three constrained sidechains + ligand), whereas in the example scorefile you gave above it’s SR_5 (four constrained sidechains + ligand). If you change the constraint files, the SR’s will switch around based on what constraints you have. (See the enzdes documentation for full details.)
  
  Secondly, this value is the interface energy of your ligand, so it should roughly be analogous to the binding energy. As different compounds binding to different ligands have different binding energies, there isn’t a single cutoff value that can be universally recommended. Typically I base the cutoff on the range seen in the designs, depending on how stringent I’m being (e.g. set it to throw out roughly 75% of the structures). Another option is to dock the ligand against random (non-binding) scaffolds, and then use the interface energy of those complexes (rescored with the enzdes scorefunction) as a gauge of what a typical “nonbinding” value for your particular ligand would be. (E.g. set the cutoff at two standard deviations below the mean nonbinding value to make sure that all of your filtered designs are well-bound.)
  
  That’s the general case for all of these parameters. Unless you have some experimental intuition to help guide them, there really isn’t a single cutoff value which can be universally applied. It’s more a case of setting the cutoffs based on the ranges seen in the designs or associated test runs. And keep in mind that the filters work best for throwing out the bad ones, rather than passing the good ones.
  
  The number of models you need in an enzyme design experiment depends on how broadly you’re sampling, and how many different models you’ll be experimentally testing. If you have hundreds of different scaffolds and plan on only testing a few dozen, then 100 models per scaffold will likely be more than enough. If you only have one scaffold, you might have time to sample deeper, but at a certain point you’ll reach sampling saturation.
  
  Unfortunately, at this point a lot of design work is still based on experimenter’s intuition. Rosetta and Rosetta scoring are far from perfect, and can only get you so far. People still look at the resultant structures and often make manual tweaks when they see something which they think Rosetta gets terribly wrong. So look at your top models, and keep sampling until you get the X variants you want to test, or until it looks like Rosetta has reached a limit of producing new and interesting models. (This point will vary based on the properties of your template and how big your design region is.)
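The two ways of choosing an interface-energy cutoff mentioned in this post (discarding roughly 75% of the designs, or going two standard deviations below the mean of non-binding controls) can be sketched in Python as follows. The function names and the numbers are hypothetical, and lower values are assumed to be better, as for energies.

```python
import statistics


def percentile_cutoff(values, keep_fraction=0.25):
    """Cutoff that keeps roughly the lowest `keep_fraction` of the values."""
    ordered = sorted(values)
    n_keep = max(1, int(len(ordered) * keep_fraction))
    return ordered[n_keep - 1]


def nonbinder_cutoff(nonbinding_values, n_sd=2.0):
    """Mean of the non-binding controls minus `n_sd` sample standard deviations."""
    mean = statistics.mean(nonbinding_values)
    sd = statistics.stdev(nonbinding_values)
    return mean - n_sd * sd


# Hypothetical ligand interface energies from a set of designs.
design_energies = [-1.0, -2.0, -3.0, -4.0, -5.0, -6.0, -7.0, -8.0]
# Hypothetical interface energies from docking against non-binding scaffolds.
control_energies = [-2.0, -3.0, -4.0]

cut_a = percentile_cutoff(design_energies, keep_fraction=0.25)
cut_b = nonbinder_cutoff(control_energies)
print(cut_a, cut_b)
```

The percentile version always passes some designs, whereas the control-based version can pass none, which is itself informative about whether the designs bind better than a random scaffold would.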
- December 24, 2011 at 12:28 am #6472
  Anonymous
  Thanks for your detailed reply. I understood your point. I will run the design protocol again for a larger number of models. I will also check another site on the same protein and look for the differences between the two.
  
  I want to ask one more thing: apart from the constrained residues that I mentioned, Rosetta has mutated some residues near them. Is that for better packing of the structure?
- December 27, 2011 at 5:06 pm #6475
  Anonymous
  In Rosetta, de novo enzyme design is split into two phases, each accomplished by a different program. The initial match phase (performed by the match application) is to place the ligand/transition state as well as the key catalytic/interacting residues into an appropriate scaffold. This typically only places a few sidechains in the protein. The second phase (performed by the enzyme_design application) is intended to optimize the rest of the amino acids around the ligand/transition state. These are optimized against the entire Rosetta energy function, and so can mutate and repack based on anything which would improve the Rosetta energy – fewer steric clashes, better packing, better hydrogen bonding, better rotamers, etc. The particular reason in your case could be any of them, or even a combination.
  
  (BTW, you can control which residues you allow to mutate and repack using the cut1/cut2/cut3/cut4 interface autodetection flags, or by specifying specific residues with a resfile.)
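
  For the resfile route, a small helper can write the file from lists of residue numbers. This is only a sketch: the helper name and the example residue numbers are made up, and the keywords used (`NATRO` as the default, `ALLAA` for designable positions, `NATAA` for repack-only positions, then a `start` line before the per-residue entries) reflect my understanding of the resfile format, so check them against the resfile documentation for your Rosetta version.

```python
def build_resfile(designable, repack_only, chain="A"):
    """Return resfile text: design some residues, repack others, freeze the rest.

    Residue numbers and the chain ID are hypothetical inputs; keyword meanings
    are assumed from the standard resfile format (NATRO = leave untouched,
    ALLAA = allow all amino acids, NATAA = repack native amino acid only).
    """
    overlap = set(designable) & set(repack_only)
    if overlap:
        raise ValueError(f"residues listed twice: {sorted(overlap)}")

    lines = ["NATRO", "start"]  # default behaviour, then per-residue section
    for resnum in sorted(designable):
        lines.append(f"{resnum} {chain} ALLAA")
    for resnum in sorted(repack_only):
        lines.append(f"{resnum} {chain} NATAA")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example: allow design at 12 and 45, repacking only at 30.
    print(build_resfile(designable=[45, 12], repack_only=[30]), end="")
```

  Writing the file this way makes it easy to regenerate when you change your mind about which positions near the ligand should be allowed to mutate.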
- January 3, 2012 at 12:41 am #6485
  Anonymous
  Hi,
  
  I generated some 3000 models for my phosphate binding design experiment. I selected the best 3 on the basis of the scores mentioned in the documentation. After running a 10 ns simulation for each of the three, I found that only one model binds phosphate throughout the simulation. Now I want to know whether there is any other method in the Rosetta suite to make the predictions stronger.
- January 3, 2012 at 4:30 pm #6487
  Anonymous
  The match-then-enzyme_design procedure is the standard way of doing design of ligand binding proteins. If you’re looking for more, you’ll have to start bringing additional chemical intuition or post-analysis of initial designs into things. From the analysis you’ve done on the designs you’ve chosen, what do you think is lacking that’s making them poor binders? You need to come up with potential answers to that question before you can realistically make the predictions better.
  
  The first step is to reevaluate how you did your selection. You may need to up-weight some of the criteria and down-weight (or remove) others, based on how Rosetta is treating your system. For example, perhaps they suffer from poor hydrogen bonding? Increase the stringency of the cutoff for the appropriate hydrogen bonding metrics. Note that in some cases you may be looking to add in a metric which isn’t run by default but which may be available elsewhere in Rosetta (e.g. rotamer probabilities as in Fleishman et al. 2011 Protein Sci. 20(4):753-7).
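
  To make the "up-weight some criteria, tighten others" idea concrete, here is a minimal Python sketch of re-selecting designs from a table of per-design metrics. It is not a Rosetta script; the metric names (`total_score`, `ligand_hbonds`, `buried_unsat`), the cutoffs, and the weights are hypothetical placeholders for whatever columns your own score file contains.

```python
def passes_cutoffs(design, cutoffs):
    """Check one design against cutoffs of the form {metric: (kind, limit)}.

    kind is "max" (value must not exceed limit) or "min" (must not fall below).
    """
    for metric, (kind, limit) in cutoffs.items():
        value = design[metric]
        if kind == "max" and value > limit:
            return False
        if kind == "min" and value < limit:
            return False
    return True


def rank_designs(designs, cutoffs, weights):
    """Drop designs that fail any cutoff, then sort by weighted sum (lower is better)."""
    kept = [d for d in designs if passes_cutoffs(d, cutoffs)]
    return sorted(kept, key=lambda d: sum(w * d[m] for m, w in weights.items()))


if __name__ == "__main__":
    # Hypothetical metrics for four designs.
    designs = [
        {"name": "d1", "total_score": -250.0, "ligand_hbonds": 4, "buried_unsat": 2},
        {"name": "d2", "total_score": -260.0, "ligand_hbonds": 1, "buried_unsat": 1},
        {"name": "d3", "total_score": -240.0, "ligand_hbonds": 5, "buried_unsat": 6},
        {"name": "d4", "total_score": -248.0, "ligand_hbonds": 5, "buried_unsat": 0},
    ]
    # Stricter hydrogen-bond requirement; negative weight rewards more H-bonds.
    cutoffs = {"ligand_hbonds": ("min", 3), "buried_unsat": ("max", 3)}
    weights = {"total_score": 1.0, "ligand_hbonds": -5.0}
    print([d["name"] for d in rank_designs(designs, cutoffs, weights)])
```

  Rerunning the selection with a few different cutoff and weight sets, and seeing which designs keep showing up, is a cheap way to find out how sensitive your "top 3" is to the criteria you picked.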
  
  After that you can try redoing the designs with slightly different approaches. For example, maybe Rosetta isn’t finding enough hydrogen bonding interactions – you can try to force additional such interactions by matching them in (rather than hoping they’ll be designed in). Note that this may require some backbone movement/redesign if the backbone isn’t in a favorable location. (People are working on validated protocols for such situations, but they haven’t been published yet.) Or perhaps the binding pocket is too open and you want to close it off with loop remodeling. Or perhaps there’s some effect X (electrostatics, packing, selecting plausible rotamers) which you don’t think Rosetta is accounting for enough; in that case you can possibly tweak the design process to up-weight it.
  
  Keep in mind that Rosetta design isn’t perfect. A one-in-three (wet-lab) experimental success rate is on the good side of typical for enzyme and ligand binding design. If you can increase the number of designs you test, you can find more potential hits. (e.g. you could potentially start with a larger number of shorter length simulations, and then toss those designs which show failure early.)
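
  The benefit of testing more designs can be put into numbers with a very simple model. The sketch below assumes every design succeeds independently with the same probability `p`, which is a simplification I am introducing for illustration (real designs from one scaffold are correlated), and the function names are my own.

```python
import math


def prob_at_least_one_hit(p, n):
    """Chance that at least one of n designs works, assuming independent
    designs that each succeed with probability p."""
    if not 0.0 <= p <= 1.0 or n < 0:
        raise ValueError("need 0 <= p <= 1 and n >= 0")
    return 1.0 - (1.0 - p) ** n


def designs_needed(p, target):
    """Smallest n giving at least `target` chance of one or more hits."""
    if not 0.0 < p < 1.0 or not 0.0 < target < 1.0:
        raise ValueError("need 0 < p < 1 and 0 < target < 1")
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))


if __name__ == "__main__":
    # One-in-three success rate, three designs tested.
    print(round(prob_at_least_one_hit(1 / 3, 3), 3))
    # A less favourable 10% rate: how many designs for a 90% chance of a hit?
    print(designs_needed(0.10, 0.90))
```

  Under these assumptions the required number of designs grows quickly as the per-design success rate drops, which is why cheap early triage (such as short simulations) can pay off.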
- February 17, 2012 at 3:40 am #6661
  Anonymous
  This means that there is a wide gap between in silico prediction and realizing the same in the wet lab.
  
  Welcome to the state of the art for in silico prediction. Unfortunately there’s currently still a large gap between predicting something computationally and having it work in the lab. For the large part it’s a numbers game. What you’re really doing with computational design is moving your chances of getting a successful result from less than one in a million to one in a thousand or one in a hundred or maybe one in a dozen, if you’re lucky.
  
  The way to work with this is to try multiple things. You picked a particular set of criteria to evaluate the top designs. Try using a different one, or weighting different factors more heavily. Take a look at the designs you made, and try to figure out what they’re missing. Perhaps you want more hydrogen bonding. Or better electrostatic complementarity. Or more rigidity in the binding site. Or perhaps it’s something else.
  
  Sometimes it’s not a case of being completely wrong, but just being slightly sub-optimal. It’s possible that a design is only a couple of mutations away from being a decent result. If you have a design you really like, you can try exploring similar structures to see if a closely related structure would be better. For example, try looking at the other computational designs you generated. Or try redesigning in homolog structures (sometimes a small backbone/context change means the difference between pulling out a good sequence and not). There are also various tools you can use to do mutational scans of a design to try to optimize it. FilterScan with RosettaScripts is one possibility (http://www.rosettacommons.org/manuals/archive/rosetta3.3_user_guide/Filters_%28RosettaScripts%29#FilterScan) but there are several protocols out there.
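
  To give a feel for the size of a mutational scan, the following Python sketch enumerates every single-point mutant at a chosen set of positions. It is not FilterScan and does no scoring; it only lists the candidates that some scoring tool would then have to evaluate. The function name, the toy sequence, and the positions are illustrative assumptions.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 canonical residues, one-letter codes


def single_point_mutants(sequence, positions):
    """Yield (label, mutant_sequence) for each single substitution.

    `positions` are 1-based indices into `sequence`; labels look like "K2A".
    """
    for pos in positions:
        if not 1 <= pos <= len(sequence):
            raise ValueError(f"position {pos} is outside the sequence")
        native = sequence[pos - 1]
        for aa in AMINO_ACIDS:
            if aa == native:
                continue  # skip the no-op substitution
            yield f"{native}{pos}{aa}", sequence[: pos - 1] + aa + sequence[pos:]


if __name__ == "__main__":
    # Toy example: scan positions 2 and 3 of a three-residue sequence.
    mutants = list(single_point_mutants("MKT", [2, 3]))
    print(len(mutants), mutants[0])
```

  Each scanned position adds 19 variants, so even a ten-residue binding site gives 190 single mutants before any double mutants are considered.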
  
  I could also suggest directed evolution and/or site saturation/scanning mutagenesis, but that tends to be more useful if you have a trace of activity. If you have no detectable activity, it is unlikely to be helpful.
  
  You’ll likely need to test multiple designs before you get decent results. The standard in the Baker lab is to start with testing 10-12 designs all at once. If we’re lucky, we’ll get one or two active designs in that batch.
  
  By the way, I’m assuming that you got soluble protein. Occasionally you run into problems where your designs just won’t express solubly. In that case you can either change the scaffold (thermophile proteins are a good bet) or try to engineer in solubility (there are a number of techniques for stabilizing proteins – Dan Tawfik’s papers are a good starting point for a literature search).
- July 2, 2012 at 9:01 pm #7373
  Anonymous
  I’m not quite understanding what you’re looking for. The Supplementary material should be freely available from the Nature website, and points to a secondary site with most of the scripts and input files ( http://faculty.washington.edu/khares/Khare_NatChemBiol2012/ ).
  
  Whether using the parameters as-is would work in your case depends on exactly what your system is and what end result you’re trying to achieve.