ddg_monomer mutations list: How to specify chain ID?


  • This topic has 13 replies, 6 voices, and was last updated 8 years ago by Anonymous.
    • #2097
      Anonymous

        I want to use ddg_monomer with multiple simultaneous mutations. According to the documentation (https://www.rosettacommons.org/docs/latest/ddg-monomer.html) this requires a mutation file, instead of the resfile used by other applications.

        However, in the example provided by the documentation, there is no use of a chain ID. My structure has multiple chains.

        How can I specify a chain ID in the mutation file?

        If this is not possible, how do I know what residue number I have to put in the mutation file, when all I know is the chain ID and the residue number inside the chain?

      • #10684
        Anonymous

          I believe the application recommends renumbering the PDB file from residue 1 to n, and that this very issue with how it parses the mutation file is what motivates the requirement. It’s not prohibitively difficult to write a Python script that renumbers PDB files in this way, and the opposite mapping can be used to re-process the output.

          Looking at the documentation:

          All PDBs should be renumbered so that their first residue is residue 1 and number consecutively so that, if there are missing residues in the structure (due maybe to missing density) that these residues are simply skipped in the residue numbering. The numbering of all residues in both the distance-restraint file and the mutation-list file should follow this numbering.

        • #10692
          Anonymous

            Hi cossio, everyday847 and everyone,
            Can I ask whether you have been able to run ddg_monomer successfully? I tried to test it by mutating only the first residue to alanine. I had already renumbered the residues from 1 to the end. However, I got a log file (attached) with the error message

            Number of residue types is greater than MAX_RESIDUE_TYPES.
            (see attachment for details)

            In addition, it takes more than 10 hours to run. So I need to wait a long time before actually knowing it is going wrong.

            The other output files are “wt_.out” with 5738 KB and “wt_traj” with nothing in it.

            The input options file is also attached.

            I would appreciate it very much if someone can help me identify the problem. Please tell me if you need anything else.

            Thank you very much.

            Yours sincerely
            Cheng

          • #11512
            Anonymous

              Hi,

              I need to run ddG_monomer protocol on a multimeric protein where some chains are >20 Angstroms apart. From what I understand from the above discussion I need to renumber the PDB file so that all numbering is continuous. But, there is a minimization step involved in ddG_monomer protocol.

              1) If I use a structure where say residue 100 and 101 are 20 Angstroms apart, won’t that affect the minimization process and generate highly unstable energy values?

              2) Is there a way of combining rosetta fold and dock (or some other protocol) with ddG_monomer to run calculations on multimeric proteins?

              Thanks in advance.

            • #10699
              Anonymous

                You might want to start your own thread…

              • #10707
                Anonymous

                  Yes, the issue is the distinction between “PDB numbering” and “pose numbering”. In PDB numbering the numbers can start at an arbitrary point and jump around, and there’s a chain letter associated with it. With pose numbering, the first residue is residue 1, and the residue numbers increment by one each time, regardless of chain. So if you have 123 residues in the first chain, the first residue of the second chain will be 124.

                  Generally, if a Rosetta input or output doesn’t specify a chain letter, it’s expecting things to be in pose numbering. The difficulty of converting back and forth between the two numbering systems is why many protocols suggest renumbering the PDB such that pose numbering and PDB numbering match – often that’s not a hard requirement, so long as you can keep track of the conversion yourself.

                  Rosetta can renumber the PDBs on output for you if you provide the -out:file:renumber_pdb option. Alternatively, there are a number of scripts in the tools/ directory which can renumber PDBs for you. I’ve used tools/protein_tools/scripts/pdb_renumber.py successfully, but it may require you to add the tools/protein_tools/ directory to your PYTHONPATH environment variable.
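As a rough illustration of the pose-numbering rule described above, the following Python sketch walks the ATOM records of a PDB file in order and gives each new (chain, residue number, insertion code) a running index starting at 1. The names `pose_index_map` and `demo_atom` are made up for this example and are not part of Rosetta. The sketch also assumes that Rosetta loads every ATOM residue and nothing else, which may not hold if the file contains HETATM records or residues that Rosetta discards on input.

```python
def pose_index_map(pdb_lines):
    """Map (chain, residue number, insertion code) to a 1-based index in file order.

    Assumption: every residue with ATOM records ends up in the pose, in file order.
    """
    mapping = {}
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        chain = line[21]                  # PDB column 22: chain ID
        resnum = int(line[22:26])         # PDB columns 23-26: residue number
        icode = line[26:27].strip()       # PDB column 27: insertion code
        key = (chain, resnum, icode)
        if key not in mapping:
            mapping[key] = len(mapping) + 1
    return mapping


def demo_atom(chain, resnum):
    """Build a minimal fixed-column ATOM record (demo helper only)."""
    return f"ATOM  {1:>5}  CA  ALA {chain}{resnum:>4}    "


# Chain A has residues 5, 6, 8 (7 is missing); chain B has residues 1, 2.
lines = [demo_atom("A", n) for n in (5, 6, 8)] + [demo_atom("B", n) for n in (1, 2)]
mapping = pose_index_map(lines)
print(mapping[("B", 1, "")])  # prints 4
```

Inverting the dictionary gives the opposite mapping for translating results back to chain and residue number.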

                • #10711
                  Anonymous

                    I will try “-override_rsd_type_limit” as rmoretti said in https://www.rosettacommons.org/node/3828
                    Sorry about that.

                  • #10712
                    Anonymous

                      Hi R Moretti,
                      Can I ask

                      1. So do you mean we cannot use a resfile in ddg_monomer?

                      I know only “-ddg::mut_file ” is used as an example in https://www.rosettacommons.org/docs/latest/ddg-monomer.html.

                      However, on that page, both resfile and mutfile (i.e. Input File – b-1) have been introduced. So why is that?

                      2. If we really can only use mutfile, what is the syntax if we want different mutations (NOT multiple mutations in one structure)?

                      I have tried

                      total 1
                      1
                      D 1 A
                      total 1
                      1
                      D 1 V
                      total 1
                      1
                      D 1 L
                      total 1
                      1
                      D 1 I
                      total 1
                      1
                      D 1 G

                      However, the run was not successful and I was told

                      “apps.public.ddg.ddg_monomer: end reading mutations for this” for the majority of ddg.log (491 MB), and

                      “/cm/local/apps/sge/current/default/spool/node-001/job_scripts/4679102: line 20: 26401 Bus error ddg_monomer.linuxgccrelease @ /home/ucbechz/Scratch/20141215_ddg_renumber_multi_test/input/options_3 > ddg.log” in the error file.

                      So how do I run several different mutations in one go?

                      Thank you very much.

                      Yours sincerely
                      Cheng

                    • #10722
                      Anonymous

                        From the ddg_monomer documentation page (https://www.rosettacommons.org/docs/latest/ddg-monomer.html):

                        All PDBs should be renumbered so that their first residue is residue 1 and number consecutively so that, if there are missing residues in the structure (due maybe to missing density) that these residues are simply skipped in the residue numbering.

                        Forget about multiple chains for a moment. Skipping missing residues means that I should renumber as 1, 2, 4, 5 (where residue 3 is missing), or
                        as 1, 2, 3, 4 …, where 3 now refers to the original residue 4?

                      • #10732
                        Anonymous

                          You can use a resfile (and thus PDB numbering) with ddg_monomer, but if you do you’re limited to specifying a set of single point mutations. If you want to have a single structure with multiple mutations, then you need to use the ddg_monomer-specific mutation file format, which means that you need to use pose numbering.

                          For the format of the mutations file, the “total” line only comes once, at the very top of the file.

                        • #10733
                          Anonymous

                            It’s the latter. The numbers should match the residues which are present. If the residue “3” is missing, then the number 3 should go to the third residue which *is* there, the original residue “4”.
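A tiny Python illustration of that renumbering, using the residue numbers from the question (the variable names are only for this example):

```python
# Original residue numbers actually present in the structure (3 is missing).
present = [1, 2, 4, 5]

# New numbering counts only the residues that are present.
renumbered = {old: new for new, old in enumerate(present, start=1)}
print(renumbered)  # prints {1: 1, 2: 2, 4: 3, 5: 4}
```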

                          • #10739
                            Anonymous

                              Hi R Moretti,
                              Thank you very much.

                              For mutfile, do you mean I should use the following if I have multiple single point mutations in one structure?

                              total 1
                              1
                              D 1 A
                              1
                              D 1 V
                              1
                              D 1 L
                              1
                              D 1 I
                              1
                              D 1 G

                              Thank you.

                              Yours sincerely
                              Cheng

                            • #11209
                              Anonymous

                                Cheng,

                                 

                                The “Total #” at the top of the mutfile indicates the total number of mutations to be made; so for your use here it would read:

                                total 5
                                1
                                D 1 A
                                1
                                D 1 V
                                1
                                D 1 L
                                1
                                D 1 I
                                1
                                D 1 G

                                The “1” before each mutation gives the number of mutations to be made in that round, so if you wanted to make two mutations simultaneously it would read:

                                total 6
                                2
                                D 1 A
                                G 3 A
                                2
                                D 1 V
                                G 3 V
                                2
                                D 1 L
                                G 3 L
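For longer mutation lists it may be easier to generate the file than to type it. The Python sketch below writes the layout described in this thread: one "total" line holding the summed mutation count, then for each round its size followed by one "wild-type position mutant" line per mutation. The function name `format_mutfile` is hypothetical, and positions are assumed to already be in pose numbering.

```python
def format_mutfile(mutation_sets):
    """Render mutation sets as mutfile text.

    Each set is a list of (wild_type, pose_position, mutant) tuples that
    are applied together in one round.
    """
    total = sum(len(mutations) for mutations in mutation_sets)
    lines = [f"total {total}"]
    for mutations in mutation_sets:
        lines.append(str(len(mutations)))  # number of mutations in this round
        for wild_type, position, mutant in mutations:
            lines.append(f"{wild_type} {position} {mutant}")
    return "\n".join(lines) + "\n"


# Five independent point mutations at position 1.
singles = [[("D", 1, aa)] for aa in "AVLIG"]
# Three double mutants at positions 1 and 3.
doubles = [[("D", 1, aa), ("G", 3, aa)] for aa in "AVL"]

print(format_mutfile(singles))
print(format_mutfile(doubles))
```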

                                 

                                Thanks,

                                M. Benhaim

                              • #11546
                                Anonymous

                                  In the standard (non-Cartesian, non-Dualspace) minimization procedure – which should be what the ddg_monomer protocol is using – you’re minimizing over torsional angle space in the absence of any energies which penalize bond lengths and angles. (Unlike molecular mechanics, the Rosetta energy function doesn’t have bond length/angle penalties – correct geometry is maintained by the sampling method, rather than the scoring function.) So even if Rosetta thinks residues 100 and 101 are chemically bonded, they’re not going to be drawn together or even cause an energy penalty.

                                  However, even if numbering is continuous, that doesn’t mean that Rosetta thinks the residues are bonded. If you have different chain designations (e.g. 100 A and 101 B) or if you have a TER card between the chains in the input PDB, then Rosetta should make them independent chains.
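To check ahead of time how an input file would be split under that rule, something like the following Python sketch can help. It only mimics the rule as stated here (a new chain starts when the chain letter changes or after a TER record) and is not Rosetta's actual implementation; `chain_segments` is a made-up name.

```python
def chain_segments(pdb_lines):
    """Return [chain_id, residue_count] for each segment of ATOM records.

    A new segment starts at a change of chain ID or after a TER record.
    """
    segments = []
    start_new = True      # True at the start of the file and right after TER
    last_residue = None
    for line in pdb_lines:
        if line.startswith("TER"):
            start_new = True
            continue
        if not line.startswith("ATOM"):
            continue
        chain = line[21]          # PDB column 22: chain ID
        residue = line[22:27]     # residue number plus insertion code
        if start_new or chain != segments[-1][0]:
            segments.append([chain, 0])
            start_new = False
            last_residue = None
        if residue != last_residue:
            segments[-1][1] += 1
            last_residue = residue
    return segments
```

Calling it on the lines of the renumbered PDB file (for example `chain_segments(open(path))`, with `path` pointing at your own file) lists one entry per chain that this rule would produce.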

                                  You can certainly combine fold and dock (ab initio folding of symmetric proteins) and other such structure prediction programs with ddg_monomer. However, the way you would do it would be to do a full fold and dock (or RosettaCM, or protein-protein docking, etc) run to get output structures. Then you’d take one or more of those output structures and use them as inputs to the ddg_monomer protocol. 

                                  Despite the name, “ddg_monomer” should work (that is, run and produce output) even on multimer structures. The “monomer” in the name mostly refers to the fact that its intended use is calculating the ddG of folding, rather than something like the ddG of binding. (The caveat here is that the benchmark was done on monomeric proteins, so predictions for residues at the interface of a protein dimer may or may not be as accurate as for residues in the core of a monomeric protein.)
