Warning when running cluster program

    • #1310

        I have a protein of 106 residues (numbered from 1 to 106). I ran relax and then clustered the resulting 200 poses:

        cluster.linuxgccrelease -in:file:silent mysilentfile

        But I got tons of warnings like this:

        core.scoring.rms_util: WARNING: In CA_rmsd, residue range 1 to 111 requested but only 106 protein CA atoms found.

        Why does cluster think I have 111 residues? Can I still trust the cluster result despite all the warnings? Does this have anything to do with the “jump” concept?

      • #7260

          Do you have any ligands present?

        • #7261

            No, no ligands.
            The warnings disappeared after I set the native structure:

            cluster.linuxgccrelease -in:file:silent mysilentfile -in:file:fullatom -in:file:native mynative.pdb

          • #7323

              Also, it seems the energies calculated by cluster differ from those calculated by score_jd2.
              I have docking decoys in silent files. I calculated the energy of each decoy using score_jd2, then clustered the decoys using cluster. cluster outputs a list of scores like:

              protocols.cluster: Adding struc: -103.753

              But they are totally different from the score_jd2 values: not just their order, but the values themselves. (I am using Rosetta v3.4.)

            • #7345

                Yes, you are totally right. I noticed centroid atoms in the .pdb files after clustering. I specified -in:file:fullatom when clustering, but this did not help.

                The missing residues that the cluster program complains about are the virtual residues used in the FoldTree. Using -cluster:exclude_res to exclude them makes the cluster program stop complaining. (Is it normal that cluster does not recognize the virtual residues generated by Rosetta itself?)

              • #7312

                  Does “mynative” have the same number of residues as the poses in the silent file? The error is that you have 111 CAs in one (c-alpha, not calcium), and 106 in the other.

                • #7317

                    Yes, both the native decoy and the Rosetta output decoys have 106 CA atoms. But they are dimers (chains A and B). I have no idea where the number 111 comes from. 111-106=5, which is not even an even number (since it is a homodimer, I would expect the difference to be a multiple of 2).

                    I also worked on another protein, a trimer. When I cluster it, I get the same warning, with 67 instead of 54. I checked the input and output: both have only 54 residues and 54 CA atoms.

                  • #7325

                      I can’t find an answer for the # CA atoms mismatch.

                      If the score difference is small (a few units), it’s due to the imprecision of structure storage on disk, especially if you are using PDBs. A PDB file stores coordinates to three decimal places, but the Rosetta score function is sensitive to many more decimal places. This means a PDB will never rescore exactly the same as the in-code pose from which it was produced. The problem is ameliorated, if not eliminated, for binary-style silent files.
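The precision point above can be illustrated with a toy calculation. This is only a sketch: the single 12-6 Lennard-Jones pair term, its parameters, and the two atom coordinates are made-up stand-ins, not Rosetta's score function or real data. It rounds coordinates to three decimals, as a PDB file would store them, and compares the resulting energy with the unrounded one.

```python
import math

def lj_energy(r, epsilon=1.0, sigma=3.5):
    """Toy 12-6 Lennard-Jones energy for one atom pair at distance r.
    epsilon and sigma are arbitrary illustrative values."""
    s6 = (sigma / r) ** 6
    return 4.0 * epsilon * (s6 * s6 - s6)

def distance(a, b):
    """Euclidean distance between two (x, y, z) tuples."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def round_like_pdb(xyz):
    """Keep three decimal places, mimicking PDB coordinate precision."""
    return tuple(round(c, 3) for c in xyz)

# Hypothetical in-memory coordinates with more than three decimals.
atom_a = (1.23456789, -4.56712345, 7.89123456)
atom_b = (4.10098765, -3.20054321, 8.95067891)

exact = lj_energy(distance(atom_a, atom_b))
rounded = lj_energy(distance(round_like_pdb(atom_a), round_like_pdb(atom_b)))
delta = abs(exact - rounded)

print(f"exact   = {exact:.6f}")
print(f"rounded = {rounded:.6f}")
print(f"delta   = {delta:.6f}")
```

For one pair the shift is tiny; summed over every atom pair in a structure, small shifts of this kind can plausibly add up to the "few units" mentioned above, but not to a difference of hundreds of units.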

                    • #7326

                        Yes, I am aware of this precision problem from previous threads. But the difference is huge:

                        score_jd2 says:

                        SCORE: 436.487 -380.814 669.460 182.095 0.763 0.000 0.000 -66.162 0.000 0.000 0.000 0.000 0.000 0.000 0.000 -2.372 2.081 24.410 3.725 3.300 0.000 0.000 -13.400 0.000 mystruc_0091_0001

                        while cluster says:
                        protocols.cluster: mystruc_0091 -51.781
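To line up the two outputs quoted above, a small parser along these lines might help. It is a sketch under assumptions: that the first numeric column after `SCORE:` is the total score, that the last column is the decoy tag, and that the `_0001` suffix was appended when rescoring, so the cluster tag is a prefix of the score_jd2 tag. The function names are hypothetical helpers, not part of Rosetta.

```python
def parse_score_jd2_line(line):
    """Return (tag, total_score) from a 'SCORE:' line.
    Assumes column 1 is the total score and the last column is the tag."""
    fields = line.split()
    if not fields or fields[0] != "SCORE:":
        raise ValueError("not a SCORE: line")
    return fields[-1], float(fields[1])

def parse_cluster_line(line):
    """Return (tag, score) from a 'protocols.cluster: <tag> <score>' line."""
    fields = line.split()
    if len(fields) != 3 or fields[0] != "protocols.cluster:":
        raise ValueError("not a 'protocols.cluster: tag score' line")
    return fields[1], float(fields[2])

# The two lines quoted in the post above.
jd2_line = ("SCORE: 436.487 -380.814 669.460 182.095 0.763 0.000 0.000 -66.162 "
            "0.000 0.000 0.000 0.000 0.000 0.000 0.000 -2.372 2.081 24.410 "
            "3.725 3.300 0.000 0.000 -13.400 0.000 mystruc_0091_0001")
cluster_line = "protocols.cluster: mystruc_0091 -51.781"

jd2_tag, jd2_score = parse_score_jd2_line(jd2_line)
cluster_tag, cluster_score = parse_cluster_line(cluster_line)

# Assumption: the rescored tag is the cluster tag plus a suffix.
same_decoy = jd2_tag.startswith(cluster_tag)
difference = jd2_score - cluster_score
print(same_decoy, round(difference, 3))
```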

                      • #7328

                          Can you give some hint on the concept of “jump residue”?

                        • #7331

                            You’re right, that’s not a precision thing. Could it be a fullatom/centroid thing? Is the previous code emitting centroid or fullatom PDBs?

                          • #7332

                              This paper

                              describes the “fold tree” in Rosetta. Basically, Rosetta uses internal coordinates wherever possible, and only converts to XYZ coordinates for some score functions and to print results. Atom positions are calculated by iterating along the network of internal coordinates. The AtomTree and FoldTree specify which atoms are connected to which other atoms, and by which degrees of freedom. This lets Rosetta know things like how to move the end of a lysine when the base chi angles rotate. The tree MUST be a directed acyclic graph that contains all atoms, so that all are accounted for and singly connected. Most connections in these trees represent chemical bonds. Jumps come in where connections are needed but can’t be represented by chemistry. For example, if you have two chains, there is a Jump between the first and second chains representing how the second chain is positioned with respect to the first. Jumps do other things too, especially in loop modeling, but non-chemical internal-coordinate connections between independent molecules in the pose are the common case.
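The "all accounted for and singly connected" requirement can be sketched with a toy residue-level tree. This is an illustration only, not Rosetta's FoldTree class: the edge tuples, the `PEPTIDE` label, and the 2 x 53-residue dimer are assumptions chosen for the example. Peptide edges cover a run of bonded residues, and a jump links two residues that share no chemical bond.

```python
from collections import defaultdict

PEPTIDE = -1  # toy label for a chemically bonded stretch; jumps use 1, 2, ...

def residue_links(edges):
    """Expand (start, stop, label) edges into residue-to-residue links."""
    links = []
    for start, stop, label in edges:
        if label == PEPTIDE:
            lo, hi = sorted((start, stop))
            links.extend((i, i + 1) for i in range(lo, hi))
        else:
            links.append((start, stop))  # a jump: one non-chemical link
    return links

def is_tree(n_residues, edges):
    """True if every residue 1..n is reached exactly once (connected, no cycles)."""
    links = residue_links(edges)
    if len(links) != n_residues - 1:  # a tree on n nodes has n - 1 links
        return False
    neighbors = defaultdict(set)
    for a, b in links:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen, stack = {1}, [1]
    while stack:
        current = stack.pop()
        for nxt in neighbors[current]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen == set(range(1, n_residues + 1))

# Hypothetical homodimer: chain A = residues 1-53, chain B = residues 54-106.
dimer = [(1, 53, PEPTIDE), (54, 106, PEPTIDE), (1, 54, 1)]
no_jump = dimer[:2]                  # chain B is left disconnected
extra_jump = dimer + [(53, 106, 2)]  # a second jump closes a cycle

print(is_tree(106, dimer), is_tree(106, no_jump), is_tree(106, extra_jump))
```

Without the jump the second chain has no defined position relative to the first; with two jumps between the same chains its position is defined twice. Exactly one jump per extra rigid body keeps the structure a tree.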

                            • #7339

                                Yes, you are totally right. I noticed centroid atoms in the .pdb files after clustering. I specified -in:file:fullatom when clustering, but this did not help.

                              • #7369

                                  (Is it normal that cluster does not recognize the virtual residues generated by Rosetta itself?)

                                  Probably – this is exactly the sort of thing developers ignore, because they know what the warning means and know it’s irrelevant, so they ignore it. I’ve never seen it, but I’ve never used cluster…

                                • #7370

                                    I read the source code and played with my poses using PyRosetta. I realized that the cluster program uses CA_rmsd(), which checks whether the number of residues equals the number of CA atoms in the protein. If not, it emits the WARNING. Virtual residues do not have CA atoms.

                                    It seems CA_rmsd carries on with the calculation of RMSD using all CA atoms it gets. For different poses from the same protein, the WARNING can be safely ignored (unofficial judgement!).
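The behaviour described above can be mimicked with a toy model. This is not the Rosetta or PyRosetta implementation: `ToyResidue` and `toy_ca_rmsd` are hypothetical names, the coordinates are invented, and the sketch skips the superposition step a real RMSD calculation would normally do. It only shows how five virtual residues without CA atoms yield the "1 to 111 requested but only 106 found" message while the RMSD over the 106 real CA atoms is still computed.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class ToyResidue:
    name: str
    ca: Optional[Tuple[float, float, float]]  # None for a virtual residue

def ca_coords(pose):
    """Collect CA coordinates, skipping residues that have none."""
    return [res.ca for res in pose if res.ca is not None]

def toy_ca_rmsd(pose_a, pose_b):
    """Return (rmsd, warnings). No superposition is done in this sketch."""
    warnings = []
    cas_a, cas_b = ca_coords(pose_a), ca_coords(pose_b)
    if len(cas_a) != len(pose_a):
        warnings.append(
            f"residue range 1 to {len(pose_a)} requested "
            f"but only {len(cas_a)} protein CA atoms found"
        )
    if len(cas_a) != len(cas_b):
        raise ValueError("poses have different numbers of CA atoms")
    squared = sum(
        (p - q) ** 2 for a, b in zip(cas_a, cas_b) for p, q in zip(a, b)
    )
    return math.sqrt(squared / len(cas_a)), warnings

def make_pose(shift):
    """106 protein residues on a line plus 5 virtual residues (made-up data)."""
    protein = [ToyResidue("ALA", (float(i) + shift, 0.0, 0.0)) for i in range(106)]
    virtual = [ToyResidue("VRT", None) for _ in range(5)]
    return protein + virtual

rmsd, warnings = toy_ca_rmsd(make_pose(0.0), make_pose(1.0))
print(rmsd)
print(warnings)
```

Because both poses come from the same protein, they carry the same virtual residues in the same positions, so the CA lists stay paired one-to-one. That pairing is what makes ignoring the warning plausible in this toy; it would not hold if the two poses had different residue layouts.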
