clustering_decoy

    • #563
      Grace Hopper
      Participant

        Hi,

        How do I identify the best decoy from the clustering tree?
        If my protein is 176 aa, how many decoys should I generate?

      • #3953
        Sally Ride
        Participant

          Ad 1) I usually do the following:

          ^find /full/path/to/decoys/folder/ -maxdepth 1 -name '*.pdb' -size +1 > list^
          ^/home/kosa/bin/compose_score_silent.py t228.out list^
          ^/home/kosa/bin/cluster_info_silent.out ../t228.out - t228 5,25,50,100 2,5^

          possibly re-running the last step with different options.
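
For readers without `find`, the first step above (building the list of decoy files) can be approximated in Python. This is only a rough sketch: the function name `list_decoys` and the `min_bytes` parameter are made up for illustration, and the 513-byte cutoff assumes that `-size +1` means "larger than one 512-byte block", as in GNU find.

```python
from pathlib import Path
import tempfile


def list_decoys(folder, min_bytes=513):
    """Return sorted paths of *.pdb files directly inside `folder`
    (no recursion) that are at least `min_bytes` bytes long.

    min_bytes=513 mimics `-size +1` under the assumption of
    512-byte blocks; it filters out empty or truncated decoys.
    """
    folder = Path(folder)
    return sorted(
        str(path)
        for path in folder.glob("*.pdb")
        if path.is_file() and path.stat().st_size >= min_bytes
    )


# Small self-check in a throwaway directory (file names are placeholders).
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "good.pdb").write_bytes(b"x" * 600)
    (Path(tmp) / "empty.pdb").write_bytes(b"")
    (Path(tmp) / "notes.txt").write_bytes(b"x" * 600)
    found = [Path(p).name for p in list_decoys(tmp)]

print(found)
```

Writing the returned paths one per line to a file named `list` would give the same kind of input as the `find` command; the two scripts that follow it are not reproduced here.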

          Ad 2) I think nobody knows the answer to this question ;-) You might try clustering after generating, let's say, 10,000 decoys, analyze the results, and then generate more decoys if you do not see big clusters. The results also depend on the particular protein: some of them "fold more easily" with Rosetta (e.g. those with simple topologies).

        • #3959
          Grace Hopper
          Participant

            Hi kosa,
            How do I identify a big cluster from the tree figure?
            Do you have an example of a big cluster?

          • #3960
            Grace Hopper
            Participant

              Can you please look at this blog:
              http://icbbio.blogspot.com/

              What is your comment on this clustering tree?

              Is there any decoy good enough to start protein modelling from?

            • #3961
              Sally Ride
              Participant

                Well, it does not look very good… Look at the rmsd values: they are pretty high (above 8 Å) and at the same time the clusters are quite small. In the prefix.info file you should be able to find the exact rmsd cutoff that was used for the clustering. I would check the superposition of the decoys in the top five clusters to see what 8 Å actually means for your protein. You should have the CA superpositions in files with names such as cluster00.015.pdb. If the decoys are visibly dissimilar, you need to lower the rmsd threshold (by changing the cluster_info_silent.out options). But then of course you would get smaller clusters (and I am not sure whether clusters with fewer than 10 members are meaningful at all…).

                Maybe you do indeed need to generate more decoys (how many do you have so far?) or run Rosetta with different options. Maybe it is possible to split your protein into two domains?

                Frankly, I have never succeeded in folding a protein as long as yours with Rosetta. But I certainly do not use computational resources as large as some Rosetta power users do ;-)

              • #3963
                Grace Hopper
                Participant

                  How do you know my decoys are above 8 Å?
                  I generated 9999 decoys
                  and ran cluster_info_silent.out with 5,15,45,75 3,4

                  What is your comment?

                • #3964
                  Sally Ride
                  Participant

                    > How do you know my decoys are above 8 Å?

                    The X axis of your clustering tree shows rmsd values… You should be able to find the exact rmsd cutoff used for clustering in the *.info file that you should have in the clustering directory. It should be a line like:

                    TARGET: 100 THRESHOLD: 1.271307

                    just after the AC lines.

                    > I generated 9999 decoys

                    For such a long protein that might be far too few, especially if you get poor clustering.

                    > ran cluster_info_silent.out with 5,15,45,75 3,4

                    With these options the program did not find any cluster of size 15 or more when using an rmsd threshold between 3 and 4 Å. So it increased the rmsd threshold until it got a top cluster of size 15.

                    Please post the AC lines from your *.info file so I can advise you on how to modify the clustering options.

                    And if you have the computational power, generate another 20000 decoys ;-) You can also try different options/protocols or try to fold a homolog.
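
The effect described in this post (the rmsd threshold being raised until the top cluster reaches the requested size) can be illustrated with a toy calculation. This is only a sketch of the idea and not the actual cluster_info_silent.out algorithm: the function name, the tiny distance matrix and the convention that a cluster is a centre decoy plus every decoy within the threshold are all assumptions made for illustration.

```python
def threshold_for_top_cluster(dist, target):
    """Smallest rmsd threshold at which some decoy has `target` members
    (itself included) within that threshold.

    dist   -- symmetric matrix (list of lists) of pairwise rmsd values
    target -- required size of the top cluster
    """
    n = len(dist)
    if not 1 <= target <= n:
        raise ValueError("target must be between 1 and the number of decoys")
    best = None
    for i in range(n):
        # Distances from decoy i to every decoy, itself included (0.0).
        row = sorted(dist[i])
        # With decoy i as centre, the cluster reaches `target` members
        # once the threshold equals the target-th smallest distance.
        needed = row[target - 1]
        if best is None or needed < best:
            best = needed
    return best


# Toy data: decoys 0-2 resemble each other, decoy 3 is far from all of them.
toy = [
    [0.0, 2.0, 3.0, 9.0],
    [2.0, 0.0, 2.5, 8.0],
    [3.0, 2.5, 0.0, 8.5],
    [9.0, 8.0, 8.5, 0.0],
]
small = threshold_for_top_cluster(toy, 3)
large = threshold_for_top_cluster(toy, 4)
print(small, large)
```

In this toy set a cluster of three forms at 2.5 Å, but asking for four members pushes the threshold to 8.0 Å. A requested minimum size that the decoy set cannot support at a low rmsd would inflate the reported threshold in the same way.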

                  • #3965
                    Grace Hopper
                    Participant

                      Are these the AC lines?

                      COMMAND: ../../C/cluster_info_silent.out bbphad.out _ cluster1/tmp 5,15,45,75 3,4
                      AC: target 15 threshold 8.21 clusters 69 coverage 466
                      AC: target 16 threshold 8.21 clusters 69 coverage 466
                      AC: target 17 threshold 8.21 clusters 69 coverage 466
                      AC: target 18 threshold 8.37 clusters 88 coverage 616
                      AC: target 19 threshold 8.37 clusters 88 coverage 616
                      AC: target 20 threshold 8.37 clusters 88 coverage 616
                      AC: target 21 threshold 8.39 clusters 89 coverage 628
                      AC: target 22 threshold 8.40 clusters 91 coverage 639
                      AC: target 23 threshold 8.40 clusters 91 coverage 639
                      AC: target 24 threshold 8.40 clusters 91 coverage 639
                      AC: target 25 threshold 8.55 clusters 115 coverage 819
                      AC: target 27 threshold 8.55 clusters 115 coverage 819
                      AC: target 29 threshold 8.61 clusters 120 coverage 870
                      AC: target 31 threshold 8.64 clusters 126 coverage 916
                      AC: target 33 threshold 8.64 clusters 126 coverage 916
                      AC: target 35 threshold 8.65 clusters 128 coverage 931
                      AC: target 38 threshold 8.80 clusters 156 coverage 1192
                      AC: target 41 threshold 8.87 clusters 165 coverage 1295
                      AC: target 44 threshold 8.87 clusters 166 coverage 1305
                      AC: target 47 threshold 8.88 clusters 167 coverage 1311
                      AC: target 51 threshold 8.97 clusters 181 coverage 1453
                      AC: target 55 threshold 8.99 clusters 185 coverage 1493
                      AC: target 60 threshold 9.10 clusters 201 coverage 1700
                      AC: target 65 threshold 9.16 clusters 220 coverage 1867
                      AC: target 71 threshold 9.22 clusters 227 coverage 1994
                      TARGET: 15 THRESHOLD: 8.209858

                      What is your comment?

                    • #3966
                      Grace Hopper
                      Participant

                        I have generated 3 sets of 9999 decoys. Do I need to merge them before the clustering analysis? If yes, how?

                      • #3967
                        Sally Ride
                        Participant

                          Hi,

                          Yes, these are the AC lines.
                          So you can see that the program clustered at an rmsd threshold of 8.209858 to get a top cluster of size 15. The AC lines can show you what would happen with a lower threshold, but for that you need to specify a smaller lower bound for the minimum size of the top cluster (15 members at present).

                          Please do the clustering with the following options:
                          5,5,15,20 3,4

                          However, you need to generate more decoys anyway. And you should really open the files with names such as "cluster00.015.pdb" in any molecular viewer that can read CA traces (PyMOL, RasMol) and see whether structures from a single cluster have the same fold at the given rmsd threshold.
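
To compare thresholds across runs without reading the *.info file by eye, the AC lines can be pulled out with a few lines of Python. This is a minimal sketch that assumes the line layout matches the lines posted in this thread. The names `AC_LINE` and `parse_ac_lines` are made up for illustration. The meaning of the `clusters` and `coverage` columns is not explained in this thread, so they are simply carried along.

```python
import re

# Layout assumed from the AC lines posted in this thread.
AC_LINE = re.compile(
    r"AC:\s+target\s+(\d+)\s+threshold\s+([\d.]+)"
    r"\s+clusters\s+(\d+)\s+coverage\s+(\d+)"
)


def parse_ac_lines(text):
    """Return a list of (target, threshold, clusters, coverage) tuples."""
    rows = []
    for line in text.splitlines():
        match = AC_LINE.search(line)
        if match:
            target, threshold, clusters, coverage = match.groups()
            rows.append(
                (int(target), float(threshold), int(clusters), int(coverage))
            )
    return rows


# Three of the AC lines from the post above, plus the final summary line,
# which the pattern deliberately ignores.
sample = """\
AC: target 15 threshold 8.21 clusters 69 coverage 466
AC: target 18 threshold 8.37 clusters 88 coverage 616
AC: target 25 threshold 8.55 clusters 115 coverage 819
TARGET: 15 THRESHOLD: 8.209858
"""

rows = parse_ac_lines(sample)
for target, threshold, clusters, coverage in rows:
    print(f"top cluster of {target:3d} members needs threshold {threshold:.2f}")

lowest = min(rows, key=lambda row: row[1])
print("lowest threshold listed:", lowest[1], "at target", lowest[0])
```

Running the same function on the *.info file from a new clustering run (for example with the 5,5,15,20 3,4 options) would show directly whether the threshold drops once smaller top clusters are allowed.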

                        • #3968
                          Sally Ride
                          Participant

                            Are you generating them in pdb or silent format?

                          • #3969
                            Grace Hopper
                            Participant

                              I am generating them in silent format.

                            • #3970
                              Grace Hopper
                              Participant

                                Is it correct to use just one domain, a conserved domain according to the domain prediction server? I simply removed the N- and C-terminal regions to make my protein smaller, from 295 aa to 176 aa.
                                Or do you have any other suggestion?

                                Thanks

                                Hasni

                              • #3971
                                Sally Ride
                                Participant

                                  I think you can just concatenate the files, but I am not sure. I always generate decoys in pdb format because i) I have a script that later extracts the decoys in clusters from the original set of pdb decoys, ii) I run Rosetta on local clusters, so I do not have to care about the size of the output data, and iii) I often process decoys by other means.

                                  So I do not have much experience working with the silent format. I will start a separate post on this forum to ask how to work with it.
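
If plain concatenation is tried, one cautious variant is to keep the leading header lines only once. The sketch below is untested against real Rosetta output and rests on assumptions: that each silent file starts with the same small number of header lines (`header_lines=2` is a guess to check against your own files), and that repeated decoy tags across runs are acceptable to the clustering program, which this code does nothing about. The function name `merge_silent_files` is made up for illustration.

```python
import os
import tempfile


def merge_silent_files(inputs, output, header_lines=2):
    """Concatenate text files into `output`.

    The first `header_lines` lines of the first input are kept. In later
    inputs a leading line is dropped only if it is identical to the
    corresponding header line of the first input, so no decoy data is
    discarded when a file has a different or missing header.
    """
    header = []
    with open(output, "w") as out:
        for index, name in enumerate(inputs):
            with open(name) as handle:
                for line_number, line in enumerate(handle):
                    if not line.endswith("\n"):
                        line += "\n"  # keep files from running together
                    if line_number < header_lines:
                        if index == 0:
                            header.append(line)
                        elif (line_number < len(header)
                              and line == header[line_number]):
                            continue  # repeated header line, skip it
                    out.write(line)


# Self-check with placeholder content (not real silent-file lines).
with tempfile.TemporaryDirectory() as tmp:
    first = os.path.join(tmp, "run1.out")
    second = os.path.join(tmp, "run2.out")
    combined = os.path.join(tmp, "all.out")
    with open(first, "w") as handle:
        handle.write("HEADER-1\nHEADER-2\ndecoy A\n")
    with open(second, "w") as handle:
        handle.write("HEADER-1\nHEADER-2\ndecoy B\n")
    merge_silent_files([first, second], combined)
    with open(combined) as handle:
        merged = handle.read().splitlines()

print(merged)
```

Before clustering the merged file, it would be worth confirming that the number of decoys it contains equals the sum over the individual runs.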

                                • #3972
                                  Sally Ride
                                  Participant

                                    Yes, it is correct if the conserved domain is an independent folding unit (if the conserved domain can be found in different domain contexts in unrelated proteins, this is very likely the case). However, if it interacts closely with other domains through a big hydrophobic patch, you might run into problems (Rosetta might try to push the should-be-surface hydrophobic residues into the core).

                                    In your case I think the problems are due to the protein length, and many, many more decoys as well as more sophisticated Rosetta approaches might be needed (with no guarantee of success).
