cluster pdb structures questions

    • #1545
      Anonymous

        Hi,

        I’m new to Rosetta and am trying to cluster 100 PDB structures with the cluster application to find the most common one. What output files should I expect?
        I read in the cluster command user guide that the application uses the first 400 structures as its starting point. Is 100 structures too few, and could that be causing the problem?
        (https://www.rosettacommons.org/manuals/archive/rosetta3.4_user_guide/d7/d6f/cluster_commands.html)

        I tried it twice:
        A) my flag
        -database /Applications/rosetta3.4/rosetta_database/
        -in:file:l 5S-100pdb-list (a file listing the names of all 100 PDB structures)

        my error

        ERROR: Illegal attempt to score with non-identical atom set between pose and etable
        ERROR:: Exit from: src/core/scoring/etable/EtableEnergy.cc line: 75

        B) my flag
        -database /Applications/rosetta3.4/rosetta_database/
        -in:file:l 5S-100pdb-list
        -score:weights /Applications/rosetta3.4/rosetta_database/scoring/weights/cen_std
        -score:patch /Applications/rosetta3.4/rosetta_database/scoring/weights/score12

        My output is another 100 PDB files named c.XXX.0.pdb, but I expected one or two structures representing the most common ones.

        It would be great if you could give me some suggestions.

        Thanks!

      • #8574
        Anonymous

          Your error and your flag set both come down to not specifying whether you are using centroid or fullatom scorefunctions. Are your PDBs centroid or fullatom? Centroid means they have CB atoms only (and those atoms do not act like real CBs); fullatom means all sidechain atoms and hydrogens.

          This:

          -score:weights /Applications/rosetta3.4/rosetta_database/scoring/weights/cen_std
          -score:patch /Applications/rosetta3.4/rosetta_database/scoring/weights/score12

          can never ever be done – cen_std is the *centroid* standard scorefunction, score12 is the *fullatom* standard scorefunction.

          The solution to your problem is probably to use either -in:file:centroid or -in:file:fullatom (depending on your PDBs).

          I have heard good things about using Calibur (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2881085/) instead of Rosetta for clustering.

        • #8594
          Anonymous

            Thank you guys!
            I tried Calibur as you mentioned and it just works!
            It listed the largest cluster with no problem.

          • #8583
            Anonymous

              Hi,

              Thank you, Smlewis, for your reply, but it seems I still get errors.

              My PDBs have more than just CB atoms (see attachment), so I chose the fullatom option. (Is there a third option?) But I got lots of warnings in the log file (~15M!).
              After the run, I again got another 100 PDB files, named “C.XX.0.pdb”. Apart from these, I don’t get any clustered structures (PDBs), which I assumed the run would produce. Is that right?

              My input flag: (Do I need to add anything else?)
              -database /Applications/rosetta3.4/rosetta_database/
              -in:file:l 5S-100pdb-list
              -in:file:fullatom
              -score:weights /Applications/rosetta3.4/rosetta_database/scoring/weights/score12
              -score:patch /Applications/rosetta3.4/rosetta_database/scoring/weights/score12

              My warnings look like this (they repeat for all 100 structures, from S_00000001.pdb to S_00000100.pdb):
              core.io.pdb.file_data: [ WARNING ] discarding 1 atoms at position 1 in file S_00000001.pdb. Best match rsd_type: MET_p:NtermProteinFull
              core.io.pdb.file_data: [ WARNING ] discarding 1 atoms at position 2 in file S_00000001.pdb. Best match rsd_type: GLY
              core.io.pdb.file_data: [ WARNING ] discarding 1 atoms at position 3 in file S_00000001.pdb. Best match rsd_type: PRO
              core.io.pdb.file_data: [ WARNING ] discarding 1 atoms at position 4 in file S_00000001.pdb. Best match rsd_type: LEU

              core.conformation.Conformation: [ WARNING ] missing heavyatom: CG on residue MET_p:NtermProteinFull 1
              core.conformation.Conformation: [ WARNING ] missing heavyatom: SD on residue MET_p:NtermProteinFull 1
              core.conformation.Conformation: [ WARNING ] missing heavyatom: CE on residue MET_p:NtermProteinFull 1
              core.conformation.Conformation: [ WARNING ] missing heavyatom: CG on residue PRO 3
              core.conformation.Conformation: [ WARNING ] missing heavyatom: CD on residue PRO 3

              core.io.pdb.file_data: [ WARNING ] can’t find atom for res 1 atom CEN (trying to set temp)
              core.io.pdb.file_data: [ WARNING ] can’t find atom for res 2 atom CEN (trying to set temp)

              I’d appreciate any suggestions!

            • #8584
              Anonymous

                A few quick late night suggestions before Steven’s comments tomorrow morning:

                1) Use standard for weights and score12 for patch with fullatom structures. You don’t need to give the full path here; Rosetta will search in the rosetta_database. I usually give the extension as well, but perhaps you don’t need to.

                2) CEN is the centroid atom that represents the sidechain. Your PDB has backbone atoms and then CEN for the sidechain, so don’t pass -in:file:fullatom. What you would really want to do here, though, is run the relax step after abinitio. It really helps. If you are using some other application, use the flag -out:file:fullatom to output fullatom structures. There is a way to convert your centroid PDBs to fullatom outside of the protocol. I think it’s through the JD2 application, but I’ve never used it. Steven?

                3) Use Calibur as Steven suggested. It is easy to use, and will save you this headache.

              • #8587
                Anonymous

                  These are definitely centroid structures. My earlier statement of “only CB atoms” was unclear and incorrect: I meant that the only real sidechain atoms are CB, and all beyond-CB atoms are missing and replaced with a centroid (CEN). My apologies. Don’t use -in:file:fullatom, and don’t use those weight sets; use cen_std as before.
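For anyone unsure which kind of PDB they have, the CEN rule above can be checked before choosing flags. This is only a minimal sketch, not part of Rosetta: the function names are made up for illustration, and it assumes standard fixed-column PDB ATOM records, where the atom name sits in columns 13-16.

```python
def atom_names(pdb_text):
    """Collect the atom names found in the ATOM records of PDB-format text."""
    names = set()
    for line in pdb_text.splitlines():
        if line.startswith("ATOM"):
            # Columns 13-16 (0-based slice 12:16) hold the atom name.
            names.add(line[12:16].strip())
    return names


def looks_centroid(pdb_text):
    """Heuristic (an assumption): a CEN atom marks a centroid-mode structure."""
    return "CEN" in atom_names(pdb_text)
```

Reading each file with `open(path).read()` and passing the text to `looks_centroid` would tell you whether the whole set is centroid before you pick between -in:file:centroid and -in:file:fullatom.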

                  If you want to upconvert your centroids to fullatom, do this:

                  fixbb -nstruct 1 -ndruns 10 -packing:repack_only -ex1 -ex2 -in:file:fullatom -s (your PDBs)

                  This will repack real sidechains in place of the centroids; after that, your existing cluster command line should work. Don’t pass score12 as both a weights file and a patch; pass it as the weights only, with no patch. (You don’t need to apply the patch if it is already score12 – the patch converts “standard”, which isn’t standard, to score12.)
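If you have many PDBs, the fixbb command line above can be assembled in a script instead of typed by hand. A minimal sketch: the flags are copied from the command above, but the helper name and the default executable name `fixbb` are assumptions (installed binaries often carry a platform suffix), and nothing here has been checked against a Rosetta install.

```python
def build_fixbb_command(pdb_paths, executable="fixbb"):
    """Return the fixbb repack command as an argument list.

    `executable` is a placeholder; substitute the binary name from your install.
    """
    pdb_paths = [str(p) for p in pdb_paths]
    if not pdb_paths:
        raise ValueError("need at least one PDB file")
    return [
        executable,
        "-nstruct", "1",
        "-ndruns", "10",
        "-packing:repack_only",
        "-ex1", "-ex2",
        "-in:file:fullatom",
        "-s", *pdb_paths,  # all input structures follow -s
    ]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`, or printed with `" ".join(cmd)` to look it over first.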
