kinematic loop modeling/sequence design keeps crashing with segmentation fault

Rosetta 3 – General forum

    • #1544
      Anonymous

        Hi,
        I am running a kinematic loop modeling/design run, but it keeps crashing with a segmentation fault error. Could you tell me how to avoid this? Thanks.

        command line:
        %loopmodel.gccrelease -database XXX @loopmodelflags

        flagfile:
        -loops:input_pdb XXX.pdb
        -loops:loop_file XXX.loop
        -loops:remodel perturb_kic
        -loops:refine refine_kic
        -loops:relax fastrelax
        -loops:extended
        -in:file:fullatom
        -loops:max_kic_build_attempts 10000
        -out:file:fullatom
        -out:overwrite
        -out:prefix AAA
        -out:path ./
        -out:file:scorefile score.sc
        -ex1
        -ex2
        -nstruct 10000
        -resfile XXX.resfile
        -mute core.util.prof ## don’t show timing info
        -mute core.io.database ## don’t show database info

        loop file:
        LOOP 32 41 36 0 1
        LOOP 45 49 47 0 1
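        The loop file columns follow the layout the log echoes back further down (`LOOP begin end cut skip_rate extended`). A quick index sanity check can rule out the most common cause of segfaults before any debugging. Below is a minimal Python sketch, not a Rosetta tool: it assumes loop numbers are pose numbers running 1..nres (346 here, read off the last `EDGE 43 346` in the fold tree), assumes a cut of 0 means "let Rosetta choose", and uses the hypothetical names `parse_loop_file` and `check_loops`.

```python
from dataclasses import dataclass


@dataclass
class Loop:
    begin: int
    end: int
    cut: int          # assumption: 0 means "let Rosetta pick the cutpoint"
    skip_rate: float
    extended: bool


def parse_loop_file(text):
    """Parse 'LOOP begin end cut skip_rate extended' lines, skipping blanks and # comments."""
    loops = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if fields[0] != "LOOP" or len(fields) < 3:
            raise ValueError(f"unrecognized loop line: {raw!r}")
        # Trailing columns are treated as optional with defaults (an assumption of this sketch).
        cut = int(fields[3]) if len(fields) > 3 else 0
        skip_rate = float(fields[4]) if len(fields) > 4 else 0.0
        extended = len(fields) > 5 and fields[5] == "1"
        loops.append(Loop(int(fields[1]), int(fields[2]), cut, skip_rate, extended))
    return loops


def check_loops(loops, nres):
    """List index problems, assuming loop numbers are pose numbers in 1..nres."""
    problems = []
    for lp in loops:
        tag = f"LOOP {lp.begin} {lp.end}"
        if not 1 <= lp.begin < lp.end <= nres:
            problems.append(f"{tag}: begin/end outside 1..{nres} or not increasing")
        if lp.cut != 0 and not lp.begin <= lp.cut <= lp.end:
            problems.append(f"{tag}: cut {lp.cut} lies outside the loop")
    # Flag loops that share residues with the next loop.
    ordered = sorted(loops, key=lambda lp: lp.begin)
    for prev, nxt in zip(ordered, ordered[1:]):
        if nxt.begin <= prev.end:
            problems.append(f"LOOP {prev.begin} {prev.end} overlaps LOOP {nxt.begin} {nxt.end}")
    return problems


if __name__ == "__main__":
    loops = parse_loop_file("LOOP 32 41 36 0 1\nLOOP 45 49 47 0 1\n")
    print(check_loops(loops, nres=346))  # [] for the loops in this thread
```

        An empty list only means the numbers are internally consistent; it cannot tell whether they were written in PDB or pose numbering.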

        output:
        ~~~~

        protocols.jobdist.JobDistributors: Looking for an available job: 17 1 S 17
        core.scoring.ScoreFunctionFactory: SCOREFUNCTION: standard
        core.scoring.ScoreFunctionFactory: SCOREFUNCTION PATCH: score12
        protocols.looprelax: ==== Loop protocol: =================================================
        protocols.looprelax: remodel perturb_kic
        protocols.looprelax: intermedrelax no
        protocols.looprelax: refine refine_kic
        protocols.looprelax: relax fastrelax
        protocols.looprelax: ====================================================================================
        protocols.looprelax: ===
        protocols.looprelax: === Remodel
        protocols.looprelax: ===
        protocol.loops.LoopMover: ALL_LOOPS:LOOP begin end cut skip_rate extended
        protocol.loops.LoopMover: LOOP 32 41 36 0 1
        protocol.loops.LoopMover: LOOP 45 49 47 0 1
        protocol.loops.LoopMover:
        protocol.loops.LoopMover: SELECTEDLOOPS:LOOP begin end cut skip_rate extended
        protocol.loops.LoopMover: LOOP 32 41 36 0 1
        protocol.loops.LoopMover: LOOP 45 49 47 0 1
        protocol.loops.LoopMover:
        protocols.loops.loops_main: Pose fold tree FOLD_TREE EDGE 1 30 -1 EDGE 30 36 -1 EDGE 30 43 1 EDGE 43 37 -1 EDGE 43 346 -1
        protocols.loops.loops_main:
        protocol.loops.LoopMover: Setting extended torsions: LOOP 32 41 36 0 1
        protocol.loops.LoopMover: Building Loop: LOOP 32 41 36 0 1
        protocol.loops.LoopMover: Building Loop attempt: 0
        protocol.loops.LoopMover: perturb_one_loop_with_KIC: 32 10
        protocol.loops.LoopMover: remodel init temp: 2
        protocol.loops.LoopMover: remodel final temp: 1
        protocol.loops.LoopMover: kinematic initial perturb with start_res: 32 middle res: 36 end_res: 41
        protocol.loops.LoopMover: loop rmsd before initial kinematic perturbation:0
        protocol.loops.LoopMover: Attempting loop building: 0 …
        protocol.loops.LoopMover: Attempting loop building: 1 …
        protocol.loops.LoopMover: Attempting loop building: 2 …
        protocol.loops.LoopMover: Attempting loop building: 3 …
        protocol.loops.LoopMover: Attempting loop building: 4 …
        protocol.loops.LoopMover: Attempting loop building: 5 …
        protocol.loops.LoopMover: Attempting loop building: 6 …
        protocol.loops.LoopMover: Attempting loop building: 7 …
        protocol.loops.LoopMover: Attempting loop building: 8 …
        protocol.loops.LoopMover: Attempting loop building: 9 …
        protocol.loops.LoopMover: Attempting loop building: 10 …
        protocol.loops.LoopMover: Attempting loop building: 11 …
        protocol.loops.LoopMover: Attempting loop building: 12 …
        protocol.loops.LoopMover: Attempting loop building: 13 …
        protocol.loops.LoopMover: Attempting loop building: 14 …
        protocol.loops.LoopMover: Attempting loop building: 15 …
        protocol.loops.LoopMover: Attempting loop building: 16 …
        protocol.loops.LoopMover: Attempting loop building: 17 …
        protocol.loops.LoopMover: Attempting loop building: 18 …
        protocol.loops.LoopMover: Attempting loop building: 19 …
        protocol.loops.LoopMover: Attempting loop building: 20 …
        protocol.loops.LoopMover: Attempting loop building: 21 …
        protocol.loops.LoopMover: Attempting loop building: 22 …
        protocol.loops.LoopMover: Attempting loop building: 23 …
        protocol.loops.LoopMover: Attempting loop building: 24 …
        protocol.loops.LoopMover: initial kinematic perturbation complete
        protocol.loops.LoopMover: loop rmsd after initial kinematic perturbation:7.65966
        protocols.moves.MonteCarlo: MonteCarlo:: last_accepted_score,lowest_score: -3.74435 -3.74435
        protocol.loops.LoopMover: new centroid perturb rmsd: 7.66692
        protocol.loops.LoopMover: new centroid perturb rmsd: 7.64564
        protocol.loops.LoopMover: new centroid perturb rmsd: 7.6479
        protocol.loops.LoopMover: new centroid perturb rmsd: 7.60719
        protocol.loops.LoopMover: new centroid perturb rmsd: 7.60153
        protocol.loops.LoopMover: new centroid perturb rmsd: 7.63606
        protocol.loops.LoopMover: new centroid perturb rmsd: 7.63672
        protocol.loops.LoopMover: new centroid perturb rmsd: 7.63659
        protocol.loops.LoopMover: new centroid perturb rmsd: 7.63877
        protocol.loops.LoopMover: new centroid perturb rmsd: 7.63516
        protocol.loops.LoopMover: new centroid perturb rmsd: 7.63873
        protocol.loops.LoopMover: new centroid perturb rmsd: 7.61386
        protocol.loops.LoopMover: new centroid perturb rmsd: 7.59655
        protocol.loops.LoopMover: new centroid perturb rmsd: 7.59749
        ~~~~

        protocol.loops.LoopMover: new centroid perturb rmsd: 7.75632
        protocol.loops.LoopMover: new centroid perturb rmsd: 7.76036
        protocol.loops.LoopMover: new centroid perturb rmsd: 7.74645
        protocol.loops.LoopMover: new centroid perturb rmsd: 7.76038
        protocol.loops.LoopMover: new centroid perturb rmsd: 7.76153
        protocol.loops.LoopMover: new centroid perturb rmsd: 7.76194
        protocol.loops.LoopMover: new centroid perturb rmsd: 7.75659
        protocol.loops.LoopMover: new centroid perturb rmsd: 7.76038
        protocol.loops.LoopMover: new centroid perturb rmsd: 7.77386
        protocol.loops.LoopMover: new centroid perturb rmsd: 7.76188
        protocol.loops.LoopMover: new centroid perturb rmsd: 7.76014
        protocol.loops.LoopMover: new centroid perturb rmsd: 7.75756
        protocol.loops.LoopMover: new centroid perturb rmsd: 7.75578
        protocol.loops.LoopMover: new centroid perturb rmsd: 7.75525
        protocol.loops.LoopMover: new centroid perturb rmsd: 7.74483
        protocol.loops.LoopMover: new centroid perturb rmsd: 7.75071
        protocol.loops.LoopMover: new centroid perturb rmsd: 7.74632
        protocol.loops.LoopMover: new centroid perturb rmsd: 7.82138
        protocol.loops.LoopMover: new centroid perturb rmsd: 7.81393
        protocol.loops.LoopMover: new centroid perturb rmsd: 7.77752
        protocol.loops.LoopMover: new centroid perturb rmsd: 7.81771
        protocol.loops.LoopMover: new centroid perturb rmsd: 7.79596
        protocol.loops.LoopMover: new centroid perturb rmsd: 7.80156
        protocol.loops.LoopMover: new centroid perturb rmsd: 7.80308
        protocol.loops.LoopMover: new centroid perturb rmsd: 7.79805
        protocol.loops.LoopMover: new centroid perturb rmsd: 7.81848
        protocol.loops.LoopMover: new centroid perturb rmsd: 7.76296
        protocol.loops.LoopMover: new centroid perturb rm

        Thanks!

      • #8573
        Anonymous

          Segfaults alone are unfortunately useless as diagnostic tools. Can you compile the debug build (mode=debug when compiling) and see what error it reports? We may have to go to GDB.

          Almost all segfaults are ultimately caused by bad indices: code looking for atoms or residues that don’t exist. Is your loop file in PDB numbering or in indexed-from-1 (pose) numbering? Does your PDB file contain anything other than protein that we need to be careful of? Do all residues in your PDB have all 4 backbone heavy atoms (N, CA, C, O) present, with nonzero occupancies (the occupancy column sits just before the B-factor column)? Do you have any unusual residues that might not have a proper centroid residue type (post-translational modifications, noncanonicals, etc.)?
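          The backbone and occupancy questions above can be checked mechanically. A minimal Python sketch, assuming a standard fixed-column PDB file (atom name in columns 13-16, chain in 22, residue number in 23-26, insertion code in 27, occupancy in 55-60); `check_backbone` is a hypothetical helper written for this thread, and HETATM records are ignored:

```python
import sys

BACKBONE = {"N", "CA", "C", "O"}


def check_backbone(pdb_text):
    """Report residues missing N/CA/C/O or with a zero-occupancy backbone atom.

    Only fixed-column ATOM records are read; HETATM records are ignored in this sketch.
    """
    residues = {}  # (chain, resSeq, iCode) -> {atom name: occupancy}; dicts keep file order
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        key = (line[21], int(line[22:26]), line[26].strip())
        atom_name = line[12:16].strip()
        occ_field = line[54:60].strip()
        # A blank occupancy column is treated as 1.0 (an assumption, not a PDB rule).
        occupancy = float(occ_field) if occ_field else 1.0
        residues.setdefault(key, {})[atom_name] = occupancy

    problems = []
    for key, atoms in residues.items():
        present = set(atoms)
        missing = sorted(BACKBONE - present)
        if missing:
            problems.append(f"{key}: missing {', '.join(missing)}")
        zero = sorted(name for name in BACKBONE & present if atoms[name] <= 0.0)
        if zero:
            problems.append(f"{key}: zero occupancy on {', '.join(zero)}")
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:  # path to the input PDB, e.g. XXX.pdb
        for problem in check_backbone(handle.read()):
            print(problem)
```

          No output means every ATOM residue has all four backbone heavy atoms with positive occupancy; it says nothing about centroid residue types for unusual residues.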

          Finally, for reproducible crashes, you’ll get a better log if you run directly to the terminal instead of capturing output with rosetta > log. Output redirected to a log file is buffered in blocks of about 32 KB (system-dependent), so up to the last 32 KB of output can be lost on a crash. Output written directly to a terminal is not buffered this way and sometimes carries more information.
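          The buffering effect described above is not specific to Rosetta. The Python sketch below (POSIX only, and not Rosetta itself) demonstrates the mechanism with a stand-in child process: the child writes a line to stdout redirected to a file and then kills itself with SIGSEGV. With default block buffering the line never reaches the file; with unbuffered output (`python -u`) it does. `run_and_capture` is a hypothetical helper name.

```python
import os
import subprocess
import sys
import tempfile

# Child program: print one line, then die from SIGSEGV before any normal flush happens.
CHILD = "import os, signal; print('last words'); os.kill(os.getpid(), signal.SIGSEGV)"


def run_and_capture(unbuffered):
    """Run the crashing child with stdout redirected to a file; return (returncode, file text)."""
    args = [sys.executable] + (["-u"] if unbuffered else []) + ["-c", CHILD]
    # Drop PYTHONUNBUFFERED so the buffered case really is buffered.
    env = {k: v for k, v in os.environ.items() if k != "PYTHONUNBUFFERED"}
    with tempfile.TemporaryFile(mode="w+") as log:
        proc = subprocess.run(args, stdout=log, env=env)
        log.seek(0)
        return proc.returncode, log.read()


if __name__ == "__main__":
    for unbuffered in (False, True):
        code, text = run_and_capture(unbuffered)
        print(f"unbuffered={unbuffered} returncode={code} captured={text!r}")
```

          A negative return code is how Python's subprocess module reports death by a signal (here, minus the SIGSEGV number).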

        • #8597
          Anonymous

            Hi,

            My PDB is numbered starting from 1. It has two chains, and the residues are numbered continuously, so no residue number appears in both chains. My PDB contains only the two proteins, with all natural amino acids, and every residue has its backbone heavy atoms with nonzero occupancy.
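            One way to confirm that PDB numbers and sequential (indexed-from-1) numbers really coincide is to number the residues in file order and compare. A minimal Python sketch under stated assumptions: only ATOM records are counted, residues are numbered consecutively across both chains in the order they appear, and the helper names are hypothetical.

```python
def pdb_to_pose_numbering(pdb_text):
    """Map (chain, resSeq, iCode) to a sequential 1-based index in file order.

    Assumption: ATOM residues only, numbered consecutively across all chains,
    ignoring PDB numbering gaps and chain restarts.
    """
    mapping = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM"):
            key = (line[21], int(line[22:26]), line[26].strip())
            if key not in mapping:
                mapping[key] = len(mapping) + 1
    return mapping


def numbering_is_identity(mapping):
    """True when every residue's PDB resSeq equals its sequential index (no insertion codes)."""
    return all(resseq == index and not icode
               for (_chain, resseq, icode), index in mapping.items())
```

            Under these assumptions, a True result means a loop file written in PDB numbering and one written in sequential numbering would contain the same residue numbers.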

            I tried your suggestion of running in debug mode, and it has not crashed yet. Could you explain why the normal (release) build would crash while the debug build doesn’t? Should I just run in debug mode even if it takes longer? Also, what is the GDB you mentioned?

            Thanks again for your help.

          • #8598
            Anonymous

              Going to debug mode is not expected to prevent a crash; it’s just that debug mode sometimes gives more useful error messages, particularly when it can exit on a failed assert() statement rather than with a segfault. If the same inputs crash in release mode but not in debug mode, that can happen; it’s generally due to something nasty happening during the compiler optimization steps (the optimization that makes “release” faster than “debug”). I’ve seen crashes of this type but was never able to fix them: going into the code to try to identify exactly where the crash occurs alters the optimization and makes the bug go away.

              GDB is the GNU debugger. It lets you run code inside a “wrapper” that watches what the code is doing, line by line. If the code crashes, instead of the program dumping its memory and exiting, the debugger intercepts the crash and keeps the memory state alive so you can examine what the code was doing: you can see exactly which line it was on and what it did to crash. Usually you find out that some code was looking for residue 101 of a 100-residue protein, or a similar error. At minimum, if the crash occurs in debug mode and you are running in the debugger, you can issue the command “backtrace” after the crash to get the exact line of code where the crash occurred, the function that called that line, and so on up through the whole stack.
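              A non-interactive way to get that backtrace is GDB’s batch mode: `-batch` exits after the commands given with `-ex`, and `--args` passes everything after the program name through to the program. The Python sketch below only assembles that command line; the debug executable name `loopmodel.gccdebug` is a hypothetical placeholder (the real name depends on platform and compiler), and `gdb_backtrace_command` is a hypothetical helper.

```python
import shlex


def gdb_backtrace_command(executable, *program_args):
    """Build a gdb invocation that runs the program and prints a backtrace when it stops.

    -batch exits gdb after the -ex commands; 'bt' is short for 'backtrace'.
    If the program exits normally, 'bt' just reports that there is no stack.
    """
    return ["gdb", "-batch", "-ex", "run", "-ex", "bt", "--args", executable, *program_args]


if __name__ == "__main__":
    cmd = gdb_backtrace_command(
        "loopmodel.gccdebug",  # hypothetical name for the mode=debug loopmodel build
        "-database", "XXX", "@loopmodelflags",
    )
    print(shlex.join(cmd))
```

              Pasting the printed line into a terminal runs the job under GDB; because the crash may take hundreds of models to appear, redirecting GDB’s own output to a file is still worthwhile.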

            • #8600
              Anonymous

                Keep in mind that the segmentation fault could be contingent on the value of a certain variable, and thus may manifest only occasionally, depending on the random number trajectory. You may want to repeat the debug runs multiple times with different random seeds to see if you can get one that triggers the crash.

                Unfortunately, because of differences due to optimization, etc., you can’t just reuse a release-mode seed in debug mode and expect to see the same trajectory. (You really can’t even expect to see the same trajectories for the same seed on different machines.)
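                Repeating debug runs with different seeds is easy to script. The Python sketch below builds one command per seed using the `-constant_seed` and `-jran` options mentioned later in this thread; the executable name and the helper `seeded_commands` are hypothetical. Because the flag file uses `-out:overwrite` with a fixed `-out:prefix`, running the commands in a separate working directory per seed avoids one run overwriting another’s output.

```python
import random
import shlex


def seeded_commands(base_cmd, n_runs, picker_seed=None):
    """Return (commands, seeds): n_runs copies of base_cmd, each pinned to a distinct seed.

    picker_seed only makes the choice of seeds itself repeatable.
    """
    picker = random.Random(picker_seed)
    seeds = picker.sample(range(1, 2**31), n_runs)  # distinct positive integers
    commands = [list(base_cmd) + ["-constant_seed", "-jran", str(seed)] for seed in seeds]
    return commands, seeds


if __name__ == "__main__":
    base = ["loopmodel.gccdebug", "-database", "XXX", "@loopmodelflags"]  # hypothetical binary name
    commands, _ = seeded_commands(base, n_runs=5, picker_seed=1)
    for cmd in commands:
        print(shlex.join(cmd))
```

                Recording which seed belongs to which run means that a crashing seed can be rerun later within the same build.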

              • #8601
                Anonymous

                  OK, so my debug-mode run finally crashed with the following output:

                  ~~~
                  protocols::checkpoint: Deleting checkpoints of Loopbuild
                  protocols::loopbuild: loop_cenrms: 0
                  protocols::loopbuild: loop_rms: 0
                  protocols::loopbuild: total_energy: -808.297
                  protocols::loopbuild: chainbreak: 0.0927452
                  protocols.jobdist.JobDistributors: Looking for an available job: 192 1 S 192
                  core.scoring.ScoreFunctionFactory: SCOREFUNCTION: standard
                  core.scoring.ScoreFunctionFactory: SCOREFUNCTION PATCH: score12
                  protocols.looprelax: ==== Loop protocol: =================================================
                  protocols.looprelax: remodel perturb_kic
                  protocols.looprelax: intermedrelax no
                  protocols.looprelax: refine refine_kic
                  protocols.looprelax: relax fastrelax
                  protocols.looprelax: ====================================================================================
                  protocols.looprelax: ===
                  protocols.looprelax: === Remodel
                  protocols.looprelax: ===
                  protocol.loops.LoopMover: ALL_LOOPS:LOOP begin end cut skip_rate extended
                  protocol.loops.LoopMover: LOOP 32 41 36 0 1
                  protocol.loops.LoopMover: LOOP 45 49 47 0 1
                  protocol.loops.LoopMover:
                  protocol.loops.LoopMover: SELECTEDLOOPS:LOOP begin end cut skip_rate extended
                  protocol.loops.LoopMover: LOOP 32 41 36 0 1
                  protocol.loops.LoopMover: LOOP 45 49 47 0 1
                  protocol.loops.LoopMover:
                  protocols.loops.loops_main: Pose fold tree FOLD_TREE EDGE 1 30 -1 EDGE 30 36 -1 EDGE 30 43 1 EDGE 43 37 -1 EDGE 43 346 -1
                  protocols.loops.loops_main:
                  protocol.loops.LoopMover: Setting extended torsions: LOOP 32 41 36 0 1
                  protocol.loops.LoopMover: Building Loop: LOOP 32 41 36 0 1
                  protocol.loops.LoopMover: Building Loop attempt: 0
                  protocol.loops.LoopMover: perturb_one_loop_with_KIC: 32 10
                  protocol.loops.LoopMover: remodel init temp: 2
                  protocol.loops.LoopMover: remodel final temp: 1
                  protocol.loops.LoopMover: kinematic initial perturb with start_res: 32 middle res: 36 end_res: 41
                  protocol.loops.LoopMover: loop rmsd before initial kinematic perturbation:0
                  protocol.loops.LoopMover: Attempting loop building: 0 …
                  protocol.loops.LoopMover: Attempting loop building: 1 …
                  Segmentation fault: 11

                  So the cause seems memory-related, but I have no clue why this is happening or how to fix it. Thanks.

                • #8602
                  Anonymous

                    A) Wow, you got a segfault in debug mode!

                    B) It looks like it crashed on model 192, not the first one? Can you confirm that?

                    C) Is the crash reproducible in release mode? Does it always fail in exactly the same place (when tested with -constant_seed)? If the crash is NOT reproducible, then we should consider hardware errors (bad RAM).
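                    For questions B and C, comparing logs by hand gets tedious. The Python sketch below pulls two things from a log: the last job number from JobDistributor lines like `Looking for an available job: 192 1 S 192` (as in the output above), and the last non-empty line. Two runs "fail in the same place" here if both match. The helper names are hypothetical. Given the buffering caveat earlier in the thread, the last-line comparison is only meaningful for unbuffered (terminal-style) logs.

```python
import re

JOB_RE = re.compile(r"Looking for an available job: (\d+)")


def crash_signature(log_text):
    """Return (last job number started, last non-empty log line) for one run's log.

    The job number is None if the log has no JobDistributor 'available job' line.
    """
    jobs = JOB_RE.findall(log_text)
    last_job = int(jobs[-1]) if jobs else None
    lines = [ln.strip() for ln in log_text.splitlines() if ln.strip()]
    last_line = lines[-1] if lines else None
    return last_job, last_line


def same_crash_site(log_a, log_b):
    """True if two logs stop in the same job at the same final line."""
    return crash_signature(log_a) == crash_signature(log_b)
```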

                  • #8604
                    Anonymous

                      A) Yes, I did.
                      B) Yes, it crashed on the 192nd model. All my segfaults have happened after Rosetta has output a number of successful models: sometimes in the 10s and sometimes in the 100s, but always below 300.
                      C) I don’t know. I guess I need to try a couple of release runs with a constant seed? I will post the results here after I try it. Thanks.

                    • #8606
                      Anonymous

                        C) Yes, or you can reuse the random seed from a run that failed quickly in an earlier test: look at the top of the log file from a test that failed in the 10s, if you have one. -jran lets you pass in a specific RNG seed.
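                        Fishing the seed out of an old log and turning it into a rerun command can be sketched in Python as below. This assumes the log header reports the seed as a `seed=<integer>` token within its first lines; check your own log for the exact wording. The helper names are hypothetical. The word boundary in the pattern keeps it from matching tokens such as `real_seed=`.

```python
import re

SEED_RE = re.compile(r"\bseed=(-?\d+)")  # \b skips tokens such as 'real_seed='


def seed_from_log(log_text, header_lines=50):
    """Return the first 'seed=<int>' value in the first header_lines lines, or None."""
    head = "\n".join(log_text.splitlines()[:header_lines])
    match = SEED_RE.search(head)
    return int(match.group(1)) if match else None


def rerun_command(base_cmd, seed):
    """Pin a rerun to a known seed using the -constant_seed and -jran options."""
    return list(base_cmd) + ["-constant_seed", "-jran", str(seed)]
```

                        As noted above, a seed reproduces a trajectory only with the same build on the same machine, so a release-mode seed should be replayed with the release binary.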
