How to perform abinitiorelax.mpi.linuxgccrelease in parallel mode with MPI


    • #2745
      Anonymous

        Hi,

        I have compiled Rosetta 3.8 successfully with the command `./scons.py bin mode=release extras=mpi cxx=icc`. But when I run the command `mpirun -np 64 AbinitioRelax.mpi.linuxiccrelease @options`, the following error is displayed at the end:

        =========================================================================

        ===

        =    BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES

        =     PID XXXXX RUNING AT LOCALHOST.LOCALDOMAIN

        =     EXIT CODE:11

        =      CLEANING UP REMAINING PROCESSES

        =      YOU CAN IGNORE THE BELOW CLEANUP MESSAGES

        ===========================================================================

         

        Could you help me solve this problem? Many thanks!

        Kindest regards

         

        Jiyuan

      • #13788
        Anonymous

          That message is from the MPI system, and is basically reporting that whatever program you’re running under MPI has encountered a problem and so the MPI system is shutting the whole thing down. The `EXIT CODE:11` in the message indicates that this is likely a SegFault in the Rosetta code.
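
As a quick check of that reading, the number can be mapped to a signal name in the shell. This is a minimal sketch, assuming a Linux system where signal 11 is SIGSEGV; the variable names are only illustrative.

```shell
# The number reported in the "EXIT CODE:11" line.
code=11

# kill -l with a number prints the signal name without the SIG prefix.
signame=$(kill -l "$code")
echo "signal $code is SIG$signame"
```

If this prints SIGSEGV, the MPI launcher is reporting that one process died from a segmentation fault rather than exiting normally.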

          To track this down, it will help to take a closer look at the Rosetta output. What is being printed to the Tracers? Do you get any results, or does the program terminate with this issue immediately?

          SegFaults are a bit tricky to track down. It’s often necessary to recompile Rosetta in debug mode (with `mode=debug` on the scons command line), and then re-run things. Debug mode has extra checks which help with debugging, but slow things down. If possible, try running the same command on a single processor without MPI, and see if you can prompt an error message that way. (Without MPI the results will be more interpretable.) If you need MPI in order to provoke the behavior, use the `-mpi_tracer_to_file` option (e.g. `-mpi_tracer_to_file tracer.log`) to make the output more interpretable.
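
The steps in this paragraph might look roughly like the sketch below. The MPI build line is the original command with `mode=release` changed to `mode=debug`. The non-MPI build line and the binary name `AbinitioRelax.linuxiccdebug` are assumptions made by analogy, and the `run` helper and `RUN` variable are hypothetical conveniences so that the script only prints the commands unless asked to execute them.

```shell
# Print each command by default; set RUN=1 in the environment to execute.
run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

# 1. Rebuild with MPI in debug mode (original flags, mode changed).
run ./scons.py bin mode=debug extras=mpi cxx=icc

# 2. Assumed: a non-MPI debug build for a single-process test.
run ./scons.py bin mode=debug cxx=icc

# 3. Single process, no mpirun (binary name is an assumption).
run AbinitioRelax.linuxiccdebug @options

# 4. If the crash only appears under MPI, send Tracer output to files.
run mpirun -np 64 AbinitioRelax.mpi.linuxiccdebug -mpi_tracer_to_file tracer.log @options
```

Printing first makes it easy to confirm that the executable name ends in `debug` before spending time on a long run.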

          Hopefully you’ll get some sort of error message or traceback from the debug build that will help us track down the issue. (If you’re still getting a SegFault and no message with the debug build, you may need to run it under a debugger to get a backtrace.)
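
One way to get such a backtrace is gdb in batch mode. The sketch below only assembles and prints the command. It assumes gdb is installed and that a non-MPI debug binary named `AbinitioRelax.linuxiccdebug` exists; both the name and the `gdb_cmd` variable are illustrative.

```shell
# Assumed name of a non-MPI debug build of the same application.
app=AbinitioRelax.linuxiccdebug

# Batch mode: start the program, then print a backtrace once it stops.
gdb_cmd="gdb -batch -ex run -ex bt --args $app @options"

# Print the command for review; execute it with: eval "$gdb_cmd"
echo "$gdb_cmd"
```

The frames printed by `bt` show which Rosetta function was active when the SegFault occurred, which is the information needed to narrow the problem down.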

           

          Another thing you probably want to do before going through the hassle of recompiling is to double-check all your input files. SegFaults normally pop up when input files don’t quite conform to the format which Rosetta expects.

        • #13792
          Anonymous

            Dear rmoretti,

            Thank you so much for your reply. I have recompiled Rosetta 3.8 following your instructions. I then ran the command `mpirun -np 64 AbinitioRelax.mpi.linuxiccrelease -mpi_tracer_to_file tracer.log @options`, but I still get no results in input_files. The same error is displayed:

            =========================================================================

            ===

            =    BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES

            =     PID XXXXX RUNING AT LOCALHOST.LOCALDOMAIN

            =     EXIT CODE:11

            =      CLEANING UP REMAINING PROCESSES

            =      YOU CAN IGNORE THE BELOW CLEANUP MESSAGES

            ===========================================================================

            The content of the file tracer.log is:


            core.init: (0) Rosetta version unknown:exported  from http://www.rosettacommons.org

            core.init: (0) command: AbinitioRelax.mpi.linuxiccrelease -mpi_tracer_to_file tracer.log @options

            core.init: (0) ‘RNG device’ seed mode, using ‘/dev/urandom’, seed=-1822093000 seed_offset=0 real_seed=-1822093000

            core.init.random: (0) RandomGenerator:init: Normal mode, seed=-1822093000 RG_type=mt19937

            core.init: (0) Resolved executable path: /home/Rosetta/rosetta_bin_linux_2017.08.59291_bundle/main/source/build/src/release/linux/3.10/64/x86/icc/16.0/mpi/AbinitioRelax.mpi.linuxiccrelease

            core.init: (0) Looking for database based on location of executable: /home/Rosetta/rosetta_bin_linux_2017.08.59291_bundle/main/database/

            protocols.abinitio.AbrelaxApplication: (0) read fasta sequence: 225 residues

            RPVFEREIYTAGIYETDTSNRELLTVHATHTEGLDITYTMDLDTMVVDPSLEGVRESAFT    

            LHPSSGVLSLNMNPLDTMVGMFEFDVVATDTRGAEARTDVKIYLITHLNRVYFLFNNTL

            DVVDSNRAFIADTFSSVFSLTCNIDAVLRAPDSSGAARDDRTEVRAHFIRNHVPATTDEI

            EQLRSNTILLRAIQETLLTRELHLEDFVGGSSPELGVDNSLT

            protocols.evaluation.ChiWellRmsdEvaluatorCreator: (0) Evaluation Creator active …

            core.chemical.GlobalResidueTypeSet: (0) For ResidueTypeSet centroid there is no shadow_list.txt file to list known PDB ids.

            core.chemical.GlobalResidueTypeSet: (0)     This will turn off PDB component loading for ResidueTypeSet centroid

            core.chemical.GlobalResidueTypeSet: (0)     Expected file: /home/Rosetta/rosetta_bin_linux_2017.08.59291_bundle/main/database/chemical/residue_type_sets/centroid/shadow_list.txt

            core.chemical.GlobalResidueTypeSet: (0) Finished initializing centroid residue type set.  Created 62 residue types

            core.chemical.GlobalResidueTypeSet: (0) Total time to initialize 0.05 seconds.

            core.io.fragments: (0) reading fragments from file: aat000_09_05.200_v1_3 …

            core.io.fragments: (0) rosetta++ fileformat detected! Calling legacy reader…

            core.fragments.ConstantLengthFragSet: (0) finished reading top 25 9mer fragments from file aat000_09_05.200_v1_3

            core.io.fragments: (0) reading fragments from file: aat000_03_05.200_v1_3 …

            core.io.fragments: (0) rosetta++ fileformat detected! Calling legacy reader…

            core.fragments.ConstantLengthFragSet: (0) finished reading top 200 3mer fragments from file aat000_03_05.200_v1_3

            core.fragment: (0) compute strand/loop fractions for 221 residues…

            protocols.abinitio.AbrelaxApplication: (0) run ClassicAbinitio…..

            basic.io.database: (0) Database file opened: scoring/score_functions/EnvPairPotential/env_log.txt

            basic.io.database: (0) Database file opened: scoring/score_functions/EnvPairPotential/cbeta_den.txt

            basic.io.database: (0) Database file opened: scoring/score_functions/EnvPairPotential/pair_log.txt

            basic.io.database: (0) Database file opened: scoring/score_functions/EnvPairPotential/cenpack_log.txt

            basic.io.database: (0) Database file opened: scoring/score_functions/SecondaryStructurePotential/phi.theta.36.HS.resmooth

            basic.io.database: (0) Database file opened: scoring/score_functions/SecondaryStructurePotential/phi.theta.36.SS.resmooth

            core.scoring: (0) ATOM_VDW set to CENTROID_ROT_MIN

            basic.io.database: (0) Database file opened: scoring/score_functions/hbonds/sp2_elec_params/HBPoly1D.csv

            basic.io.database: (0) Database file opened: scoring/score_functions/hbonds/sp2_elec_params/HBFadeIntervals.csv

            basic.io.database: (0) Database file opened: scoring/score_functions/hbonds/sp2_elec_params/HBEval.csv

            basic.io.database: (0) Database file opened: scoring/score_functions/rama/Rama_smooth_dyn.dat_ss_6.4

            basic.io.database: (0) Database file opened: scoring/score_functions/centroid_smooth/cen_rot_pair_params.txt

            basic.io.database: (0) Database file opened: scoring/score_functions/centroid_smooth/cen_rot_env_params.txt

            basic.io.database: (0) Database file opened: scoring/score_functions/centroid_smooth/cen_rot_cbeta_params.txt

            basic.io.database: (0) Database file opened: scoring/score_functions/centroid_smooth/cen_rot_pair_ang_params.txt

            core.scoring.AtomVDW: (0) Openning alternative vdw file: /home/Rosetta/rosetta_bin_linux_2017.08.59291_bundle/main/database/chemical/atom_type_sets/centroid_rot//min.txt

            core.scoring: (0) ATOM_VDW set to CENTROID_ROT_MIN

            core.scoring.ScoreFunctionFactory: (0) SCOREFUNCTION: talaris2014

            core.scoring.etable: (0) Starting energy table calculation

            core.scoring.etable: (0) smooth_etable: changing atr/rep split to bottom of energy well

            core.scoring.etable: (0) smooth_etable: spline smoothing lj etables (maxdis = 6)

            core.scoring.etable: (0) smooth_etable: spline smoothing solvation etables (max_dis = 6)

            core.scoring.etable: (0) Finished calculating energy tables.

            basic.io.database: (0) Database file opened: scoring/score_functions/P_AA_pp/P_AA

            basic.io.database: (0) Database file opened: scoring/score_functions/P_AA_pp/P_AA_n

            basic.io.database: (0) Database file opened: scoring/score_functions/P_AA_pp/P_AA_pp

            protocols.jobdist.JobDistributors: (0) Node: 0 next_job()

            protocols.jobdist.JobDistributors: (0) Master Node — Waiting for job request; tag_ = 1

            protocols.jobdist.JobDistributors: (0) Looking for an available job: 1 1  1

            protocols.jobdist.JobDistributors: (0) Master Node –available job? 1

            protocols.jobdist.JobDistributors: (0) Master Node — Assigning job 1 1 to node 2

            protocols.jobdist.JobDistributors: (0) Master Node — Waiting for job request; tag_ = 1

            protocols.jobdist.JobDistributors: (0) Looking for an available job: 2 1  2

            protocols.jobdist.JobDistributors: (0) Master Node –available job? 1

            protocols.jobdist.JobDistributors: (0) Master Node — Assigning job 1 2 to node 4

            protocols.jobdist.JobDistributors: (0) Master Node — Waiting for job request; tag_ = 1

            protocols.jobdist.JobDistributors: (0) Looking for an available job: 3 1  3

            protocols.jobdist.JobDistributors: (0) Master Node –available job? 1

            protocols.jobdist.JobDistributors: (0) Master Node — Assigning job 1 3 to node 5

            protocols.jobdist.JobDistributors: (0) Master Node — Waiting for job request; tag_ = 1

            protocols.jobdist.JobDistributors: (0) Looking for an available job: 4 1  4

            protocols.jobdist.JobDistributors: (0) Master Node –available job? 1

            protocols.jobdist.JobDistributors: (0) Master Node — Assigning job 1 4 to node 6

            protocols.jobdist.JobDistributors: (0) Master Node — Waiting for job request; tag_ = 1

            protocols.jobdist.JobDistributors: (0) Looking for an available job: 5 1  5

            protocols.jobdist.JobDistributors: (0) Master Node –available job? 1

            protocols.jobdist.JobDistributors: (0) Master Node — Assigning job 1 5 to node 7

            protocols.jobdist.JobDistributors: (0) Master Node — Waiting for job request; tag_ = 1

            protocols.jobdist.JobDistributors: (0) Looking for an available job: 6 1  6

            protocols.jobdist.JobDistributors: (0) Master Node –available job? 1

            protocols.jobdist.JobDistributors: (0) Master Node — Assigning job 1 6 to node 12

            protocols.jobdist.JobDistributors: (0) Master Node — Waiting for job request; tag_ = 1

            protocols.jobdist.JobDistributors: (0) Looking for an available job: 7 1  7

            protocols.jobdist.JobDistributors: (0) Master Node –available job? 1

            protocols.jobdist.JobDistributors: (0) Master Node — Assigning job 1 7 to node 17

            protocols.jobdist.JobDistributors: (0) Master Node — Waiting for job request; tag_ = 1

            protocols.jobdist.JobDistributors: (0) Looking for an available job: 8 1  8

            protocols.jobdist.JobDistributors: (0) Master Node –available job? 1

            protocols.jobdist.JobDistributors: (0) Master Node — Assigning job 1 8 to node 18

            protocols.jobdist.JobDistributors: (0) Master Node — Waiting for job request; tag_ = 1

            protocols.jobdist.JobDistributors: (0) Looking for an available job: 9 1  9

            protocols.jobdist.JobDistributors: (0) Master Node –available job? 1

            protocols.jobdist.JobDistributors: (0) Master Node — Assigning job 1 9 to node 19

            protocols.jobdist.JobDistributors: (0) Master Node — Waiting for job request; tag_ = 1

            protocols.jobdist.JobDistributors: (0) Looking for an available job: 10 1  10

            protocols.jobdist.JobDistributors: (0) Master Node –available job? 1

            protocols.jobdist.JobDistributors: (0) Master Node — Assigning job 1 10 to node 26

            protocols.jobdist.JobDistributors: (0) Master Node — Waiting for job request; tag_ = 1

            protocols.jobdist.JobDistributors: (0) Looking for an available job: 11 1  11


            Could you suggest a solution for this error?

             

            Jiyuan

             

             

            • #13804
              Anonymous

                It looks like you’re still running the release-mode application (AbinitioRelax.mpi.linuxiccrelease). You need to change to the debug-mode application that you compiled (AbinitioRelax.mpi.linuxiccdebug) in order to get the extra debug-mode tests.

                Also, when using `-mpi_tracer_to_file`, be sure to examine all the various log files produced (there should be one for each CPU), not just the main one.
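
Checking every log by hand is tedious with 64 processes, so a short loop can help. The sketch below runs on stand-in files in a scratch directory so it is safe to try anywhere. The per-rank names (`tracer.log_0`, `tracer.log_1`) and the file contents are assumptions for the demo only; point the glob at the real log files instead.

```shell
# Demo setup: two stand-in logs (contents are made up for illustration).
dir=$(mktemp -d)
printf 'core.init: (0) command: ...\n' > "$dir/tracer.log_0"
printf 'core.init: (1) command: ...\nERROR: example failure\n' > "$dir/tracer.log_1"

# List the logs that mention an error (case-insensitive match).
hits=$(grep -il 'error' "$dir"/tracer.log* | sort)
echo "$hits"

# Show the last line of every log to see where each process stopped.
for f in "$dir"/tracer.log*; do
    echo "== $f"
    tail -n 1 "$f"
done

# Remove the scratch directory.
rm -rf "$dir"
```

The process that crashed will often have a log that ends abruptly or ends with an error message, while the others simply stop when the MPI system shuts the run down.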
