Rosetta parallel running

    • #2966
      Anonymous

        Dear Rosetta people,

        I have installed Rosetta 3.9 on our local cluster. To do that, I first copied the $ROSETTA_PATH/main/source/tools/build/site.settings.topsail file to $ROSETTA_PATH/main/source/tools/build/site.settings, and then compiled Rosetta with MPI support using the following command:

        ./scons.py mode=release extras=mpi bin -j 20

        Compilation finished successfully without any error messages. I was able to run a job with the relax.mpi.linuxgccrelease executable without problems, but I see unexpected behaviour when I use rosetta_scripts.mpi.linuxgccrelease.

        I run the following command:

        mpiexec -np 10 rosetta_scripts.mpi.linuxgccrelease @Rosetta_flags -mpi_tracer_to_file logdir

        When -nstruct is set to 100, ten output files are created. It seems that each file contains the output of one processor, and that 10 jobs are assigned to each processor. The run terminates as soon as the first processor finishes its 10 assigned jobs, while the other processors have not completed theirs. The following message appears at the end of the output file of the processor that completed its 10 assigned jobs:

        protocols.jd2.JobDistributor: (6) 100 jobs considered, 10 jobs attempted in 1029 seconds

        Error: (6) [ ERROR ] Exception caught by rosetta_scripts application:

        File: src/protocols/jd2/JobDistributor.cc:329

        10 jobs failed; check output for error messages

        Error: (6) [ ERROR ]

        However, the last few lines of the other output files are different. For example:

        protocols.docking.DockingLowRes: (7) ////////////////////////////////////////////////////////////////////////////////

        protocols.docking.DockingLowRes: (7) ///                       Docking Low Res Protocol                           ///

        protocols.docking.DockingLowRes: (7) ///                                                                          ///

        protocols.docking.DockingLowRes: (7) /// Centroid Inner Cycles: 50                                                ///

        protocols.docking.DockingLowRes: (7) /// Centroid Outer Cycles: 10                                                ///

        protocols.docking.DockingLowRes: (7) /// Scorefunction:                                                           ///

        protocols.docking.DockingLowRes: (7) ScoreFunction::show():

        weights: (interchain_pair 1) (interchain_vdw 1) (interchain_env 1) (interchain_contact 2) (backbone_stub_linear_constraint 10)

        energy_method_options: EnergyMethodOptions::show: aa_composition_setup_files:

        or,

        protocols.docking.DockingLowRes: (0) EnergyMethodOptions::show: voids_penalty_energy_voxel_grid_padding_: 1

        protocols.docking.DockingLowRes: (0) EnergyMethodOptions::show: voids_penalty_energy_voxel_size_: 0.5

        protocols.docking.DockingLowRes: (0) EnergyMethodOptions::show: voids_penalty_energy_disabled_except_during_packing_: TRUE

        protocols.docking.DockingLowRes: (0) EnergyMethodOptions::show: hbnet_bonus_ramping_function_: “quadratic”

        I expected each processor to complete all 10 of its assigned jobs, and all output files to end in the same way. Also, when -nstruct is set to 1000, none of the processors completes its assigned jobs, and the run terminates suddenly without any error message.

        May I ask how I can solve this problem?

        Best Regards

        Bahareh Bamdad

      • #14435
        Anonymous

          For runs terminated by that particular error message (“File: src/protocols/jd2/JobDistributor.cc:329”), you should be able to add the option `-jd2:failed_job_exception false` to the command line to keep Rosetta from exiting if any of the jobs fail. (Though with MPI, that condition shouldn’t be triggered in the first place.)
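
As a rough sketch of how that might look, the command from the original post is wrapped below in a function with the suggested option appended. Since the exception says "10 jobs failed; check output for error messages", a small grep helper for scanning the per-processor logs is included as well. The function names `run_docking` and `find_job_errors` are hypothetical, and the `logdir*` glob assumes the per-processor log files start with the name given to `-mpi_tracer_to_file`; adjust it to match the files actually written on your cluster.

```shell
# Original invocation with -jd2:failed_job_exception false appended.
# Defined as a function so nothing runs until it is called explicitly.
run_docking() {
    mpiexec -np 10 rosetta_scripts.mpi.linuxgccrelease @Rosetta_flags \
        -mpi_tracer_to_file logdir \
        -jd2:failed_job_exception false
}

# Hypothetical helper: print (with line numbers) every log line that
# mentions an error or a failed job in the files passed as arguments.
find_job_errors() {
    grep -n -i -E 'ERROR|jobs failed' "$@"
}

# Typical use on the cluster (left commented out here):
# run_docking
# find_job_errors logdir*
```

The option only stops Rosetta from raising the exception at the end of the run. The underlying reason the 10 jobs failed still has to be found in the logs, which is what the second helper is meant for.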
