Rosetta3.2.1-LAM-MPI Run-Problem-More than 2 processor Jobs

    • #919
      Anonymous

        Hi All:

        I am a new Rosetta user. I have just finished compiling parallel Rosetta (rosetta3.2.1/gcc-4.4.3/LAM-MPI-7.1.4)
        and can run two-processor jobs with no issues, but jobs fail with more than two processors (error message below).
        Any help would be greatly appreciated.

        Thanks

        Ravi

        Linux system:

        Linux node2n29 2.6.32-29-server #58-Ubuntu SMP Fri Feb 11 21:06:51 UTC 2011 x86_64 GNU/Linux

        Parallel version of Rosetta was compiled using GCC/LAM-MPI-7.1.4

        Run command (memory used was 8 GB):

        bin/AbinitioRelax.mpi.linuxgccrelease @flags

        flags file:

        -in:file:native inputs/1l2y.pdb
        -in:file:frag3 inputs/aa1l2yA03_05.200_v1_3
        -in:file:frag9 inputs/aa1l2yA09_05.200_v1_3
        -out:nstruct 1
        -out:file:silent 1l2y_silent.out
        -no_prof_info_in_silentout
        -mute core.io.database
        -run:constant_seed
        -run:jran 1111111
        -database /opt/nasapps/build/Rosetta/rosetta_database
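
        For the larger runs the job is launched through LAM's mpirun; a minimal LAM session looks like this (the hostfile name is hypothetical, and lamboot/lamhalt are only needed once per session):

        lamboot hostfile     # start the LAM daemons on the nodes listed in hostfile
        mpirun -np 4 bin/AbinitioRelax.mpi.linuxgccrelease @flags
        lamhalt              # shut the LAM daemons down afterwards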

        Error message:

        ...
        Stage 2
        Folding with score1 for 2000


        One of the processes started by mpirun has exited with a nonzero exit
        code. This typically indicates that the process finished in error.
        If your process did not finish in error, be sure to include a "return
        0" or "exit(0)" in your C code before exiting the application.

        PID 11933 failed on node n0 (129.43.63.71) due to signal 11.

      • #5626
        Anonymous

          A) The Abinitio executable is, non-obviously, not MPI compatible. It shouldn't crash, but it won't actually work under MPI; it just runs concurrent non-MPI jobs (which overwrite each other's output).

          B) I suspect it is crashing because the filesystem is getting angry at files overwriting each other; the failure isn't giving me a Rosetta error message to work with.

          There is an abinitio MPI patch for 3.2 floating around. Would you like me to email it to the address you gave when you signed up for the message boards?

        • #5629
          Anonymous

            Hi Steven:

            I have applied the patch that you sent me.
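            (Assuming the patch arrived as a unified diff against rosetta_source, applying it looked something like this; the patch file name here is hypothetical:)

            cd rosetta_source
            patch -p0 < abinitio_mpi_3.2.patch   # adjust the -p level to match the paths in the diff

            Rebuild output: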

            [ravi@torkv rosetta_source]$ ./scons.py bin mode=release extras=mpi
            scons: Reading SConscript files ...
            svn: '.' is not a working copy
            scons: done reading SConscript files.
            scons: Building targets …
            mpiCC -o build/src/release/linux/2.6/64/x86/gcc/mpi/apps/public/AbInitio_MPI.o -c -std=c++98 -pipe -ffor-scope -W -Wall -pedantic -Wno-long-long -O3 -ffast-math -funroll-loops -finline-functions -finline-limit=20000 -s -Wno-unused-variable -DNDEBUG -DUSEMPI -Isrc -Iexternal/include -Isrc/platform/linux/64/gcc -Isrc/platform/linux/64 -Isrc/platform/linux -Iexternal/boost_1_38_0 -I/usr/local/include -I/usr/include src/apps/public/AbInitio_MPI.cc
            mpiCC -o build/src/release/linux/2.6/64/x86/gcc/mpi/AbInitio_MPI.linuxgccrelease -Wl,-rpath=/opt/nasapps/build/Rosetta/rosetta_source/build/src/release/linux/2.6/64/x86/gcc/mpi build/src/release/linux/2.6/64/x86/gcc/mpi/apps/public/AbInitio_MPI.o -Llib -Lexternal/lib -Lbuild/src/release/linux/2.6/64/x86/gcc/mpi -Lsrc -L/usr/local/lib -L/usr/lib -L/lib -L/lib64 -ldevel -lprotocols -lcore -lnumeric -lutility -lObjexxFCL -lz
            Install file: "build/src/release/linux/2.6/64/x86/gcc/mpi/AbInitio_MPI.linuxgccrelease" as "bin/AbInitio_MPI.linuxgccrelease"
            mpiCC -o build/src/release/linux/2.6/64/x86/gcc/mpi/AbInitio_MPI.mpi.linuxgccrelease -Wl,-rpath=/opt/nasapps/build/Rosetta/rosetta_source/build/src/release/linux/2.6/64/x86/gcc/mpi build/src/release/linux/2.6/64/x86/gcc/mpi/apps/public/AbInitio_MPI.o -Llib -Lexternal/lib -Lbuild/src/release/linux/2.6/64/x86/gcc/mpi -Lsrc -L/usr/local/lib -L/usr/lib -L/lib -L/lib64 -ldevel -lprotocols -lcore -lnumeric -lutility -lObjexxFCL -lz
            Install file: "build/src/release/linux/2.6/64/x86/gcc/mpi/AbInitio_MPI.mpi.linuxgccrelease" as "bin/AbInitio_MPI.mpi.linuxgccrelease"
            scons: done building targets.



            mpirun with -np 2 works fine:

            mpirun -np 2 $rosetta_home/bin/AbinitioRelax.mpi.linuxgccrelease @flags

            ...
            Total weighted score: 24.862

            ===================================================================
            Finished Abinitio

            protocols.abinitio.AbrelaxApplication: (1) Finished _0001 in 7 seconds.
            protocols::checkpoint: (1) Deleting checkpoints of ClassicAbinitio
            protocols::checkpoint: (1) Deleting checkpoints of Abrelax
            protocols.jobdist.JobDistributors: (1) Node: 1 next_job()
            protocols.jobdist.JobDistributors: (1) Slave Node 1 -- requesting job from master node; tag_ 1
            protocols.jobdist.JobDistributors: (0) Master Node -- available job? 0
            protocols.jobdist.JobDistributors: (0) Master Node -- Spinning down node 1
            protocols.jobdist.JobDistributors: (0) Node 0 -- ready to call mpi finalize
            protocols.jobdist.JobDistributors: (1) Node 1 -- ready to call mpi finalize
            protocols::checkpoint: (0) Deleting checkpoints of ClassicAbinitio
            protocols::checkpoint: (0) Deleting checkpoints of Abrelax
            protocols::checkpoint: (1) Deleting checkpoints of ClassicAbinitio
            protocols::checkpoint: (1) Deleting checkpoints of Abrelax



            mpirun with -np >2 fails:

            ===================================================================
            Stage 2
            Folding with score1 for 2000


            One of the processes started by mpirun has exited with a nonzero exit
            code. This typically indicates that the process finished in error.
            If your process did not finish in error, be sure to include a "return
            0" or "exit(0)" in your C code before exiting the application.

            PID 7798 failed on node n0 (129.43.63.50) due to signal 11.


            Could LAM-7.1.4 be an issue?
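
            To separate a LAM problem from a Rosetta problem, I could run a trivial MPI program; if a sketch like the one below (file name and contents are illustrative) also dies with signal 11 at -np > 2, the fault would be in the LAM installation rather than in Rosetta:

            // hello_mpi.cc -- minimal LAM/MPI sanity check (hypothetical file)
            #include <mpi.h>
            #include <cstdio>

            int main(int argc, char** argv) {
                MPI_Init(&argc, &argv);               // start MPI
                int rank = 0, size = 0;
                MPI_Comm_rank(MPI_COMM_WORLD, &rank); // this process's id
                MPI_Comm_size(MPI_COMM_WORLD, &size); // total process count
                std::printf("rank %d of %d\n", rank, size);
                MPI_Finalize();                       // shut MPI down cleanly
                return 0;
            }

            Built and run the same way as Rosetta:

            mpiCC -o hello_mpi hello_mpi.cc
            mpirun -np 4 ./hello_mpi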

            Thanks

            Ravi

          • #5630
            Anonymous

              Steven:

              I forgot to mention that I am using Python 2.7.

              Thanks

            • #5627
              Anonymous

                Thanks for your reply. Yes, please email me the fix.

              • #5628
                Anonymous

                  It’s on the way.

                • #6388
                  Anonymous

                    Hi Ravi, I am using version 3.3 and would also like to run AbinitioRelax under MPI. Could you please forward me the patch and tell me how I can do the run with it?

                    Thanks

                  • #5648
                    Anonymous

                      I hate taking this off the boards, but you and another user are reporting similar problems, so I've emailed you both to try to figure out whether there's a shared root cause.

                    • #6390
                      Anonymous

                        On the way.

                      • #6410
                        Anonymous

                          Hi Lewis, I have put the AbInitio_MPI.cc file in the src/apps/public directory and opened the src/apps.src.settings file:

                          sources = {
                              "" : [],
                              "curated": [],
                              "benchmark": [ "benchmark" ],
                              "benchmark/scientific": [
                                  "design_contrast_and_statistic",
                                  "ddg_benchmark",
                                  "rotamer_recovery",
                              ],
                              "public/bundle" : [ "minirosetta", "minirosetta_graphics" ],
                              "public/ligand_docking" : [
                                  "ligand_rpkmin",
                                  "ligand_dock",
                                  "extract_atomtree_diffs",
                              ],
                              "public/docking" : [
                                  "docking_protocol",
                                  "docking_prepack_protocol",
                              ],
                              "public/flexpep_docking" : [ "FlexPepDocking" ], # /* Barak,doc/apps/public/flexpep_docking/barak/FlexPepDocking.dox, test/integration/tests/flexpepdock/ */
                              "public/enzdes" : [
                                  "enzyme_design",
                                  "CstfileToTheozymePDB"
                              ],
                              "public/rosettaDNA" : [ "rosettaDNA" ],
                              "public/design" : [ "fixbb" ],
                              "public/loop_modeling" : [ "loopmodel" ],
                              "public/match" : [
                                  "match",
                                  "gen_lig_grids",
                                  "gen_apo_grids"
                              ],
                              "public/membrane_abinitio" : [ "membrane_abinitio2" ],
                              "public/comparative_modeling" : [
                                  "score_aln",
                                  "super_aln",
                                  "full_length_model",
                                  "cluster_alns",
                              ],
                              "public/electron_density" : [
                                  "mr_protocols",
                                  "loops_from_density",
                              ],
                              "public" : [
                                  "score_jd2",
                                  "relax",
                                  "idealize",
                                  "idealize_jd2",
                                  "cluster",
                                  "combine_silent",
                                  "extract_pdbs",
                                  "AbinitioRelax",
                                  "AbInitio_MPI",
                                  "backrub",
                                  "sequence_tolerance",
                                  "SymDock"
                              ],
                              "public/rosetta_scripts" : [
                                  "rosetta_scripts",
                                  "revert_design_to_native"
                              ],
                              "public/scenarios" : [
                                  "FloppyTail", # /* Steven Lewis, doc/apps/public/scenarios/FloppyTail.dox, test/integration/tests/FloppyTail/ */
                                  # "FloppyTailACAT", # /* Barak Raveh */
                                  "ca_to_allatom", # /* Frank DiMaio, doc/apps/public/scenarios/ca_to_allatom.dox */
                              ],
                          }
                          include_path = [ ]
                          library_path = [ ]
                          libraries = [ ]
                          subprojects = [ "devel", "protocols", "core", "numeric", "utility", "ObjexxFCL", "z" ]

                          Now I am just wondering where to add the line you mentioned for compiling "AbInitio_MPI".

                          Do I need any extra flag for building this along with the other programs in the rosetta_source directory?

                        • #6414
                          Anonymous

                            You already did add it to the "public" group.

                            Editing this file tells SCons to compile it, so you don’t need a flag, just recompile.
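
                            That is, rerunning the same build command as before will pick up the new entry:

                            ./scons.py bin mode=release extras=mpi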
