mpi installation issues


    • #2518
      Anonymous

        hi all

        I am trying to build the MPI version of Rosetta 3.7 on a machine running Red Hat Linux (I have no problem installing on Ubuntu). I copied the topsail site.settings file and commented out the path to INCLUDE, because Rosetta initially complained about a missing INCLUDE parameter.

        I compile with scons (scons -j8 bin mode=release extras=mpi), and the build finishes cleanly with no errors and no stalling. But when I run any of the executables (for instance fixbb), I get the message below:

        fixbb.linuxgccrelease: route/tc.c:973: rtnl_tc_register: Assertion `0' failed.

        fixbb.linuxgccrelease:8222 terminated with signal 6 at PC=2af1e093a5f7 SP=7fff24ad2568.  Backtrace:

        /lib64/libc.so.6(gsignal+0x37)[0x2af1e093a5f7]

        /lib64/libc.so.6(abort+0x148)[0x2af1e093bce8]

        /lib64/libc.so.6(+0x2e566)[0x2af1e0933566]

        /lib64/libc.so.6(+0x2e612)[0x2af1e0933612]

        /lib64/libnl-route-3.so.200(+0x21249)[0x2af1e6a62249]

        /lib64/ld-linux-x86-64.so.2(+0xf3a3)[0x2af1d17163a3]

        /lib64/ld-linux-x86-64.so.2(+0x13ab6)[0x2af1d171aab6]

        /lib64/ld-linux-x86-64.so.2(+0xf1b4)[0x2af1d17161b4]

        /lib64/ld-linux-x86-64.so.2(+0x131ab)[0x2af1d171a1ab]

        /lib64/libdl.so.2(+0x102b)[0x2af1e0ecf02b]

        /lib64/ld-linux-x86-64.so.2(+0xf1b4)[0x2af1d17161b4]

        /lib64/libdl.so.2(+0x162d)[0x2af1e0ecf62d]

        /lib64/libdl.so.2(dlopen+0x31)[0x2af1e0ecf0c1]

        /usr/lib64/openmpi/lib/libopen-pal.so.13(+0x58f34)[0x2af1e13a6f34]

        /usr/lib64/openmpi/lib/libopen-pal.so.13(+0x3b891)[0x2af1e1389891]

        /usr/lib64/openmpi/lib/libopen-pal.so.13(mca_base_component_find+0x78a)[0x2af1e138ae0a]

        /usr/lib64/openmpi/lib/libopen-pal.so.13(mca_base_framework_components_register+0x56)[0x2af1e1394d46]

        /usr/lib64/openmpi/lib/libopen-pal.so.13(mca_base_framework_register+0x196)[0x2af1e13951f6]

        /usr/lib64/openmpi/lib/libopen-pal.so.13(mca_base_framework_open+0x12)[0x2af1e1395252]

        /usr/lib64/openmpi/lib/libopen-pal.so.13(mca_base_framework_open+0x66)[0x2af1e13952a6]

        /usr/lib64/openmpi/lib/libmpi.so.12(ompi_mpi_init+0x476)[0x2af1dfc2a2f6]

        /usr/lib64/openmpi/lib/libmpi.so.12(MPI_Init+0x193)[0x2af1dfc4c4e3]

        /home/enztech/rosetta_bin_linux_2016.32.58837_bundle/main/source/build/src/release/linux/3.10/64/x86/gcc/4.8/mpi/libcore.5.so(_ZN4core4init8init_mpiEiPPc+0x33)[0x2af1da00c313]

        /home/enztech/rosetta_bin_linux_2016.32.58837_bundle/main/source/build/src/release/linux/3.10/64/x86/gcc/4.8/mpi/libcore.5.so(_ZN4core4init4initEiPPc+0x1a)[0x2af1da00ef1a]

        ./fixbb.linuxgccrelease[0x40c53e]

        /lib64/libc.so.6(__libc_start_main+0xf5)[0x2af1e0926b15]

        ./fixbb.linuxgccrelease[0x40d4b1]

        fixbb.linuxgccrelease:8222 terminated with signal 11 at PC=2af1e6a62258 SP=7fff24ad1bf8.  Backtrace:

        /lib64/libnl-route-3.so.200(rtnl_tc_unregister+0x8)[0x2af1e6a62258]

        /lib64/ld-linux-x86-64.so.2(+0xfa1a)[0x2af1d1716a1a]

        /lib64/libc.so.6(+0x38e69)[0x2af1e093de69]

        /lib64/libc.so.6(+0x38eb5)[0x2af1e093deb5]

        /lib64/libinfinipath.so.4(+0x426f)[0x2af1e6eea26f]

        /lib64/libpthread.so.0(+0xf100)[0x2af1e06f8100]

        /lib64/libc.so.6(gsignal+0x37)[0x2af1e093a5f7]

        /lib64/libc.so.6(abort+0x148)[0x2af1e093bce8]

        /lib64/libc.so.6(+0x2e566)[0x2af1e0933566]

        /lib64/libc.so.6(+0x2e612)[0x2af1e0933612]

        /lib64/libnl-route-3.so.200(+0x21249)[0x2af1e6a62249]

        /lib64/ld-linux-x86-64.so.2(+0xf3a3)[0x2af1d17163a3]

        /lib64/ld-linux-x86-64.so.2(+0x13ab6)[0x2af1d171aab6]

        /lib64/ld-linux-x86-64.so.2(+0xf1b4)[0x2af1d17161b4]

        /lib64/ld-linux-x86-64.so.2(+0x131ab)[0x2af1d171a1ab]

        /lib64/libdl.so.2(+0x102b)[0x2af1e0ecf02b]

        /lib64/ld-linux-x86-64.so.2(+0xf1b4)[0x2af1d17161b4]

        /lib64/libdl.so.2(+0x162d)[0x2af1e0ecf62d]

        /lib64/libdl.so.2(dlopen+0x31)[0x2af1e0ecf0c1]

        /usr/lib64/openmpi/lib/libopen-pal.so.13(+0x58f34)[0x2af1e13a6f34]

        /usr/lib64/openmpi/lib/libopen-pal.so.13(+0x3b891)[0x2af1e1389891]

        /usr/lib64/openmpi/lib/libopen-pal.so.13(mca_base_component_find+0x78a)[0x2af1e138ae0a]

        /usr/lib64/openmpi/lib/libopen-pal.so.13(mca_base_framework_components_register+0x56)[0x2af1e1394d46]

        /usr/lib64/openmpi/lib/libopen-pal.so.13(mca_base_framework_register+0x196)[0x2af1e13951f6]

        /usr/lib64/openmpi/lib/libopen-pal.so.13(mca_base_framework_open+0x12)[0x2af1e1395252]

        /usr/lib64/openmpi/lib/libopen-pal.so.13(mca_base_framework_open+0x66)[0x2af1e13952a6]

        /usr/lib64/openmpi/lib/libmpi.so.12(ompi_mpi_init+0x476)[0x2af1dfc2a2f6]

        /usr/lib64/openmpi/lib/libmpi.so.12(MPI_Init+0x193)[0x2af1dfc4c4e3]

        /home/enztech/rosetta_bin_linux_2016.32.58837_bundle/main/source/build/src/release/linux/3.10/64/x86/gcc/4.8/mpi/libcore.5.so(_ZN4core4init8init_mpiEiPPc+0x33)[0x2af1da00c313]

        /home/enztech/rosetta_bin_linux_2016.32.58837_bundle/main/source/build/src/release/linux/3.10/64/x86/gcc/4.8/mpi/libcore.5.so(_ZN4core4init4initEiPPc+0x1a)[0x2af1da00ef1a]

        ./fixbb.linuxgccrelease[0x40c53e]

        /lib64/libc.so.6(__libc_start_main+0xf5)[0x2af1e0926b15]

        ./fixbb.linuxgccrelease[0x40d4b1]

        this happens whether i use the mpi version or just the normal version.

        could someone help me out with this? thank you very much

      • #11917
        Anonymous

          “this happens whether i use the mpi version or just the normal version.”

          Normal version = fixbb.default.linuxgccrelease, or just fixbb.linuxgccrelease?  If it’s the latter it’s probably symlinked to the mpi version.  If it’s the former it’s very odd you’re getting an mpi error without mpi compiling.  The one with no intervening .default. or .mpi. is symlinked to whatever was last built.  You probably know that but I wrote version 1 of this comment without checking the question’s author.
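A quick way to see which build the un-suffixed name points at is to resolve the symlink. The sketch below is only an illustration: the helper name `describe_build` is made up, and it assumes the `<app>.<flavor>.<platform>` naming seen in this thread (for example `fixbb.default.linuxgccrelease`).

```python
import os
import tempfile


def describe_build(path):
    """Return (resolved_file_name, flavor) for an executable path.

    flavor is the token between the application name and the platform
    suffix (e.g. "mpi", "default", "static"), or None when the resolved
    name has no such token. The naming scheme is an assumption based on
    the file names mentioned in this thread.
    """
    resolved = os.path.basename(os.path.realpath(path))
    parts = resolved.split(".")
    flavor = parts[1] if len(parts) >= 3 else None
    return resolved, flavor


if __name__ == "__main__":
    # Demonstration in a scratch directory; the file names mimic the thread.
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "fixbb.mpi.linuxgccrelease")
        open(target, "w").close()
        link = os.path.join(tmp, "fixbb.linuxgccrelease")
        os.symlink(target, link)
        print(describe_build(link))
```

On a real install you would call `describe_build` on the executable in the Rosetta bin directory instead of the scratch files.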

          Broadly speaking, I have no idea what the error is – it looks like it’s failing in an MPI library during Rosetta initialization, so all I can suggest is to try a different version of openmpi and/or ensure that all the mpirun binaries, mpi build libraries, compilers, etc are all cross-compatible.  Do any other MPI tools work on the machine?  Could it be some sort of permissions issue with the mpi communication channels?
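To make the "failing in an MPI library during initialization" reading easier to check, the backtrace can be collapsed into the distinct shared objects it passes through, innermost first. This is a rough sketch, not a general backtrace parser: the regular expression only targets the frame format pasted in the question, and the function name is hypothetical.

```python
import re

# Matches frames such as "/lib64/libc.so.6(abort+0x148)[0x2af1e093bce8]"
# and "./fixbb.linuxgccrelease[0x40c53e]".
FRAME = re.compile(
    r"^\s*(?P<obj>\S+?)(?:\((?P<sym>[^)]*)\))?\[0x[0-9a-f]+\]\s*$"
)


def libraries_in_order(backtrace):
    """Return the distinct objects in a backtrace, innermost frame first."""
    seen = []
    for line in backtrace.splitlines():
        match = FRAME.match(line)
        if not match:
            continue  # blank lines and non-frame text are skipped
        name = match.group("obj").rsplit("/", 1)[-1]
        if name not in seen:
            seen.append(name)
    return seen


if __name__ == "__main__":
    # A few frames copied from the first backtrace in the question.
    sample = """
    /lib64/libc.so.6(abort+0x148)[0x2af1e093bce8]
    /lib64/libnl-route-3.so.200(+0x21249)[0x2af1e6a62249]
    /lib64/libdl.so.2(dlopen+0x31)[0x2af1e0ecf0c1]
    /usr/lib64/openmpi/lib/libopen-pal.so.13(mca_base_component_find+0x78a)[0x2af1e138ae0a]
    /usr/lib64/openmpi/lib/libmpi.so.12(MPI_Init+0x193)[0x2af1dfc4c4e3]
    ./fixbb.linuxgccrelease[0x40c53e]
    """
    for name in libraries_in_order(sample):
        print(name)
```

Read bottom-up, the order suggests the abort is raised inside libnl-route-3 while Open MPI's component loader is calling dlopen from within MPI_Init, before any Rosetta code beyond core::init runs.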

        • #11920
          Anonymous

            thanks for your reply. I got the wording confused: I meant the static MPI version, not the normal version.

            I read another question on the forum about what looks to me like similar, but not identical, library-related errors when running the unit tests with MPI. You mentioned there that the cause could be a version of Open MPI or of the compiler more recent than what was tested with Rosetta. Could that also be the case here? You said

            GCC 4.8.3 and Open MPI 1.6.4 were what was tested. I checked the versions on my computer: gcc 4.8.5 and openmpi 3.0.2.

            thanks steven!

              • #11923
                Anonymous

                  3.7 was before the Cxx11 changeover (we switched just after 3.7) so knowing what I have now is less useful than it might be.  I am using gcc 5.4.0 and openmpi 1.10.2.  The openmpi web site (https://www.open-mpi.org/) does not suggest to me that their versions yet go as high as 3 (I’m guessing it’s a package renumbering from the linux distro’s package manager).  

                  Googling around shows similar-looking errors due to ???? somewhere in MPI (https://www.mail-archive.com/devel@lists.open-mpi.org/msg18181.html) – not that I know what to do with that data.  The most interesting thing from that email thread is 


                  The main change appears to be a switch from a MOFED-based install to the
                  OFED packaged with RHEL7.

                   

                  That suggests to me that the package you are getting from Red Hat is bad; maybe try (shudder, this NEVER works) building mpi yourself?  (We just had a long thread with someone who’d built it themselves and the solution was “use the package instead”, I think….)   Maybe Red Hat can give you a different version of the openmpi package?
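Before swapping packages, it may help to confirm which Open MPI and libnl libraries the executable actually resolves to. The sketch below parses `ldd` output and keeps only the MPI-related entries. The helper names and the list of prefixes are illustrative choices, picked from the library names in the backtrace. Note that `ldd` only reports link-time dependencies, so plugins that Open MPI opens later with dlopen will not show up.

```python
import re
import subprocess
import sys

# One ldd line looks like:
#   libmpi.so.12 => /usr/lib64/openmpi/lib/libmpi.so.12 (0x...)
LDD_LINE = re.compile(
    r"^\s*(?P<name>\S+)\s+=>\s+(?P<path>\S.*?)(?:\s+\(0x[0-9a-fA-F]+\))?\s*$"
)

# Prefixes chosen from the libraries named in the backtrace (an assumption).
INTERESTING = ("libmpi", "libopen-pal", "libopen-rte", "libnl")


def parse_ldd(text):
    """Map each library name in ldd output to the path it resolves to."""
    resolved = {}
    for line in text.splitlines():
        match = LDD_LINE.match(line)
        if match:
            resolved[match.group("name")] = match.group("path")
    return resolved


def mpi_related(resolved):
    """Keep only entries whose name starts with an MPI-related prefix."""
    return {n: p for n, p in resolved.items() if n.startswith(INTERESTING)}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_ldd.py ./fixbb.linuxgccrelease
    out = subprocess.run(
        ["ldd", sys.argv[1]], capture_output=True, text=True, check=True
    ).stdout
    for name, path in sorted(mpi_related(parse_ldd(out)).items()):
        print(f"{name} -> {path}")
```

If the paths printed here come from a different prefix than the `mpirun` on your PATH, that mismatch would be the first thing to rule out.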
