mpi installation issues
- This topic has 7 replies, 2 voices, and was last updated 8 years, 2 months ago by Anonymous.
-
October 5, 2016 at 1:48 am #2518Anonymous
Hi all,
I am trying to build the MPI version of Rosetta 3.7 on a machine running Red Hat Linux (I had no problem installing on Ubuntu). I copied the topsail site.settings file and commented out the INCLUDE path, because Rosetta initially complained that there was no INCLUDE parameter.
I compile with scons (scons -j8 bin mode=release extras=mpi) and the build finishes fine, with no errors and no stalling, but when I run the executables (for instance fixbb) I get the message below:
fixbb.linuxgccrelease: route/tc.c:973: rtnl_tc_register: Assertion `0' failed.
fixbb.linuxgccrelease:8222 terminated with signal 6 at PC=2af1e093a5f7 SP=7fff24ad2568. Backtrace:
/lib64/libc.so.6(gsignal+0x37)[0x2af1e093a5f7]
/lib64/libc.so.6(abort+0x148)[0x2af1e093bce8]
/lib64/libc.so.6(+0x2e566)[0x2af1e0933566]
/lib64/libc.so.6(+0x2e612)[0x2af1e0933612]
/lib64/libnl-route-3.so.200(+0x21249)[0x2af1e6a62249]
/lib64/ld-linux-x86-64.so.2(+0xf3a3)[0x2af1d17163a3]
/lib64/ld-linux-x86-64.so.2(+0x13ab6)[0x2af1d171aab6]
/lib64/ld-linux-x86-64.so.2(+0xf1b4)[0x2af1d17161b4]
/lib64/ld-linux-x86-64.so.2(+0x131ab)[0x2af1d171a1ab]
/lib64/libdl.so.2(+0x102b)[0x2af1e0ecf02b]
/lib64/ld-linux-x86-64.so.2(+0xf1b4)[0x2af1d17161b4]
/lib64/libdl.so.2(+0x162d)[0x2af1e0ecf62d]
/lib64/libdl.so.2(dlopen+0x31)[0x2af1e0ecf0c1]
/usr/lib64/openmpi/lib/libopen-pal.so.13(+0x58f34)[0x2af1e13a6f34]
/usr/lib64/openmpi/lib/libopen-pal.so.13(+0x3b891)[0x2af1e1389891]
/usr/lib64/openmpi/lib/libopen-pal.so.13(mca_base_component_find+0x78a)[0x2af1e138ae0a]
/usr/lib64/openmpi/lib/libopen-pal.so.13(mca_base_framework_components_register+0x56)[0x2af1e1394d46]
/usr/lib64/openmpi/lib/libopen-pal.so.13(mca_base_framework_register+0x196)[0x2af1e13951f6]
/usr/lib64/openmpi/lib/libopen-pal.so.13(mca_base_framework_open+0x12)[0x2af1e1395252]
/usr/lib64/openmpi/lib/libopen-pal.so.13(mca_base_framework_open+0x66)[0x2af1e13952a6]
/usr/lib64/openmpi/lib/libmpi.so.12(ompi_mpi_init+0x476)[0x2af1dfc2a2f6]
/usr/lib64/openmpi/lib/libmpi.so.12(MPI_Init+0x193)[0x2af1dfc4c4e3]
/home/enztech/rosetta_bin_linux_2016.32.58837_bundle/main/source/build/src/release/linux/3.10/64/x86/gcc/4.8/mpi/libcore.5.so(_ZN4core4init8init_mpiEiPPc+0x33)[0x2af1da00c313]
/home/enztech/rosetta_bin_linux_2016.32.58837_bundle/main/source/build/src/release/linux/3.10/64/x86/gcc/4.8/mpi/libcore.5.so(_ZN4core4init4initEiPPc+0x1a)[0x2af1da00ef1a]
./fixbb.linuxgccrelease[0x40c53e]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x2af1e0926b15]
./fixbb.linuxgccrelease[0x40d4b1]
fixbb.linuxgccrelease:8222 terminated with signal 11 at PC=2af1e6a62258 SP=7fff24ad1bf8. Backtrace:
/lib64/libnl-route-3.so.200(rtnl_tc_unregister+0x8)[0x2af1e6a62258]
/lib64/ld-linux-x86-64.so.2(+0xfa1a)[0x2af1d1716a1a]
/lib64/libc.so.6(+0x38e69)[0x2af1e093de69]
/lib64/libc.so.6(+0x38eb5)[0x2af1e093deb5]
/lib64/libinfinipath.so.4(+0x426f)[0x2af1e6eea26f]
/lib64/libpthread.so.0(+0xf100)[0x2af1e06f8100]
/lib64/libc.so.6(gsignal+0x37)[0x2af1e093a5f7]
/lib64/libc.so.6(abort+0x148)[0x2af1e093bce8]
/lib64/libc.so.6(+0x2e566)[0x2af1e0933566]
/lib64/libc.so.6(+0x2e612)[0x2af1e0933612]
/lib64/libnl-route-3.so.200(+0x21249)[0x2af1e6a62249]
/lib64/ld-linux-x86-64.so.2(+0xf3a3)[0x2af1d17163a3]
/lib64/ld-linux-x86-64.so.2(+0x13ab6)[0x2af1d171aab6]
/lib64/ld-linux-x86-64.so.2(+0xf1b4)[0x2af1d17161b4]
/lib64/ld-linux-x86-64.so.2(+0x131ab)[0x2af1d171a1ab]
/lib64/libdl.so.2(+0x102b)[0x2af1e0ecf02b]
/lib64/ld-linux-x86-64.so.2(+0xf1b4)[0x2af1d17161b4]
/lib64/libdl.so.2(+0x162d)[0x2af1e0ecf62d]
/lib64/libdl.so.2(dlopen+0x31)[0x2af1e0ecf0c1]
/usr/lib64/openmpi/lib/libopen-pal.so.13(+0x58f34)[0x2af1e13a6f34]
/usr/lib64/openmpi/lib/libopen-pal.so.13(+0x3b891)[0x2af1e1389891]
/usr/lib64/openmpi/lib/libopen-pal.so.13(mca_base_component_find+0x78a)[0x2af1e138ae0a]
/usr/lib64/openmpi/lib/libopen-pal.so.13(mca_base_framework_components_register+0x56)[0x2af1e1394d46]
/usr/lib64/openmpi/lib/libopen-pal.so.13(mca_base_framework_register+0x196)[0x2af1e13951f6]
/usr/lib64/openmpi/lib/libopen-pal.so.13(mca_base_framework_open+0x12)[0x2af1e1395252]
/usr/lib64/openmpi/lib/libopen-pal.so.13(mca_base_framework_open+0x66)[0x2af1e13952a6]
/usr/lib64/openmpi/lib/libmpi.so.12(ompi_mpi_init+0x476)[0x2af1dfc2a2f6]
/usr/lib64/openmpi/lib/libmpi.so.12(MPI_Init+0x193)[0x2af1dfc4c4e3]
/home/enztech/rosetta_bin_linux_2016.32.58837_bundle/main/source/build/src/release/linux/3.10/64/x86/gcc/4.8/mpi/libcore.5.so(_ZN4core4init8init_mpiEiPPc+0x33)[0x2af1da00c313]
/home/enztech/rosetta_bin_linux_2016.32.58837_bundle/main/source/build/src/release/linux/3.10/64/x86/gcc/4.8/mpi/libcore.5.so(_ZN4core4init4initEiPPc+0x1a)[0x2af1da00ef1a]
./fixbb.linuxgccrelease[0x40c53e]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x2af1e0926b15]
./fixbb.linuxgccrelease[0x40d4b1]
this happens whether i use the mpi version or just the normal version.
could someone help me out with this? thank you very much
-
October 5, 2016 at 4:30 pm #11917Anonymous
“this happens whether i use the mpi version or just the normal version.”
Normal version = fixbb.default.linuxgccrelease, or just fixbb.linuxgccrelease? If it’s the latter, it’s probably symlinked to the mpi version. If it’s the former, it’s very odd that you’re getting an mpi error from a build compiled without mpi. The one with no intervening .default. or .mpi. is symlinked to whatever was last built. You probably know that, but I wrote version 1 of this comment without checking the question’s author.
Broadly speaking, I have no idea what the error is. It looks like it’s failing in an MPI library during Rosetta initialization, so all I can suggest is to try a different version of openmpi and/or make sure that the mpirun binaries, mpi build libraries, compilers, etc. are all cross-compatible. Do any other MPI tools work on the machine? Could it be some sort of permissions issue with the mpi communication channels?
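For anyone wanting to check the symlink point above, here is a minimal Python sketch. It assumes the executables follow the app.extras.platform naming described in this reply; the helper names resolved_name and build_flavor are made up for illustration and are not part of Rosetta.

```python
import os


def resolved_name(path):
    """Return the file name that `path` finally points to, following symlinks."""
    return os.path.basename(os.path.realpath(path))


def build_flavor(name):
    """Return the middle tag of a name like fixbb.mpi.linuxgccrelease.

    A name with only two dot-separated parts (fixbb.linuxgccrelease) carries
    no tag, so None is returned and the symlink target has to be inspected.
    """
    parts = name.split(".")
    return parts[1] if len(parts) == 3 else None


# Hypothetical usage, run from the directory holding the executables:
#   print(build_flavor(resolved_name("fixbb.linuxgccrelease")))
```

If this reports mpi for the untagged name, the "normal" executable is really the MPI build.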
-
October 6, 2016 at 1:46 am #11920Anonymous
Thanks for your reply. I got the wording confused: I meant the static mpi version, not the normal version.
I read another question on the forum about what looks to me like similar, but not identical, library-related errors when running the unit tests with mpi. You mentioned there that the version of openmpi or the compiler might be more recent than what was tested with Rosetta. Could that also be the case here? You said
GCC: 4.8.3, Open MPI: 1.6.4 was what was tested. I checked the versions on my computer and they are gcc 4.8.5 and openmpi 3.0.2.
thanks steven!
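A small Python sketch of the comparison described in this post, assuming the version strings come from `gcc --version` and `mpirun --version`. The TESTED table only repeats the numbers quoted above; the function names are illustrative.

```python
import re

# Versions quoted in this thread as "what was tested" (not an official list).
TESTED = {"gcc": (4, 8, 3), "openmpi": (1, 6, 4)}


def parse_version(text):
    """Pull the first x.y.z triple out of a tool's version banner."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("no x.y.z version found in: %r" % text)
    return tuple(int(part) for part in match.groups())


def compare_to_tested(tool, banner):
    """Return 'older', 'same' or 'newer' relative to the tested version."""
    found = parse_version(banner)
    tested = TESTED[tool]
    if found == tested:
        return "same"
    return "newer" if found > tested else "older"
```

A result of newer only says the installed tool postdates the tested one; it does not by itself show that the version difference causes the crash.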
-
October 6, 2016 at 3:53 pm #11923Anonymous
3.7 was before the Cxx11 changeover (we switched just after 3.7) so knowing what I have now is less useful than it might be. I am using gcc 5.4.0 and openmpi 1.10.2. The openmpi web site (https://www.open-mpi.org/) does not suggest to me that their versions yet go as high as 3 (I’m guessing it’s a package renumbering from the linux distro’s package manager).
Googling around shows similar-looking errors due to ???? somewhere in MPI (https://www.mail-archive.com/devel@lists.open-mpi.org/msg18181.html) – not that I know what to do with that data. The most interesting thing from that email thread is
“The main change appears to be a switch from a MOFED-based install to the OFED packaged with RHEL7.”
That suggests to me that the package you are getting from Red Hat is bad; maybe try (shudder, this NEVER works) building mpi yourself? (We just had a long thread with someone who’d built it themselves and the solution was “use the package instead”, I think….) Maybe Red Hat can give you a different version of the openmpi package?
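To see which system libraries are involved before deciding which package to swap, the backtrace in the first post can be reduced to the list of modules it passes through. The Python sketch below assumes frames of the form module(symbol+offset)[address] as printed there; the function name is made up for illustration.

```python
import re

# One frame looks like: /lib64/libdl.so.2(dlopen+0x31)[0x2af1e0ecf0c1]
# The "(symbol+offset)" part is optional, as in ./fixbb.linuxgccrelease[0x40c53e].
FRAME = re.compile(r"^(?P<module>[^\s(\[]+)(?:\([^)]*\))?\[0x[0-9a-fA-F]+\]$")


def modules_in_order(backtrace):
    """List each module once, in the order it first appears (innermost first)."""
    seen = []
    for line in backtrace.splitlines():
        match = FRAME.match(line.strip())
        if match and match.group("module") not in seen:
            seen.append(match.group("module"))
    return seen
```

Applied to the trace above, the list puts libnl-route-3 between libc and the dlopen call made from libopen-pal, which matches the reading that the assertion fires while Open MPI is loading one of its components.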