Rosetta 3.2.1 / LAM-MPI run problem: jobs with more than 2 processors
May 24, 2011 at 3:24 pm | #919 | Anonymous
Hi All:
I am a new Rosetta user. I have just finished compiling parallel Rosetta (rosetta3.2.1 / gcc-4.4.3 / LAM-MPI-7.1.4)
and can run 2-processor jobs with no issues, but jobs fail for more than two processors (error message below).
Any help would be greatly appreciated. Thanks,
Ravi
Linux System:
Linux node2n29 2.6.32-29-server #58-Ubuntu SMP Fri Feb 11 21:06:51 UTC 2011 x86_64 GNU/Linux

The parallel version of Rosetta was compiled using GCC/LAM-MPI-7.1.4.
Run Command (memory used was 8 GB)
bin/AbinitioRelax.mpi.linuxgccrelease @flags

----flags
-in:file:native inputs/1l2y.pdb
-in:file:frag3 inputs/aa1l2yA03_05.200_v1_3
-in:file:frag9 inputs/aa1l2yA09_05.200_v1_3
-out:nstruct 1
-out:file:silent 1l2y_silent.out
-no_prof_info_in_silentout
-mute core.io.database
-run:constant_seed
-run:jran 1111111
-database /opt/nasapps/build/Rosetta/rosetta_database

Error Message:
……………………………..
Stage 2
Folding with score1 for 2000
One of the processes started by mpirun has exited with a nonzero exit
code. This typically indicates that the process finished in error.
If your process did not finish in error, be sure to include a "return
0" or "exit(0)" in your C code before exiting the application.

PID 11933 failed on node n0 (129.43.63.71) due to signal 11.
-
May 24, 2011 at 3:39 pm | #5626 | Anonymous
The AbinitioRelax executable is, non-obviously, not MPI compatible. It shouldn't crash, but it won't actually work in MPI; it just runs concurrent non-MPI jobs (which overwrite each other's output).
I suspect it is crashing because the filesystem is getting angry at files overwriting each other, but it isn't giving me a Rosetta error message to work with.
There is an abinitio MPI patch for 3.2 floating around. Would you like me to email it to the address you gave when you signed up for the message boards?
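In the meantime, a rough workaround sketch (not the patch, just an assumption-based example): since the stock executable only runs independent serial trajectories anyway, you can get the same throughput safely by launching separate serial jobs, each with its own seed and its own silent file, so nothing overwrites anything. This assumes that flags given on the command line override the same flags inside @flags, and that you have a plain (non-mpi) AbinitioRelax build available.

# launch N independent serial AbinitioRelax jobs with distinct seeds and outputs
for i in 1 2 3 4; do
    bin/AbinitioRelax.linuxgccrelease @flags \
        -out:file:silent 1l2y_silent_${i}.out \
        -run:constant_seed -run:jran $((1111111 + i)) &
done
wait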
-
May 24, 2011 at 4:44 pm | #5629 | Anonymous
Hi Steven:
I have applied the patch that you sent me.
PATCH
[ravi@torkv rosetta_source]$ ./scons.py bin mode=release extras=mpi
scons: Reading SConscript files …
svn: '.' is not a working copy
scons: done reading SConscript files.
scons: Building targets …
mpiCC -o build/src/release/linux/2.6/64/x86/gcc/mpi/apps/public/AbInitio_MPI.o -c -std=c++98 -pipe -ffor-scope -W -Wall -pedantic -Wno-long-long -O3 -ffast-math -funroll-loops -finline-functions -finline-limit=20000 -s -Wno-unused-variable -DNDEBUG -DUSEMPI -Isrc -Iexternal/include -Isrc/platform/linux/64/gcc -Isrc/platform/linux/64 -Isrc/platform/linux -Iexternal/boost_1_38_0 -I/usr/local/include -I/usr/include src/apps/public/AbInitio_MPI.cc
mpiCC -o build/src/release/linux/2.6/64/x86/gcc/mpi/AbInitio_MPI.linuxgccrelease -Wl,-rpath=/opt/nasapps/build/Rosetta/rosetta_source/build/src/release/linux/2.6/64/x86/gcc/mpi build/src/release/linux/2.6/64/x86/gcc/mpi/apps/public/AbInitio_MPI.o -Llib -Lexternal/lib -Lbuild/src/release/linux/2.6/64/x86/gcc/mpi -Lsrc -L/usr/local/lib -L/usr/lib -L/lib -L/lib64 -ldevel -lprotocols -lcore -lnumeric -lutility -lObjexxFCL -lz
Install file: "build/src/release/linux/2.6/64/x86/gcc/mpi/AbInitio_MPI.linuxgccrelease" as "bin/AbInitio_MPI.linuxgccrelease"
mpiCC -o build/src/release/linux/2.6/64/x86/gcc/mpi/AbInitio_MPI.mpi.linuxgccrelease -Wl,-rpath=/opt/nasapps/build/Rosetta/rosetta_source/build/src/release/linux/2.6/64/x86/gcc/mpi build/src/release/linux/2.6/64/x86/gcc/mpi/apps/public/AbInitio_MPI.o -Llib -Lexternal/lib -Lbuild/src/release/linux/2.6/64/x86/gcc/mpi -Lsrc -L/usr/local/lib -L/usr/lib -L/lib -L/lib64 -ldevel -lprotocols -lcore -lnumeric -lutility -lObjexxFCL -lz
Install file: "build/src/release/linux/2.6/64/x86/gcc/mpi/AbInitio_MPI.mpi.linuxgccrelease" as "bin/AbInitio_MPI.mpi.linuxgccrelease"
scons: done building targets.
MPIRUN with NP 2 WORKS FINE
mpirun -np 2 $rosetta_home/bin/AbinitioRelax.mpi.linuxgccrelease @flags
......
Total weighted score: 24.862
===================================================================
Finished Abinitio
protocols.abinitio.AbrelaxApplication: (1) Finished _0001 in 7 seconds.
protocols::checkpoint: (1) Deleting checkpoints of ClassicAbinitio
protocols::checkpoint: (1) Deleting checkpoints of Abrelax
protocols.jobdist.JobDistributors: (1) Node: 1 next_job()
protocols.jobdist.JobDistributors: (1) Slave Node 1 -- requesting job from master node; tag_ 1
protocols.jobdist.JobDistributors: (0) Master Node -- available job? 0
protocols.jobdist.JobDistributors: (0) Master Node -- Spinning down node 1
protocols.jobdist.JobDistributors: (0) Node 0 -- ready to call mpi finalize
protocols.jobdist.JobDistributors: (1) Node 1 -- ready to call mpi finalize
protocols::checkpoint: (0) Deleting checkpoints of ClassicAbinitio
protocols::checkpoint: (0) Deleting checkpoints of Abrelax
protocols::checkpoint: (1) Deleting checkpoints of ClassicAbinitio
protocols::checkpoint: (1) Deleting checkpoints of Abrelax
MPIRUN with NP >2 FAILS
===================================================================
Stage 2
Folding with score1 for 2000
One of the processes started by mpirun has exited with a nonzero exit
code. This typically indicates that the process finished in error.
If your process did not finish in error, be sure to include a "return
0" or "exit(0)" in your C code before exiting the application.

PID 7798 failed on node n0 (129.43.63.50) due to signal 11.
Could LAM-7.1.4 be an issue?
Thanks
Ravi
-
May 24, 2011 at 4:49 pm | #5630 | Anonymous
Steven:
I forgot to mention that I am using Python 2.7.
Thanks
-
May 24, 2011 at 3:45 pm | #5627 | Anonymous
Thanks for your reply. Yes, please email me the fix. Thanks.
-
May 24, 2011 at 3:47 pm | #5628 | Anonymous
It’s on the way.
-
December 9, 2011 at 5:48 am | #6388 | Anonymous
Hi Ravi, I am using version 3.3 and would also like to run AbinitioRelax under MPI. Could you please forward me the patch and tell me how to do the run with it?
Thanks
-
May 25, 2011 at 8:28 pm | #5648 | Anonymous
I hate taking this off the boards, but you and another user are reporting similar problems, so I've emailed you both to try to figure out whether there's a shared root cause.
-
December 9, 2011 at 4:07 pm | #6390 | Anonymous
on the way
-
December 14, 2011 at 4:43 am | #6410 | Anonymous
Hi Lewis, I have put the AbInitio_MPI.cc file in the src/apps/public directory. I have opened the src/apps.src.settings file:
sources = {
    "" : [],
    "curated": [],
    "benchmark": [ "benchmark" ],
    "benchmark/scientific": [
        "design_contrast_and_statistic",
        "ddg_benchmark",
        "rotamer_recovery",
    ],
    "public/bundle" : [ "minirosetta", "minirosetta_graphics" ],
    "public/ligand_docking" : [
        "ligand_rpkmin",
        "ligand_dock",
        "extract_atomtree_diffs",
    ],
    "public/docking" : [
        "docking_protocol",
        "docking_prepack_protocol",
    ],
    "public/flexpep_docking" : [ "FlexPepDocking" ], # /* Barak,doc/apps/public/flexpep_docking/barak/FlexPepDocking.dox, test/integration/tests/flexpepdock/ */
    "public/enzdes" : [
        "enzyme_design",
        "CstfileToTheozymePDB"
    ],
    "public/rosettaDNA" : [ "rosettaDNA" ],
    "public/design" : [ "fixbb" ],
    "public/loop_modeling" : [ "loopmodel" ],
    "public/match" : [
        "match",
        "gen_lig_grids",
        "gen_apo_grids"
    ],
    "public/membrane_abinitio" : [ "membrane_abinitio2" ],
    "public/comparative_modeling" : [
        "score_aln",
        "super_aln",
        "full_length_model",
        "cluster_alns",
    ],
    "public/electron_density" : [
        "mr_protocols",
        "loops_from_density",
    ],
    "public" : [
        "score_jd2",
        "relax",
        "idealize",
        "idealize_jd2",
        "cluster",
        "combine_silent",
        "extract_pdbs",
        "AbinitioRelax",
        "AbInitio_MPI",
        "backrub",
        "sequence_tolerance",
        "SymDock"
    ],
    "public/rosetta_scripts" : [
        "rosetta_scripts",
        "revert_design_to_native"
    ],
    "public/scenarios" : [
        "FloppyTail", # /* Steven Lewis, doc/apps/public/scenarios/FloppyTail.dox, test/integration/tests/FloppyTail/ */
        # "FloppyTailACAT", # /* Barak Raveh */
        "ca_to_allatom", # /* Frank DiMaio, doc/apps/public/scenarios/ca_to_allatom.dox */
    ],
}
include_path = [ ]
library_path = [ ]
libraries = [ ]
subprojects = [ "devel", "protocols", "core", "numeric", "utility", "ObjexxFCL", "z" ]

Now I am just wondering where to add the line you mentioned for compiling "AbInitio_MPI".
Do I need any extra flag for building this along with the other programs in the rosetta_source directory?
-
December 14, 2011 at 2:59 pm | #6414 | Anonymous
You already did add it, to the "public" group.
Editing this file is what tells SCons to compile it, so you don't need any extra flag; just recompile.
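For example, re-running the same build command shown earlier in this thread (adjust mode/extras to whatever you actually use) will pick up the new entry:

./scons.py bin mode=release extras=mpi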
-