cluster.mpi.linuxgccrelease failed
- This topic has 3 replies, 2 voices, and was last updated 10 years, 6 months ago by Anonymous.
February 24, 2014 at 8:11 pm #1838 Anonymous
Hi there,
I was clustering a silent file containing my 10% lowest-energy decoys when cluster.mpi.linuxgccrelease stopped and printed the following on screen:
mpirun noticed that process rank 4 with PID 19956 on node compute-1-5 exited on signal 9 (Killed).
This looks like a problem with MPIRUN rather than with the cluster.mpi.linuxgccrelease binary.
I’m running cluster.mpi.linuxgccrelease with the following command line:
mpirun -x LD_LIBRARY_PATH=$LIB --mca btl_tcp_if_include eth0 -np 20 --host compute-1-11,compute-1-12,compute-1-13,compute-1-14,compute-1-15,compute-1-16,compute-1-17,compute-1-18,compute-1-19,compute-1-20 $BIN/cluster.mpi.linuxgccrelease -in:file:fullatom -in:file:silent_struct_type binary -in:file:silent ecut_10.out -cluster:radius -1
Did I miss some special MPIRUN option?
Thanks in advance.
February 24, 2014 at 8:27 pm #9836 Anonymous
The clustering code was never parallelized, to my knowledge. I don’t think it should actually fail under MPI, but it certainly won’t run any better than the non-MPI build.
February 25, 2014 at 7:04 pm #9837 Anonymous
Hi smlewis,
Thanks for your reply. Judging by the output on screen, the MPI version seems to work reasonably well, but it doesn’t write the expected clusters before it dies. So I thought I had missed some MPIRUN option. Well, if you don’t use the MPI cluster build, who am I to use it? Thanks for sharing.
Best.
EDIT: the information below might be useful to other users and/or the authors.
Feb 25 14:59:59 compute-1-20 kernel: Out of memory: Kill process 25129 (cluster.mpi.lin) score 445 or sacrifice child
Feb 25 14:59:59 compute-1-20 kernel: Killed process 25129, UID 1006, (cluster.mpi.lin) total-vm:7986656kB, anon-rss:7777940kB, file-rss:2620kB
For some reason the process was killed with the status “Out of memory”. The same job completed with the non-MPI build, cluster.default.linuxgccrelease.
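To put the kernel message above in perspective, here is a small sketch that pulls the anon-rss figure out of the "Killed process" line and converts it to GiB. The helper names (anon_rss_kb, kb_to_gib) are made up for illustration, and the per-node estimate assumes two ranks per node with each rank holding roughly the same amount of memory. That is only a guess based on -np 20 being spread over 10 hosts, not something the log confirms.

```shell
#!/bin/sh
# Extract the anon-rss value (in kB) from a kernel OOM "Killed process" line.
anon_rss_kb() {
    printf '%s\n' "$1" | sed -n 's/.*anon-rss:\([0-9][0-9]*\)kB.*/\1/p'
}

# Convert a kB value to GiB, rounded to two decimals.
kb_to_gib() {
    awk -v kb="$1" 'BEGIN { printf "%.2f\n", kb / 1024 / 1024 }'
}

# Log line copied from the post above.
line='Feb 25 14:59:59 compute-1-20 kernel: Killed process 25129, UID 1006, (cluster.mpi.lin) total-vm:7986656kB, anon-rss:7777940kB, file-rss:2620kB'

rss_kb=$(anon_rss_kb "$line")
echo "resident memory of the killed rank: $(kb_to_gib "$rss_kb") GiB"

# ASSUMPTION: two ranks per node, each using about as much memory as the killed one.
ranks_per_node=2
echo "estimated demand per node: $(kb_to_gib $((rss_kb * ranks_per_node))) GiB"
```

The killed rank was holding about 7.42 GiB of resident memory, so under the two-ranks-per-node assumption a node would need close to 15 GiB for this job alone.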
March 27, 2014 at 6:29 pm #9929 Anonymous
This problem was solved by decreasing the number of processes per worker node.
See https://www.rosettacommons.org/node/3619
Hope it helps.
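For anyone wanting to try the same thing, the sketch below shows one possible way to cap the number of ranks per node. It assumes Open MPI, whose mpirun accepts -npernode (newer releases prefer --map-by ppr:N:node). The exact settings used in the linked thread are not reproduced here. build_cluster_cmd is a hypothetical helper that only prints the command line, so it can be inspected before anything is launched.

```shell
#!/bin/sh
# Print (do not run) an mpirun command line that caps the ranks per node.
#   $1 = ranks per node
#   $2 = comma-separated host list
# ASSUMPTION: Open MPI's mpirun, which supports -npernode.
build_cluster_cmd() {
    per_node=$1
    hosts=$2
    # Number of hosts = number of comma-separated fields.
    n_hosts=$(printf '%s\n' "$hosts" | awk -F, '{ print NF }')
    total=$((per_node * n_hosts))
    # $LIB and $BIN are left unexpanded on purpose, as in the original post.
    printf 'mpirun -x LD_LIBRARY_PATH=$LIB --mca btl_tcp_if_include eth0 -np %s -npernode %s --host %s $BIN/cluster.mpi.linuxgccrelease -in:file:fullatom -in:file:silent_struct_type binary -in:file:silent ecut_10.out -cluster:radius -1\n' \
        "$total" "$per_node" "$hosts"
}

# One rank per node on the ten hosts from the original command.
build_cluster_cmd 1 compute-1-11,compute-1-12,compute-1-13,compute-1-14,compute-1-15,compute-1-16,compute-1-17,compute-1-18,compute-1-19,compute-1-20
```

With one rank per node the total drops from 20 to 10 processes. Since the clustering code is reportedly not parallelized, running the non-MPI cluster.default.linuxgccrelease binary remains the simpler option.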