- This topic has 3 replies, 2 voices, and was last updated 9 years, 8 months ago by Anonymous.
February 24, 2014 at 8:11 pm #1838 Anonymous
I was clustering a silent file with my 10% lowest-energy decoys and cluster.mpi.linuxgccrelease just stopped, printing the following on screen:
mpirun noticed that process rank 4 with PID 19956 on node compute-1-5 exited on signal 9 (Killed).
So it seems to be a problem with mpirun rather than with the cluster.mpi.linuxgccrelease binary itself.
I’m running cluster.mpi.linuxgccrelease with the following command line:
mpirun -x LD_LIBRARY_PATH=$LIB --mca btl_tcp_if_include eth0 -np 20 --host compute-1-11,compute-1-12,compute-1-13,compute-1-14,compute-1-15,compute-1-16,compute-1-17,compute-1-18,compute-1-19,compute-1-20 $BIN/cluster.mpi.linuxgccrelease -in:file:fullatom -in:file:silent_struct_type binary -in:file:silent ecut_10.out -cluster:radius -1
Did I miss some special MPIRUN option?
Thanks in advance.
February 24, 2014 at 8:27 pm #9836 Anonymous
The clustering code was never multi-processor-ized, to my knowledge. I don’t think it should actually fail under MPI, but it certainly won’t work any better than the non-MPI version.
February 25, 2014 at 7:04 pm #9837 Anonymous
Thanks for your reply. Judging by the on-screen output, the MPI version seems to work reasonably well, but it doesn’t write the expected clusters before dying. So I thought I had missed some mpirun option. Well, if you don’t use the MPI cluster binary, who am I to use it? Thanks for sharing.
EDIT: the information below might be useful to other users and/or the authors.
Feb 25 14:59:59 compute-1-20 kernel: Out of memory: Kill process 25129 (cluster.mpi.lin) score 445 or sacrifice child
Feb 25 14:59:59 compute-1-20 kernel: Killed process 25129, UID 1006, (cluster.mpi.lin) total-vm:7986656kB, anon-rss:7777940kB, file-rss:2620kB
For some reason the process was killed by the kernel’s OOM (out-of-memory) killer. The same job completed fine with the non-MPI version, cluster.default.linuxgccrelease.
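If anyone else hits a silent signal-9 kill under mpirun, a quick way to confirm it was the OOM killer is to grep the kernel log on the affected node. A small sketch against the two log lines quoted above (on a live node, `dmesg | grep -iE 'killed process|out of memory'` or grepping /var/log/messages does the same; the exact log path depends on the distro):

```shell
# Count kernel OOM-killer lines; both quoted lines above should match.
grep -icE 'killed process|out of memory' <<'EOF'
Feb 25 14:59:59 compute-1-20 kernel: Out of memory: Kill process 25129 (cluster.mpi.lin) score 445 or sacrifice child
Feb 25 14:59:59 compute-1-20 kernel: Killed process 25129, UID 1006, (cluster.mpi.lin) total-vm:7986656kB, anon-rss:7777940kB, file-rss:2620kB
EOF
```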
March 27, 2014 at 6:29 pm #9929 Anonymous
This problem was solved by decreasing the number of processes per work node.
Hope it helps.
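For reference, with Open MPI the per-node rank count can be capped with `-npernode` (or `--map-by ppr:N:node` in newer releases), so each rank gets a larger share of the node’s RAM. A sketch based on the command line from the first post, assuming one rank per node is enough to stay under the memory limit (the rank count and nodes are illustrative; tune them to your cluster):

```shell
# Cap at 1 process per node (-npernode 1) so each rank has the node's
# full RAM; total ranks reduced from 20 to 10 to match the 10 hosts.
mpirun -x LD_LIBRARY_PATH=$LIB --mca btl_tcp_if_include eth0 \
       -np 10 -npernode 1 \
       --host compute-1-11,compute-1-12,compute-1-13,compute-1-14,compute-1-15,compute-1-16,compute-1-17,compute-1-18,compute-1-19,compute-1-20 \
       $BIN/cluster.mpi.linuxgccrelease \
       -in:file:fullatom -in:file:silent_struct_type binary \
       -in:file:silent ecut_10.out -cluster:radius -1
```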