When using Rosetta’s non-MPI applications, I am limited to reading silent files of at most 16 GB, which is the RAM of a single node. Do Rosetta’s MPI applications share memory across nodes? That is, would it be possible to read silent files larger than the memory of one node?
Thanks in advance.
There are some options that may help you (in both the MPI case and the single-processor case).
-in:file:lazy_silent and -jd2:lazy_silent_file_reader: these should defer reading structures until you actually need them, which should cut down on memory usage.
-jd2:delete_old_poses: this adds a check that deletes old poses once they’re no longer needed, reducing memory usage.
If you split MPI runs across multiple nodes, that can also help with memory usage: not because the MPI applications share memory, but because each node will only load the structures it is working on (especially with the flags above), so each process reads only a portion of the large silent file instead of trying to read the whole thing.
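As a concrete illustration, the flags above could be collected into a Rosetta flags file and passed to the application with `@flags`. This is only a sketch: the input/output file names and the process count are placeholders, not values from this thread.

```
# flags — memory-saving options for a large silent file
-in:file:silent big_input.silent      # placeholder input name
-in:file:lazy_silent                  # defer reading structures until needed
-jd2:lazy_silent_file_reader
-jd2:delete_old_poses                 # free poses once they are finished
-out:file:silent output.silent        # placeholder output name
```

You would then launch the MPI build with something like `mpirun -np 8 <application>.mpi.linuxgccrelease @flags`, where the application name and process count depend on your protocol and cluster.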
I’ve experienced problems with the MPI apps. There seems to be a connection problem between nodes, which produces the following message on screen:
[compute-1-17.local][[40649,1],6][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect] connect() to 10.2.255.253 failed: No route to host
I don’t have the necessary privileges to fix this and will have to wait until the cluster administrator comes back from vacation. Until then, I’ll run 100 single processes.
Thanks for the options to deal with memory usage.