running in MPI mode and multiple scores per output PDB file?


    • #3294
      Anonymous

        Hi Forum 

        I recently did a Rosetta fixbb run with MPI and found that the score file had many more lines of output than there were actual PDB files. Specifically, I’ve got 353 scores in score.sc but only 12 PDB files. Is it possible that the parallel processes are simply overwriting the PDBs? Is there a flag I should include to avoid this?

        Thanks!  

      • #15030
        Anonymous

          353/12 is not a whole number, but otherwise this is 100% the symptom of “you didn’t actually run in MPI”. This is what happens if you run non-MPI-compiled Rosetta (with or without mpiexec): each process runs the full job independently, appending its scores to score.sc but writing, and overwriting, the same PDB filenames. I assume you used -nstruct 12.
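
          To see why the counts diverge, here is a hypothetical illustration (not a Rosetta command) of a launcher spawning independent serial copies that share output names:

              # Three independent copies: each appends a score line, but all
              # write the same PDB filename, so the last writer wins
              mpiexec -n 3 sh -c 'echo "score line" >> score.sc; echo model > out_0001.pdb'
              wc -l score.sc    # three score lines...
              ls ./*.pdb        # ...but only one PDB file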

          Does your Rosetta binary have `mpi` in its name? It should be rosetta-app-name.mpi.(system)(compiler)(mode).
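
          For reference, a hedged sketch of building and spotting the MPI binaries, assuming the standard scons build of recent Rosetta bundles (adjust -j and paths for your machine):

              # From main/source in the Rosetta bundle: build with MPI support
              ./scons.py -j8 mode=release extras=mpi bin

              # MPI-enabled binaries carry the mpi tag in their names
              ls bin/ | grep '\.mpi\.'    # expect e.g. fixbb.mpi.linuxgccrelease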

          • #15032
            Anonymous

              Is it possible that, even though the binary has ‘mpi’ in the name, it wasn’t compiled correctly? Is there a unit test or something for MPI-compiled Rosetta?
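
              One Rosetta-independent sanity check is to confirm that the launcher itself spawns multiple processes, and to see which MPI implementation it belongs to (generic commands, not a Rosetta test):

                  # Should print one hostname per process; confirms the launcher runs at all
                  mpiexec -n 4 hostname

                  # Reports the MPI implementation behind the launcher (OpenMPI, MPICH/Hydra, ...)
                  mpiexec --version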

            • #15044
              Anonymous

                No particularly useful tests that I know of. Rocco’s comment below about the tracer tags with the MPI rank might be diagnostic. The log files themselves should say something too; I haven’t done a run in a while, but the job distributor choice is probably announced in a log line near the top.
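
                If the job distributor does announce itself, something like the following should surface it; the log filename and the exact tracer text are assumptions, so adjust them to whatever your run actually prints:

                    # Look near the top of the log for a job distributor announcement
                    head -n 50 rosetta.log | grep -i 'jobdistributor'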


              • #15033
                Anonymous

                  Yup, the binary does have mpi in the name: 


                  mpiexec $HOME/rosetta_src_2019.22.60749_bundle/main/source/bin/fixbb.mpi.linuxgccrelease -s filename.pdb -ex1 -ex2 -resfile resfile.txt -nstruct 15 -overwrite -linmem_ig 10


                  The numbers probably don’t work out exactly because I hit the walltime limit and the machine killed the job before it finished.

              • #15031
                Anonymous

                  (comment removed and resubmitted as direct reply to previous poster) 

                • #15035
                  Anonymous

                    I’m wondering if it might be an MPI version mismatch. That is, if you compile against OpenMPI libraries, say, but your mpiexec comes from an MPICH2 install, then the MPICH2 launcher won’t necessarily set up the environment properly for OpenMPI, and you might end up with each process thinking it’s running serially, despite being under an MPI launcher.

                    Double-check your compilation settings and where your mpiexec is coming from (e.g. `which mpiexec`). On clusters you sometimes get a mixed environment where mpiexec resolves to MPICH2 (for example) but mpirun resolves to OpenMPI (or vice versa).
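
                    A sketch of those checks in practice, reusing the binary path from the command posted above:

                        # Which launchers are on the PATH, and which MPI they belong to
                        which mpiexec mpirun
                        mpiexec --version
                        mpirun --version

                        # Which MPI library the Rosetta binary was actually linked against
                        ldd $HOME/rosetta_src_2019.22.60749_bundle/main/source/bin/fixbb.mpi.linuxgccrelease | grep -i libmpi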

                    The other thing to take a look at is the tracer output. If MPI is properly set up, there should be an annotation with the MPI rank in parentheses on each line. If that’s missing, or if it’s all ‘(0)’ (with no other numbers, despite launching multiple processes), then the MPI environment may not be set up in a way that lets Rosetta realize it’s running under MPI, and it may be running serially. There may be other information in the tracer about how things are running under MPI as well.
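
                    Assuming the rank tag really does appear in parentheses on each tracer line, a quick way to list the distinct ranks seen in a log (the filename is a placeholder, and the pattern may also catch unrelated parenthesized numbers):

                        # A healthy N-process run should show (0) through (N-1)
                        grep -oE '\([0-9]+\)' rosetta.log | sort -u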

                    • #15049
                      Anonymous

                        Yes!!! That seems to have been the problem! The version of OpenMPI on the head node was different from the one on the compute nodes. All fixed now!

                        Thank you all for your help!!
