- This topic has 1 reply, 2 voices, and was last updated 12 years, 8 months ago by Anonymous.
March 29, 2011 at 5:23 am · #845 · Anonymous
I have just built a parallel Rosetta 2.3.0 with OpenMPI 1.4.3, using "make mpilam" to build it.
After building, I tested it with the following command:
mpirun -np 16 /home/knight/softwares/Rosetta-2.3.0/rosetta++/bin/rosetta.mpilam -mpi_task_distribution -s 1ubq.pdb -design -fixbb -resfile resfile -ex1 -pdbout test -ndruns 2000 > 1ubq_fixbb_mine.log
-np = 16 since I have 16 cores.
All CPUs were running at 100%; however, this did not speed up the calculation. Instead, it seems that Rosetta put the very same job on every core; in other words, every core does the same thing. So I am wondering whether Rosetta is working correctly here. Why didn't it distribute the jobs correctly? Any suggestions? Thanks in advance!
P.S. My OS is RHEL 5.4 (32-bit server).
Ming Liu, Ph.D
March 29, 2011 at 7:50 pm · #5262 · Anonymous
First thing: I don’t know anything about MPI in Rosetta++ (and I don’t know that anyone remembers it well).
Rosetta (both 2 and 3) does not actually parallelize anything as a general rule. The MPI build just lets the job-distribution layer hand each core a different trajectory; the only advantages MPI offers are that all the results land in one directory (instead of needing N directories for N processors), and compatibility with inflexible cluster environments that insist on MPI. So if you are having trouble with MPI and your cluster's sysadmin doesn't require it, don't use it; it won't speed up your results. You'll get the same speed by just making 16 results directories and starting 16 jobs, one in each directory. (Remember to use -constant_seed and -jran to give each job a different random-number-generator seed.)
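To make the manual approach concrete, here is a minimal shell sketch of 16 independent runs in separate directories with distinct seeds. The binary path, input files, and seed values are placeholders adapted from the command in the original post, not a tested recipe; only -constant_seed and -jran come from the advice above.

```shell
# Sketch: 16 independent single-core Rosetta runs, one per directory,
# each with a distinct RNG seed via -constant_seed / -jran.
# BIN and the input files are placeholders; adjust to your own setup.
BIN=/home/knight/softwares/Rosetta-2.3.0/rosetta++/bin/rosetta
for i in $(seq 1 16); do
    mkdir -p run_$i
    # 2000 total structures split across 16 cores = 125 per job
    ( cd run_$i && "$BIN" -s ../1ubq.pdb -design -fixbb -resfile ../resfile \
          -ex1 -pdbout test -ndruns 125 \
          -constant_seed -jran $((1000 + i)) > fixbb.log 2>&1 ) &
done
wait   # block until all 16 background jobs finish
```

Each subshell cd's into its own directory before launching, so the outputs never collide; the offset seed (1000 + i) just guarantees the 16 trajectories differ.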
All cores working on the same job (all on myjob_0001, then all on myjob_0002) indicates that the MPI communication layer is not working for some reason. If you pass an executable to mpirun that was not actually built with MPI, this is the result you get (at least in 3.x).
In 3.x, if you try to run the MPI-built executable WITHOUT mpirun, it fails with an MPI-related error message:
PMGR_COLLECTIVE ERROR: unitialized MPI task: Missing required environment variable: MPIRUN_RANK
You could try running your build without mpirun to see whether you get this error; if you do not, it may mean that the MPI part of the MPI build didn't actually get compiled in for some reason.
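Another quick sanity check (my own suggestion, not Rosetta-specific) is to inspect which shared libraries the binary links against; a dynamically linked MPI build should pull in an MPI runtime such as libmpi or liblam. The path below is the one from the original post.

```shell
# Hypothetical path from the original post; substitute your own binary.
BIN=/home/knight/softwares/Rosetta-2.3.0/rosetta++/bin/rosetta.mpilam
# An MPI-linked executable should list libmpi / liblam here;
# no match suggests the MPI layer never made it into the build.
ldd "$BIN" | grep -i -e mpi -e lam || echo "no MPI library linked"
```

Note that this check only applies to dynamically linked executables; a statically linked build would show nothing in ldd even if MPI is present.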