    • #301

        We recently installed Rosetta compiled with “make mpilam”, which uses a 32-bit gcc-based MPICH 1 library, and we are having a few problems. The batch job is submitted via qsub, and the PBS script runs rosetta.mpilam with mpiexec.

        1. Core usage: Not all of the processing cores appear to be efficiently used. We have allocated 8 cores per node (#PBS -l nodes=1:ppn=8), but only 2 to 3 processing cores are being used at any given time.

        2. Duplicate structures: Within a given instance of Rosetta, multiple structures are generated with the same name (S_0002_1511, for example, has 2 copies in the aaubq1.out file). This results in 20 to 30 structures in aaubq1.out when only 10 were requested (very small test run). All of the structures with duplicate names appear to be identical. I would expect that the duplicate structures could be solved by adding -seed_offset 1000, but how do we get Rosetta to assign a different model number to each core (S_0001_xxxx to core 0, S_0002_xxxx to core 1, etc.)?



      • #4004

          I’m not familiar with MPI, but here are the suggestions from a previous user discussion:

          There is MPI support in Rosetta, though this has not been tested with Condor. MPI is included for PBS-style supercomputers which allocate a large number of nodes at once. For a Condor cluster, MPI is not needed to run Rosetta in parallel – due to the “embarrassingly parallel” nature of the jobs, there would be no advantage. Instead, simply submit all the jobs as single independent jobs on Condor. Rosetta determines the names of expected output files, and each job computes data to fill in files that have not yet been created.

          By default, Rosetta takes a random number from the clock and optionally applies an offset specified on the command-line. If you group your jobs into a single condor script with “Queue 20”, for example, you can add:

          -seed_offset $(Process)

          to the Rosetta command line to make the random seed depend on the process number within the job (in this case, 0-19, since Condor numbers $(Process) from 0).
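          To illustrate what that expansion amounts to, here is a minimal shell sketch that prints the command line each of the 20 queued jobs would effectively run. It assumes Condor's zero-based $(Process) numbering; ROSETTA_CMD and the seed_offset_commands function are hypothetical placeholders for illustration, not real Rosetta paths or arguments.

```shell
#!/bin/sh
# Sketch only: show the per-job command lines produced by "-seed_offset $(Process)".
# ROSETTA_CMD is a hypothetical placeholder; substitute your real executable and arguments.
ROSETTA_CMD="./rosetta.placeholder"

# Print one command line per process index 0 .. N-1,
# mirroring how Condor numbers $(Process) for "Queue N".
seed_offset_commands() {
    n="$1"
    i=0
    while [ "$i" -lt "$n" ]; do
        echo "$ROSETTA_CMD -seed_offset $i"
        i=$((i + 1))
    done
}

# Equivalent of "Queue 20": twenty jobs, each with a distinct offset.
seed_offset_commands 20
```

          Each job gets a different offset added to its clock-derived seed, so jobs that start at the same moment should still draw different random numbers.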

          If you want even more control, you can use:

          -constant_seed -jran

          on the Rosetta command line to specify the seed for each job.

          I think you used -constant_seed on your command line, which would make every process start from the same seed and produce identical outputs.
