Divide larger low-res. global run into several smaller runs?


    • #2354
      Anonymous

        Can one break a large global low-resolution docking run into several smaller runs using -run:constant_seed and -run:jran=########, just assigning a different ######## seed to each run?

        Specifically, suppose I would like to generate 30,000 low-res. decoys. Rather than do it as one docking run, and since I assume all decoys are based on a random number generator, why not break the run into three separate runs of 10,000 (on three separate processors) running simultaneously, each assigned a different seed? Would this be equivalent to a single run generating 30,000 decoys? In fact, since I have access to over 1000 single processors, why not do 1000 runs of 30 decoys each? I guess I would have to come up with 1000 seeds. Can the seeds be larger than 7 digits? I assume it's probably better to spread the seed values as far apart as possible.
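        For example, something like the following three simultaneous runs (the docking_flags option file and the seed values here are just placeholders):

            docking_protocol.linuxgccrelease @docking_flags -nstruct 10000 -run:constant_seed -run:jran 1111111
            docking_protocol.linuxgccrelease @docking_flags -nstruct 10000 -run:constant_seed -run:jran 2222222
            docking_protocol.linuxgccrelease @docking_flags -nstruct 10000 -run:constant_seed -run:jran 3333333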

        Obviously, the time savings would help. But I'm not sure whether there are differences one should be aware of when choosing such a route. Or maybe I'm completely missing something.

        Thanks in Advance

        J. Snyder

      • #11387
        Anonymous

          This is exactly how we use Rosetta: many processors on different RNG seeds.  So, yes, this absolutely works.

          I don’t know what the seed size limits are, but I can tell you that neighboring seeds are not a problem. The map from “seed space” to “RNG behavior space” is, well, random (that’s the point), so I always used seed, seed+1, seed+2, etc.
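          For instance, a minimal local sketch (assuming the standard docking_protocol executable and a hypothetical docking_flags option file; each job gets its own directory, for the reason in caveat 2 below):

              # Launch 30 jobs with seeds BASE, BASE+1, ..., each in its own directory.
              BASE=1000000
              for i in $(seq 1 30); do
                  mkdir -p run${i}
                  ( cd run${i} && docking_protocol.linuxgccrelease @../docking_flags \
                        -nstruct 1000 -run:constant_seed -run:jran $((BASE + i)) \
                        > log.txt 2>&1 ) &
              done
              wait   # block until all 30 background jobs finish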

          Caveats:

          1) Rosetta does not yet have a shared-memory model. If one running copy of Rosetta uses 1 GB, two copies use 2 GB, even though much of that memory is the same constant database data loaded twice. You’ll eventually hit diminishing returns. There is also some small per-process overhead that will cost you a little time.

          2) To run on 30 processors, you need either to run in 30 different directories and merge the results yourself (you’ll have dir1/result_0001.pdb, dir2/result_0001.pdb, etc.), or to use MPI. In most circumstances, the only purpose of the MPI job distributors is to let many processors write into one directory. (See the sketch after this list for the separate-directories route.)

          3) Just a warning about the number 30,000: file systems start to become upset with that many files in one directory. Splitting your output across several directories, or using the “silent file” system (https://www.rosettacommons.org/docs/latest/rosetta_basics/file_types/silent-file), will help.

          4) Sysadmins generally don’t like jobs that schedule 1000 processors for an hour instead of 100 processors for 10 hours each; look into what your scheduler / sysadmin wants.
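          Putting caveats 2–4 together, here is a sketch of how I would submit this on a SLURM cluster (the scheduler directives, directory layout, and docking_flags option file are all assumptions; adapt them to your site):

              #!/bin/bash
              #SBATCH --array=1-30            # one task per seed
              #SBATCH --time=10:00:00         # fewer, longer tasks (caveat 4)
              i=${SLURM_ARRAY_TASK_ID}
              mkdir -p run${i} && cd run${i}  # separate directory per task (caveat 2)
              docking_protocol.linuxgccrelease @../docking_flags \
                  -nstruct 1000 \
                  -run:constant_seed -run:jran $((1000000 + i)) \
                  -out:file:silent decoys_run${i}.out  # one silent file, not 1000 PDBs (caveat 3)

          Afterwards you can pull individual models out of the silent files with Rosetta’s extract_pdbs application, or just score and cluster them directly from the silent files.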
