Divide larger low-res. global run into several smaller runs?


    • #2354
      Anonymous

        Can one break a large global low-resolution docking run into several smaller runs using -run:constant_seed and -run:jran=########, just assigning a different ######## seed to each run?

        Specifically, suppose I would like to generate 30,000 low-res. decoys. Rather than do it as one docking run, and since I assume all decoys are based on a random number generator, why not break the run into three separate runs of 10,000 (on three separate processors) running simultaneously, each assigned a different seed? Would this be equivalent to a single run generating 30,000 decoys? In fact, since I have access to over 1000 single processors, why not do 1000 runs of 30 decoys each? I guess I would have to come up with 1000 seeds. Can the seeds be larger than 7 digits? I assume it's probably better to spread the seed values as far apart as possible.
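        For example, something like the following three simultaneous runs (the docking_flags option file and the seed values here are just placeholders):

            docking_protocol.linuxgccrelease @docking_flags -nstruct 10000 -run:constant_seed -run:jran 1111111
            docking_protocol.linuxgccrelease @docking_flags -nstruct 10000 -run:constant_seed -run:jran 2222222
            docking_protocol.linuxgccrelease @docking_flags -nstruct 10000 -run:constant_seed -run:jran 3333333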

        Obviously, the time savings would help. But I'm not sure whether there are differences one should be aware of when choosing such a route. Or maybe I'm completely missing something.

        Thanks in Advance

        J. Snyder

      • #11387
        Anonymous

          This is exactly how we use Rosetta: many processors on different RNG seeds.  So, yes, this absolutely works.

          I don’t know what the seed size limits are, but I can tell you that neighboring seeds are not a problem. The map from “seed space” to “RNG behavior space” is, well, random (that’s the point), so I always used seed, seed+1, seed+2, etc.
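          For instance, a minimal local sketch (assuming the standard docking_protocol executable and a hypothetical docking_flags option file; each job gets its own directory, for the reason in caveat 2 below):

              # Launch 30 jobs with seeds BASE, BASE+1, ..., each in its own directory.
              BASE=1000000
              for i in $(seq 1 30); do
                  mkdir -p run${i}
                  ( cd run${i} && docking_protocol.linuxgccrelease @../docking_flags \
                        -nstruct 1000 -run:constant_seed -run:jran $((BASE + i)) \
                        > log.txt 2>&1 ) &
              done
              wait   # block until all 30 background jobs finish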

          Caveats:

          1) Rosetta does not yet have a shared-memory model. If one running copy of Rosetta uses 1 GB, two copies use 2 GB, even though much of that memory is the same constant database data loaded twice. You’ll eventually hit diminishing returns. There is also some small per-process overhead that will cost you a little time.

          2) To run on 30 processors, you need either to run in 30 different directories and merge the results yourself (you’ll have dir1/result_0001.pdb, dir2/result_0001.pdb, etc.), or to use MPI. In most circumstances, the only purpose of the MPI job distributors is to let many processors write into one directory. (See the sketch after this list for the separate-directories route.)

          3) Just a warning about the number 30,000: file systems start to become upset with that many files in one directory. Splitting your output across several directories, or using the “silent file” system (https://www.rosettacommons.org/docs/latest/rosetta_basics/file_types/silent-file), will help.

          4) Sysadmins generally don’t like jobs that schedule 1000 processors for an hour instead of 100 processors for 10 hours each; look into what your scheduler / sysadmin wants.
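          Putting caveats 2–4 together, here is a sketch of how I would submit this on a SLURM cluster (the scheduler directives, directory layout, and docking_flags option file are all assumptions; adapt them to your site):

              #!/bin/bash
              #SBATCH --array=1-30            # one task per seed
              #SBATCH --time=10:00:00         # fewer, longer tasks (caveat 4)
              i=${SLURM_ARRAY_TASK_ID}
              mkdir -p run${i} && cd run${i}  # separate directory per task (caveat 2)
              docking_protocol.linuxgccrelease @../docking_flags \
                  -nstruct 1000 \
                  -run:constant_seed -run:jran $((1000000 + i)) \
                  -out:file:silent decoys_run${i}.out  # one silent file, not 1000 PDBs (caveat 3)

          Afterwards you can pull individual models out of the silent files with Rosetta’s extract_pdbs application, or just score and cluster them directly from the silent files.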
