Multiprocessing and running jobs on server using slurm
April 9, 2018 at 5:59 pm #2888 Anonymous
I have a question about parallelizing some of the bash scripts from the refinement and de novo prediction tutorials. They usually rely on GNU parallel to run across many cores, substituting a variable in the script with a number from the GNU parallel run command. How can I run the same scripts on a server that uses the SLURM workload manager, or with some other multiprocessing strategy in Python?
I also wonder whether I can carry out all of these refinement / de novo prediction runs using PyRosetta, to get more control over processing, job handling, and automation.
Regards.
April 10, 2018 at 11:00 pm #14184 Anonymous
Our local cluster admins advocate using SLURM array jobs to parallelize multiple otherwise-serial runs. This works somewhat like GNU parallel, in that you launch multiple SLURM allocations, each identified by a different value of an environment variable (SLURM_ARRAY_TASK_ID). You can then use that environment variable to launch a different job in each allocation.
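As a rough sketch of that idea in Python: each array task reads its index from SLURM_ARRAY_TASK_ID and builds its own command line from it. The script name ./run_one.sh is hypothetical and stands in for whatever per-job script you currently drive from GNU parallel. The default of 1 is only there so the script also runs outside SLURM.

```python
import os
import shlex


def array_task_id(environ=os.environ, default=1):
    """Return this task's index within the SLURM array job."""
    # SLURM sets SLURM_ARRAY_TASK_ID for every task started via `sbatch --array=...`.
    # The default is an assumption, used only when testing outside SLURM.
    return int(environ.get("SLURM_ARRAY_TASK_ID", default))


def build_command(task_id):
    """Build an argv list that differs only by the task index."""
    # "./run_one.sh" is a hypothetical per-job script that takes the number
    # GNU parallel would otherwise have substituted.
    return ["./run_one.sh", str(task_id)]


if __name__ == "__main__":
    cmd = build_command(array_task_id())
    # Print the command; replace with subprocess.run(cmd, check=True) to launch it.
    print(shlex.join(cmd))
```

Submitting this with something like `sbatch --array=1-100` would then give each task a different number.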
That’s easy enough to do if you can vary your command line based on a single number, but it gets more difficult with more complex commands. What I’ve done in that case is set up an input file (as with parallel), read the designated line(s) from it, and launch the job based on that configuration.
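A minimal sketch of the input-file approach, assuming one complete command per line, with blank lines and lines starting with # ignored (that file layout is my assumption, not a SLURM or Rosetta convention). The file name commands.txt in the usage comment is hypothetical.

```python
import shlex


def command_for_task(lines, task_id):
    """Return the argv for the 1-based task_id from an iterable of command lines."""
    # Keep only real commands: drop blank lines and '#' comment lines.
    commands = [
        line.strip()
        for line in lines
        if line.strip() and not line.lstrip().startswith("#")
    ]
    if not 1 <= task_id <= len(commands):
        raise IndexError(f"task {task_id} has no line (file has {len(commands)} commands)")
    # Split like a shell would, so quoted arguments stay together.
    return shlex.split(commands[task_id - 1])


# Possible usage inside an array task (hypothetical file name):
#   import os, subprocess
#   with open("commands.txt") as handle:
#       argv = command_for_task(handle, int(os.environ["SLURM_ARRAY_TASK_ID"]))
#   subprocess.run(argv, check=True)
```

The array range passed to sbatch should match the number of command lines in the file, otherwise some tasks will raise the IndexError above.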
The one tricky thing is that, unlike with parallel, you typically don’t want SLURM jobs to be too short. Due to overhead in the allocation process, our cluster admins don’t like SLURM jobs shorter than about 30 min, and prefer an hour or longer. If your individual commands are shorter than that, you may need to group them together, which can get more complicated. In those cases, SLURM allocations and parallel can actually work together: launch a SLURM array job to get N processors, have each task grab its 1/Nth of a command file, and feed that to parallel with the -j1 option so it runs on just a single processor. (Rosetta actually comes with a primitive version of parallel at main/source/scripts/python/public/parallel.py, if your cluster doesn’t have one installed.)
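One way to pick out the 1/Nth of a command file is sketched below. It hands each task a contiguous block of lines and spreads any remainder over the first few tasks. The function name and the 0-based indexing are my own choices for illustration.

```python
def chunk_for_task(commands, task_index, n_tasks):
    """Return the contiguous share of commands for 0-based task_index of n_tasks."""
    if n_tasks < 1 or not 0 <= task_index < n_tasks:
        raise ValueError("task_index must satisfy 0 <= task_index < n_tasks")
    # Every task gets `base` commands; the first `extra` tasks get one more.
    base, extra = divmod(len(commands), n_tasks)
    start = task_index * base + min(task_index, extra)
    stop = start + base + (1 if task_index < extra else 0)
    return commands[start:stop]


if __name__ == "__main__":
    demo = [f"job_{i}" for i in range(7)]
    for index in range(3):
        # Show which commands each of 3 tasks would run.
        print(index, chunk_for_task(demo, index, 3))
```

For an array submitted as 1 to N, the task index would be SLURM_ARRAY_TASK_ID minus 1, and N should be available as SLURM_ARRAY_TASK_COUNT. The resulting chunk can be written to a temporary file and fed to parallel with -j1, or simply run line by line from Python.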
If you can swing a multi-processor allocation on SLURM, the approach outlined at https://rcc.uchicago.edu/docs/running-jobs/srun-parallel/index.html might actually be easier. This uses GNU parallel across a multi-processor SLURM allocation.
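Since the question also asked about a Python-based strategy: if GNU parallel isn't available inside a multi-processor allocation, something like the following could stand in for it. This is not the method from the linked page, just a sketch that runs shell-style command strings concurrently with a thread pool, sized from SLURM_CPUS_PER_TASK (which is only set if the job requested --cpus-per-task, hence the fallback to 1).

```python
import os
import shlex
import subprocess
from concurrent.futures import ThreadPoolExecutor


def allocated_cpus(environ=os.environ):
    """Return the number of CPUs SLURM gave this task, or 1 if unknown."""
    return int(environ.get("SLURM_CPUS_PER_TASK", 1))


def run_one(command):
    """Run one command string and return its exit code."""
    # Threads are enough here: the real work happens in the child process.
    return subprocess.run(shlex.split(command)).returncode


def run_all(commands, workers):
    """Run commands with at most `workers` at a time; return exit codes in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_one, commands))
```

Calling `run_all(commands, allocated_cpus())` keeps at most one command running per allocated core, and the returned exit codes make it easy to spot failed runs.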