choosing an appropriate cluster: parallel granularity

This topic has 1 reply, 2 voices, and was last updated 13 years, 6 months ago by Anonymous.

Viewing 1 reply thread

Author

Posts
- June 19, 2012 at 4:57 pm #1316
  Anonymous
  Hello,
  
  I am interested in any advice on choosing an appropriate cluster on which to run Rosetta 3.4. Most likely, I’ll be using it for relaxation and high-resolution, all-atom refinement. I am just wondering about the parallel granularity — the ratio of communication to computation (e.g. coarse vs fine grained). Is it better to go with more of a throughput cluster or a more tightly coupled, low latency cluster?
  
  Here are my main options (pertaining to the SHARCNET systems):
  Kraken: “Throughput clusters, an amalgamation of older point-of-presence and throughput clusters, suitable for serial applications, and small-scale low latency demanding parallel MPI applications.”
  Orca: “Low latency parallel applications.”
  Requin: “Large, tightly-coupled MPI jobs.”
  Saw: “Parallel applications.”
  
  More info is here: https://www.sharcnet.ca/my/systems/clustermap
  
  Any advice would be much appreciated. So far, I have been using the Kraken throughput cluster, but I’d like some confirmation that it’s a suitable choice.
  
  Thanks!
  
  Rob
- June 19, 2012 at 6:35 pm #7274
  Anonymous
  For most applications, Rosetta does zero or nearly zero communication. Most applications are parallelized ONLY at the independent-trajectory level, so they are “embarrassingly parallel”: throughput (structures per unit time) scales linearly, with a slope of 1, with the number of processors. As a corollary, you cannot accelerate a single trajectory by adding processors.
  
  These applications communicate a little or not at all to begin trajectories (deciding which processors do which jobs). This is extremely insensitive to the cluster architecture; poor communication won’t hurt at all. When using silent file output, there is some communication at the end of each job, as the job’s results are sent to a single node responsible for disk I/O; this is a bolus rather than constant communication. I think this is only weakly sensitive to communication performance.
  
  Certain applications behave differently, but none are communication-bound; all are strongly computation-bound.
  
  BTW, your URL requires a login, so we can’t see what you wanted us to see.
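To put rough numbers on the scaling described in the reply, here is a minimal sketch of the arithmetic. It assumes every trajectory takes the same time and that communication cost is negligible. The function names and the 30-minute trajectory time are hypothetical, chosen only for illustration, and are not measured Rosetta figures.

```python
import math


def throughput_per_hour(n_processors, minutes_per_trajectory):
    """Structures finished per hour when each processor runs its own trajectory.

    Assumes identical trajectory times and no communication overhead.
    """
    return n_processors * 60.0 / minutes_per_trajectory


def batch_wall_minutes(n_structures, n_processors, minutes_per_trajectory):
    """Wall-clock minutes to finish a batch of independent trajectories.

    Each processor works through its share one trajectory at a time, so the
    batch takes ceil(n_structures / n_processors) rounds.
    """
    rounds = math.ceil(n_structures / n_processors)
    return rounds * minutes_per_trajectory


if __name__ == "__main__":
    # Hypothetical: 30 minutes per trajectory.
    for procs in (8, 16, 32):
        print(procs, "processors:", throughput_per_hour(procs, 30), "structures/hour")
    # A single trajectory takes just as long no matter how many processors are idle.
    print("1 structure on 100 processors:", batch_wall_minutes(1, 100, 30), "minutes")
```

Under these assumptions, doubling the processor count doubles the structures per hour, while the wall time for one structure stays at the single-trajectory time. That is why a throughput cluster fits this workload.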