If you set the number of iterations to greater than 20, the ddg_monomer application averages only the lowest 20 scores. (There’s no way of changing this number, short of editing the code and recompiling.)
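To make the averaging concrete, here is a minimal Python sketch of "average only the lowest 20 scores". It is not the actual ddg_monomer source. The function name `mean_of_lowest`, the `n_lowest` parameter, and the example scores are all made up for illustration, and how fewer than 20 scores get handled is my assumption.

```python
def mean_of_lowest(scores, n_lowest=20):
    """Average the n_lowest smallest scores.

    Illustration only, not Rosetta code. If fewer than n_lowest scores
    are given, all of them are averaged (an assumption of this sketch).
    """
    if not scores:
        raise ValueError("need at least one score")
    kept = sorted(scores)[:n_lowest]  # lowest (most favorable) scores first
    return sum(kept) / len(kept)


# Made-up scores standing in for 50 iterations: 0.0, 1.0, ..., 49.0
example_scores = [float(i) for i in range(50)]

# Only the 20 lowest values (0.0 through 19.0) contribute to the average.
average = mean_of_lowest(example_scores)
print(average)
```

In this sketch, running more than 20 iterations widens the pool that the lowest 20 are drawn from, but the number of scores being averaged stays at 20.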
As best I can tell, the results for multiple mutations shouldn’t be affected by the order in which the mutations are specified. That said, remember that the ddG protocol, like most Rosetta protocols, is stochastic, so a small difference between two runs that should be the same is not too surprising. I’d only be concerned if the difference was systematic across multiple runs. (And that’s multiple runs with different random number seeds, so don’t use the constant_seed flag.)
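One rough way to judge whether a difference is systematic is to compare the gap between the mean ddG values of the two sets of runs with the run-to-run scatter. The Python sketch below does that. The function name `looks_systematic`, the two-standard-error threshold, and all the ddG numbers are assumptions made up for illustration, not anything that comes out of Rosetta.

```python
import math
import statistics


def looks_systematic(runs_a, runs_b, n_sigma=2.0):
    """Return True if the mean ddG of the two sets of runs differs by more
    than n_sigma standard errors of that difference.

    Illustration only: the threshold is an arbitrary rule of thumb.
    """
    if len(runs_a) < 2 or len(runs_b) < 2:
        raise ValueError("need at least two runs per set")
    diff = statistics.mean(runs_a) - statistics.mean(runs_b)
    # Standard error of the difference between the two means.
    se = math.sqrt(
        statistics.variance(runs_a) / len(runs_a)
        + statistics.variance(runs_b) / len(runs_b)
    )
    if se == 0.0:
        return diff != 0.0
    return abs(diff) > n_sigma * se


# Made-up ddG values from five independent runs (different seeds) per ordering.
order_ab = [1.9, 2.1, 2.0, 2.2, 1.8]
order_ba = [2.0, 1.9, 2.1, 2.2, 1.9]
print(looks_systematic(order_ab, order_ba))

# A made-up set that is shifted by about 1 unit relative to order_ab.
shifted = [3.0, 3.1, 2.9, 3.2, 2.8]
print(looks_systematic(order_ab, shifted))
```

With only a handful of runs this is a sanity check and not a proper statistical test, but it separates "the numbers wobble from run to run" from "one ordering is consistently higher".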