Generally, Rosetta is set up so that more negative scores are better/more stable. Exceptions to that rule are protocol specific, though. One protocol for calculating ddG might output negative-is-better scores, while another might be set up to output positive-is-better scores in order to match a given dataset.
For the monomer stability ddG values calculated by the ddg_monomer application (which correspond to the ones from the Kellogg et al. paper), though, the reported values are (mutant score) – (wild type score), so a more negative value means a better/more stable mutant structure.
I’m not too familiar with it, but from the Protherm database documentation I found (http://www.abren.net/protherm/pp_data_help.html#E.10), you’re right that the Protherm measurements are dG_unfolding(mutant) – dG_unfolding(wild), so a more positive number indicates a more stable mutant structure. My guess is that for the paper, the sign of the values was flipped after the program wrote them out but before plotting, so that they match the sign convention of the Protherm database.
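
If you want to compare the two yourself, the sign flip I’m guessing at can be sketched in a few lines of Python. This is only an illustration of the two conventions, not code from Rosetta or from the paper: the function names and the example value are made up, and it only handles the sign, not any unit scaling between Rosetta scores and experimental values.

```python
def rosetta_to_protherm_sign(ddg_rosetta: float) -> float:
    """Convert a ddG where negative = more stable mutant (score convention)
    to one where positive = more stable mutant (dG_unfolding convention).

    Hypothetical helper: only the sign is changed, nothing is rescaled.
    """
    return -ddg_rosetta


def is_stabilizing(ddg: float, negative_is_better: bool = True) -> bool:
    """Say whether a ddG value indicates a more stable mutant,
    given which sign convention the value was reported in."""
    return ddg < 0 if negative_is_better else ddg > 0


# Made-up example value, not taken from any real run.
example_ddg = -1.5
flipped = rosetta_to_protherm_sign(example_ddg)

# The same mutation reads as stabilizing under both conventions.
print(is_stabilizing(example_ddg, negative_is_better=True))
print(is_stabilizing(flipped, negative_is_better=False))
```

The point is just that the same mutation is stabilizing under either convention as long as you keep track of which one a given number is in.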