I’m not surprised that the scores are different. While the general concepts behind the energy terms are the same, the details have changed somewhat between versions, which likely results in score changes. It looks like you’ve already been directed to the papers we have on the score function. Unfortunately, they’re not as detailed as they probably should be. There’s a recent effort underway to put together a review article about the Rosetta3 scorefunction, but that will likely be a while in coming, and in any case it won’t cover details of Rosetta2’s implementation.
For now, at least, the best reference for what the scorefunction is doing is the code. For Rosetta3, most of the score terms are implemented under rosetta_source/src/core/scoring/. For example, fa_atr and fa_rep are implemented in rosetta_source/src/core/scoring/etable/. I am not familiar enough with the Rosetta++ code to know where the equivalent terms are implemented, although a brief skim indicates that rosetta++/score.cc is a good place to start.
The other issue to keep in mind is that Rosetta’s scorefunction can be sensitive to small coordinate changes. Writing a structure out as a PDB file reduces the coordinates from their internal representation to the fixed 0.001 precision of the PDB format, and reading that file back in can then give a different score, even with the same version of Rosetta.
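To give a feel for the size of that effect, here is a minimal sketch in Python. It does not use Rosetta at all: it assumes a generic 12-6 Lennard-Jones term as a stand-in for a steep repulsive score term, and the parameters (sigma, epsilon), the function names, and the two atom positions are all made up for illustration. It is not how fa_rep is actually computed.

```python
import math

def lj_energy(r, sigma=4.0, epsilon=0.1):
    """Generic 12-6 Lennard-Jones energy with its minimum (-epsilon) at r = sigma.

    Illustrative stand-in only; sigma and epsilon are arbitrary assumptions,
    not Rosetta parameters.
    """
    x = (sigma / r) ** 6
    return epsilon * (x * x - 2.0 * x)

def round_like_pdb(coord):
    """Round each coordinate to three decimals, as a PDB ATOM record stores them."""
    return tuple(round(c, 3) for c in coord)

# Two hypothetical atoms in close contact (well inside sigma, so the
# energy curve is steep here).
atom_a = (1.23449, 2.0, 3.0)
atom_b = (4.23451, 2.0, 3.0)

# Score with the full-precision coordinates.
d_full = math.dist(atom_a, atom_b)
e_full = lj_energy(d_full)

# Score again after a simulated write/read round trip through PDB precision.
d_round = math.dist(round_like_pdb(atom_a), round_like_pdb(atom_b))
e_round = lj_energy(d_round)

print(f"distance: full={d_full:.5f}  rounded={d_round:.5f}")
print(f"energy:   full={e_full:.5f}  rounded={e_round:.5f}")
print(f"energy difference from rounding alone: {e_full - e_round:.5f}")
```

In this toy case, each coordinate moves by less than 0.0005, but the two atoms happen to round in opposite directions, so the pair distance shifts by almost 0.001. Because the atoms sit on the steep part of the curve, that shift is enough to change the energy of this one pair visibly. A real structure has many such pairs, so small per-pair differences can add up in the total score.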