    Rosetta scoring is only truly effective for ranking a series of models of the same length (but not necessarily the same sequence). In other words, it is best for comparing the models produced within a single run, for example the multiple output structures requested with -nstruct.

    It is never safe to compare raw scores between structures of different lengths. The magnitude of the score tends to grow with the size of the pose, so a pose of 100 residues and a pose of 200 residues might have scores of -236 and -452, respectively. Dividing each score by the number of residues (here -2.36 and -2.26 per residue) irons out some of the problems, but not all of them.
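
    The per-residue normalization described above can be sketched in a few lines of Python. This is only an illustration, not part of Rosetta or PyRosetta: the function name score_per_residue and the pose labels are made up for the example, and the scores and lengths are the ones quoted above.

```python
def score_per_residue(total_score: float, n_residues: int) -> float:
    """Divide a total score by the number of residues (hypothetical helper)."""
    if n_residues <= 0:
        raise ValueError("n_residues must be positive")
    return total_score / n_residues


# (total score, number of residues), using the example values from the text.
poses = {
    "pose_100": (-236.0, 100),
    "pose_200": (-452.0, 200),
}

# Per-residue score for each pose.
normalized = {
    name: score_per_residue(score, length)
    for name, (score, length) in poses.items()
}

# Print from lowest (most favorable) to highest per-residue score.
for name, value in sorted(normalized.items(), key=lambda item: item[1]):
    print(f"{name}: {value:.2f} per residue")
```

    The raw scores differ by almost a factor of two, while the per-residue values land close together, which is why the normalized numbers are the more reasonable thing to look at when the lengths differ.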

    It is sort-of-okay to compare scores of models, even if they are different lengths and different folds, so long as they have been given the same freedoms, that is, put through the same treatment. In other words, if both have been fully relaxed and minimized, then a comparison of their scores (normalized by length) will carry SOME information. Be wary.