I’ve been trying to find the definition of the cluster application’s -gdtmm flag, and why we might use it over clustering by rmsd (no flag required).
All the examples I’ve run across use the -gdtmm flag, but again, I’m just not sure exactly how this is clustering.
Then, following that, when the flag -cluster:sort_groups_by_energy is used, how does it sort them? Does it label the first cluster as the one with the lowest average energy for all the structures in the cluster? It almost seems at odds with clustering by RMSD.
GDTMM is simply a different metric for structure-structure difference. It’s a variant of the Global Distance Test metric that is used in evaluating CASP entries. The reason for using it instead of rmsd is that rmsd behaves very poorly when a small part of the structure is a large outlier. For example, if you get the majority of the protein very nearly correct, but completely mess up a single loop, you could have a much worse rmsd than if you’re moderately off for the entire protein. GDTMM attempts to correct those sorts of issues.
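To make the outlier point concrete, here is a minimal sketch in Python. It is not Rosetta's GDTMM implementation: it uses the standard GDT_TS cutoffs (1, 2, 4 and 8 Å) and assumes the two structures are already superimposed, so each model is just a made-up list of per-residue distances to the reference. The function names and numbers are purely illustrative.

```python
import math

def rmsd(distances):
    """Root-mean-square of per-residue distances (in Angstroms)."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))

def gdt_like(distances, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Average fraction of residues within each distance cutoff.

    Simplification: a real GDT calculation searches over many
    superpositions; here the distances are taken as given.
    """
    fractions = [
        sum(1 for d in distances if d <= c) / len(distances)
        for c in cutoffs
    ]
    return sum(fractions) / len(cutoffs)

# Model A: 90 residues nearly perfect, one 10-residue loop badly wrong.
model_a = [0.5] * 90 + [15.0] * 10
# Model B: every residue moderately wrong.
model_b = [4.5] * 100

print(f"A: rmsd={rmsd(model_a):.2f}  gdt={gdt_like(model_a):.2f}")
print(f"B: rmsd={rmsd(model_b):.2f}  gdt={gdt_like(model_b):.2f}")
```

Model A comes out with the worse rmsd (about 4.8 Å versus 4.5 Å) even though 90% of it is essentially right, while the GDT-style score clearly prefers it (0.90 versus 0.25).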
The -cluster:sort_groups_by_energy sorts the clusters based on the energy of the lowest energy member – the cluster with the lowest energy for the lowest energy member becomes cluster zero, etc. (Okay, that’s not *quite* true. It looks like the sorting metric is actually a weighted combination of energy and cluster size. You can control how much the cluster size contributes with the -cluster:population_weight flag. It defaults to 0.09, which means the ranking is based 91% on score, and 9% on cluster size.)
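As a rough illustration of how a population weight can change the ordering, here is a small Python sketch. The exact formula Rosetta uses is not given above, so the combination below, (1 - w) * lowest energy - w * cluster size with lower values ranking first, is an assumption for illustration only, as are the function name and the energies.

```python
def rank_clusters(clusters, population_weight=0.09):
    """Order cluster names by an ASSUMED weighted ranking score.

    clusters: dict mapping a cluster name to a list of member energies.
    Assumed score: (1 - w) * min(energy) - w * size, lower is better.
    This is a hypothetical formula, not taken from the Rosetta source.
    """
    def ranking_score(item):
        _, energies = item
        w = population_weight
        return (1.0 - w) * min(energies) - w * len(energies)

    return [name for name, _ in sorted(clusters.items(), key=ranking_score)]

clusters = {
    # Two members, slightly better lowest energy.
    "small": [-105.0, -90.0],
    # Twenty members, slightly worse lowest energy.
    "large": [-104.9] + [-100.0] * 19,
}

print(rank_clusters(clusters, population_weight=0.0))   # energy only
print(rank_clusters(clusters, population_weight=0.09))  # default weight
```

With the weight at zero, the small cluster ranks first on energy alone. With the 0.09 weight, the much larger cluster overtakes it under this assumed formula, because the two lowest energies are nearly tied.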