Interpreting Clustering Results

This topic has 2 replies, 3 voices, and was last updated 11 years, 1 month ago by Anonymous.

Viewing 2 reply threads

Author

Posts
- June 9, 2013 at 6:41 am #1606
  Anonymous
  Hi all,
  
  I did ab initio modeling with Rosetta and then performed clustering. Initially I generated 500 decoys, and then I generated 1000 with a different sequence. When I perform clustering, I still get 4 clusters, even though I have not specified the number of clusters in my flags file. My question is whether this is correct output.
  
  Secondly, I don’t understand the numbering of the pdb files. When I read the log file I see that it uses c.i.j, but I don’t know how that maps to the pdb files. The numbering in the log file looks different, especially since I am sorting by energy.
  
  Also, do I take the model with the lowest energy level in the largest cluster as the most correct model?
  
  I will appreciate some answers. Thanks.
  
  PW.
- June 10, 2013 at 3:12 pm #8880
  Anonymous
  Hi. First, 500 or 1000 decoys is too few for ab initio. You will want 20,000+ models for adequate sampling, and more for longer sequences. If you don’t have access to a computing cluster, I would go with the Robetta server instead.
  
  For clustering, you may want to use the calibur program: http://sourceforge.net/projects/calibur/
  
  As for the correct model, it depends on your system and whether you have any experimental data. Generally, your thinking is correct. You can also take the ‘center’, or representative, member of the cluster, which calibur outputs. It is usually one of the lowest-energy models.
  
  The residue numbering in the output PDB files should go 1 -> n.
  
  -J
- June 10, 2013 at 6:03 pm #8883
  Anonymous
  I’m not entirely familiar with the Rosetta++ clustering application, but I believe the number-of-clusters setting is a maximum number of clusters, rather than an absolute number. That is, if all the structures are within the given cluster radius of a small number of cluster centers, the clustering algorithm in use won’t break them up just to reach a given number of clusters. The range of structures in your small sample may be limited enough that 4 clusters cover all of them.
  
  The c.i.j notation means that the structure is structure number “j” of cluster number “i”. Under common settings these are typically sorted by energy. (So “1” is the lowest energy.) The same notation should be used in the log file and the output PDBs. What are you getting such that you’re unsure of how to match things up? (Could you provide examples of the two different notations?)
  
  The lowest energy structure in the largest cluster is typically a good place to start for the model that’s most likely to be correct. However, if you have experimental evidence or chemical intuition (you look at the structure and it doesn’t look right), you may want to pick another structure or another cluster.
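
The points made in the replies above (a cluster radius plus a maximum cluster count, c.i.j labels sorted by energy, and picking the lowest-energy member of the largest cluster) can be sketched on toy data. This is only an illustration and not Rosetta's or calibur's actual code: the greedy "most neighbours first" rule, the decoy names, the 1-D coordinates standing in for pairwise RMSD, the energies, and all function names are assumptions made up for the example. Numbering from 1 follows the reply above; whether your own output counts from 0 or 1 is something to check against your files.

```python
from collections import namedtuple

# Hypothetical decoy record: `coord` is a 1-D stand-in for a structure.
Decoy = namedtuple("Decoy", ["name", "coord", "energy"])


def toy_dist(a, b):
    # Assumption: a real run would use pairwise RMSD between structures.
    return abs(a.coord - b.coord)


def cluster_by_radius(decoys, radius, max_clusters, dist):
    """Greedy sketch: the decoy with the most neighbours within `radius`
    becomes a center; it and its neighbours form a cluster and are removed.
    Stops when nothing is left or `max_clusters` is reached."""
    remaining = list(decoys)
    clusters = []
    while remaining and len(clusters) < max_clusters:
        center = max(
            remaining,
            key=lambda d: sum(dist(d, o) <= radius for o in remaining),
        )
        members = [o for o in remaining if dist(center, o) <= radius]
        remaining = [o for o in remaining if dist(center, o) > radius]
        # Sort by energy so the first member is the lowest-energy one.
        clusters.append(sorted(members, key=lambda d: d.energy))
    return clusters


def label(clusters, start=1):
    """Map 'c.i.j' to decoy: cluster number i, structure number j."""
    return {
        f"c.{i}.{j}": d
        for i, members in enumerate(clusters, start=start)
        for j, d in enumerate(members, start=start)
    }


# Made-up decoys and energies, purely for illustration.
decoys = [
    Decoy("S_0001", 0.0, -95.0),
    Decoy("S_0002", 0.2, -101.5),
    Decoy("S_0003", 0.4, -98.2),
    Decoy("S_0004", 5.0, -90.1),
    Decoy("S_0005", 5.1, -93.4),
    Decoy("S_0006", 9.0, -88.0),
    Decoy("S_0007", 9.3, -87.5),
    Decoy("S_0008", 14.0, -80.3),
]

# The cap of 10 is never reached: the radius alone decides the count.
clusters = cluster_by_radius(decoys, radius=1.0, max_clusters=10, dist=toy_dist)
labels = label(clusters)

# Candidate model: lowest-energy member of the largest cluster.
largest = max(clusters, key=len)
best = largest[0]

print(len(clusters), "clusters; sizes:", [len(c) for c in clusters])
print("starting candidate:", best.name, best.energy)
```

With a maximum of 10 clusters the toy set still splits into fewer groups, because every decoy already lies within the radius of one of a handful of centers. Lowering `max_clusters` below that number would instead leave some decoys unassigned in this sketch.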