RNA Denovo RMSD data
- This topic has 31 replies, 3 voices, and was last updated 12 years, 4 months ago by Anonymous.
-
August 2, 2012 at 8:40 am #1367Anonymous
Hi,
I am new to the Rosetta software and I would really appreciate some help from someone with experience in RNA Denovo. I am using the rna_denovo application on a cluster with Rosetta 3.3 and I want to predict 3D structures for a large number of RNA molecules.
Initially I predict 3D structures for which there is NMR data available so that I can determine how well the algorithm works. I am supplying the NMR structure as the native structure in order to plot RMSD vs. energy (score).
I would like to generate similar plots when predicting structures for which I am not supplying a native structure. As I understand it from the RNA Denovo Server documentation, RMSD is calculated between the models and the lowest energy model (best scoring model) when no native pdb is supplied. However, I do not obtain any RMSD data in my out file when I run without a native structure. How can I calculate this? Are there any significant differences between the RNA Denovo server application and the one included in the Rosetta package?
Also, are there any post-processing scripts available that can be useful for analyzing data from RNA Denovo?
I hope that someone can clarify these points for me. Thank you.
Best regards,
Emma
-
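For the score-vs-RMSD plot described in the post above, the per-model numbers can be read from the SCORE: lines of the silent file. The following is a minimal sketch, not part of Rosetta. It assumes that the first SCORE: line is a header naming the columns, that the RMSD column is called "rms", and that the last column ("description") holds the model tag. The example data are made up.

```python
def read_scores(lines):
    """Return one dict per model from the SCORE: lines of a silent file.

    Assumption: the first SCORE: line is a header naming the columns, it
    starts with "score", and the last column ("description") is the model tag.
    """
    header = None
    rows = []
    for line in lines:
        if not line.startswith("SCORE:"):
            continue
        fields = line.split()[1:]
        if not fields:
            continue
        if header is None or fields[0] == "score":
            header = fields  # (re)read the header; it can repeat in merged files
            continue
        if len(fields) == len(header):
            rows.append(dict(zip(header, fields)))
    return rows


def rmsd_score_points(rows, rmsd_col="rms", score_col="score"):
    """Build (rmsd, score, tag) tuples for rows that carry an RMSD column."""
    return [
        (float(row[rmsd_col]), float(row[score_col]), row["description"])
        for row in rows
        if rmsd_col in row
    ]


# Made-up example data, not real rna_denovo output.
example = """SEQUENCE: ggcuaagcc
SCORE:     score     rms description
SCORE:   -10.500   4.200 S_000001
SCORE:   -12.250   2.100 S_000002
""".splitlines()

points = rmsd_score_points(read_scores(example))
best = min(points, key=lambda p: p[1])  # model with the lowest score
print(best)
```

When no native structure is supplied and the RMSD column is absent, rmsd_score_points returns an empty list, which matches the missing RMSD data described in the post.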
August 2, 2012 at 2:14 pm #7521Anonymous
Hi again,
I think that it is probably not a good idea to calculate RMSD to the best scoring model as this structure differs significantly from the native one, at least as far as I have observed in my test runs. Can anyone suggest what I can look for/analyze in order to be able to extract reasonable models? Can the cluster algorithm be helpful in my case?
Thank you.
Regards,
Emma
-
August 3, 2012 at 12:04 pm #7525Anonymous
Thank you for your quick response and suggestions about using clustering.
However, as far as I know it is not possible to use the cluster application for RNA in Rosetta 3.1-3.3 without serious modifications. I read in an old post that it will be introduced in the 3.4 release. Do you know if it has been included yet? I don’t have access to 3.4 yet, but if I know that the cluster application for RNA is in it I will make sure to update.
Thanks,
Emma
-
August 3, 2012 at 2:39 pm #7527Anonymous
I don’t see any evidence that cluster was significantly modified between 3.3 and 3.4 (it appears not to have been modified at all except for necessary maintenance so that it will compile).
If your poses are from RNA_denovo, then they’ll have Rosetta’s expected nomenclature. I would guess the only modification you’d need to make is to swap DNA for RNA in the default residue type set (and/or tweak cluster so that it expects RNA poses), and possibly use an option or another code hack to ensure all-atom (instead of c-alpha) clustering.
I’ve sent a note to someone who does RNA to take a look here if they can.
-
August 3, 2012 at 6:06 pm #7529Anonymous
Skimming through the code, it looks like there is an explicit check for RNA in the cluster application (even in 3.3) so that as long as the first residue is RNA, the clustering application should be set to do an all-atom RMSD clustering. I don’t know if there would be additional issues which would cause an RNA structure not to work.
I should also mention that there appears to be an rna_cluster application, as well, in release 3.4, although it doesn’t appear to be in release 3.3.
Another option is to skip using Rosetta for clustering and try a different application. Although I don’t know how well it handles RNA, a number of people in the Rosetta community have very good things to say about Calibur ( http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2881085/ – http://sourceforge.net/projects/calibur/ ).
-
August 8, 2012 at 7:33 am #7542Anonymous
How can I run this cluster.cc and not the default cluster.cc? I would like to try it and see if/how it works.
Thanks!
/Emma
-
August 2, 2012 at 6:20 pm #7524Anonymous
Clustering is the conventional method for identifying the best structure in these structural prediction cases. The philosophy is that the lowest energy structure which belongs to the largest cluster is probably the one closest to the native structure. (The thought being that lower energy structures which belong to smaller clusters are likely due to deficiencies in the scoring/sampling of Rosetta, rather than being reflective of reality.)
If you’re not aware of them already, the Rosetta tutorials at http://www.meilerlab.org/index.php/jobs/resources do a good job of outlining typical post-analysis strategies.
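The selection rule described in this reply (take the lowest-energy model from the largest cluster) can be sketched as below. This is a generic greedy radius clustering written for illustration, not the algorithm of the Rosetta cluster application. The model tags, energies, and the one-dimensional "positions" standing in for pairwise RMSDs are all hypothetical.

```python
def neighbours(tag, pool, dist, radius):
    """Models in `pool` within `radius` of `tag` (including `tag` itself)."""
    return sorted(u for u in pool if dist(tag, u) <= radius)


def greedy_cluster(tags, dist, radius):
    """Repeatedly pick the unassigned model with the most unassigned
    neighbours as a cluster centre and remove its neighbourhood."""
    unassigned = set(tags)
    clusters = []
    while unassigned:
        centre = max(
            sorted(unassigned),
            key=lambda t: len(neighbours(t, unassigned, dist, radius)),
        )
        members = neighbours(centre, unassigned, dist, radius)
        clusters.append(members)
        unassigned -= set(members)
    return clusters


def pick_model(clusters, energy):
    """Lowest-energy member of the largest cluster."""
    largest = max(clusters, key=len)
    return min(largest, key=lambda t: energy[t])


# Hypothetical data: positions on a line stand in for real pairwise RMSDs.
position = {"A": 0.0, "B": 1.0, "C": 1.5, "D": 10.0, "E": 10.5}
energy = {"A": -10.0, "B": -12.0, "C": -11.0, "D": -15.0, "E": -9.0}


def dist(a, b):
    return abs(position[a] - position[b])


clusters = greedy_cluster(position, dist, radius=2.0)
choice = pick_model(clusters, energy)
print(clusters, choice)
```

In this toy data set the overall lowest-energy model (D) sits in the smaller cluster, so the rule selects B instead, which is the situation the reply describes.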
-
August 3, 2012 at 7:49 pm #7531Anonymous
“it looks like there is an explicit check for RNA in the cluster application (even in 3.3) so that as long as the first residue is RNA, the”
Where is this? grep rna cluster.cc has no returns.
-
August 4, 2012 at 1:20 am #7532Anonymous
It’s not in the cluster application per se (i.e. apps/public/cluster.cc), instead it’s in the GatherPosesMover::get_distance_measure() function of protocols/cluster/cluster.cc, which, as I read the code, is used by the ClusterPhilStyle subclass to do the structure-structure distance calculation.
-
August 8, 2012 at 6:09 pm #7560Anonymous
protocols/cluster/cluster.cc isn’t an application – it’s the implementation of the clustering protocols. The clustering application (apps/public/cluster.cc) should use it in its implementation.
Have you tried running the regular clustering application with your RNA structures? (Assuming that the first residue of your pose is RNA.) If so, what happened? What error did you get, if any, and did you get any output?
-
August 8, 2012 at 8:15 pm #7566Anonymous
If you meant rna_cluster.cc, it’s in 3.4, so you’d need to upgrade.
-
August 9, 2012 at 1:56 pm #7568Anonymous
I was running the regular clustering application using the command:
cluster.linuxgccrelease -database /rosetta/3.3/rosetta_database/ -in:file:s *.pdb -in::file::fullatom -out:file:silent out.out
The error is:
ERROR: unrecognized aa rG
ERROR:: Exit from: src/core/io/pdb/file_data.cc line: 655
I do not get any output.
I still have not got access to Rosetta 3.4 so I cannot try the rna_cluster application.
/Emma
-
August 9, 2012 at 2:01 pm #7569Anonymous
Try it without in:file:fullatom. The RNA residue type set is not the fullatom residue type set.
-
August 9, 2012 at 5:56 pm #7573Anonymous
Where are these files coming from? Are they Rosetta outputs? If so, the residues should be recognized if you use the same database and residue type settings (e.g. fullatom/centroid, various other items) that you used in the run which generated them.
Note there are changes that need to be made to the database to get Rosetta to run with RNA (see http://www.rosettacommons.org/manuals/archive/rosetta3.3_user_guide/d6/d18/_r_n_a_protein_changes.html). If, for example, you generated the structures on one computer with a properly converted database, but then ran the clustering application on another computer with a different database, you might not be able to properly read in the RNA.
If the structures come from a non-Rosetta source, make sure you have the naming conventions correct. For this error, the gotcha is residue name alignment. I believe Rosetta expects the three letter residue name to be “ rG”, and will complain if the three letter residue name is “rG ”.
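The alignment issue mentioned above can be checked by looking at the residue name field of each ATOM record, which occupies columns 18-20 of a PDB line. This is a minimal sketch under the assumption that the expected names are the right-aligned " rA", " rC", " rG" and " rU". Only " rG" is mentioned in the reply; the other three are assumed by analogy. The example lines are constructed, not taken from a real file.

```python
# Assumed right-aligned RNA residue names (only " rG" comes from the thread).
EXPECTED = (" rA", " rC", " rG", " rU")


def residue_names(pdb_lines):
    """Distinct residue-name fields (columns 18-20) of ATOM/HETATM records."""
    return {
        line[17:20]
        for line in pdb_lines
        if line.startswith(("ATOM", "HETATM"))
    }


def misaligned(names, expected=EXPECTED):
    """Names that match an expected name only after stripping spaces."""
    stripped = {e.strip() for e in expected}
    return sorted(
        n for n in names if n not in expected and n.strip() in stripped
    )


def atom_line(resname):
    """Build a truncated ATOM record with fixed-width columns for the demo."""
    return (
        "ATOM  "     # columns 1-6: record name
        + "    1"    # columns 7-11: atom serial number
        + " "        # column 12
        + " P  "     # columns 13-16: atom name
        + " "        # column 17: alternate location
        + resname    # columns 18-20: residue name
        + " A   1"   # columns 21-26: blank, chain ID, residue number
    )


lines = [atom_line(" rG"), atom_line("rG ")]
print(misaligned(residue_names(lines)))
```

A non-empty result points to records whose residue name is shifted within its three-column field.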
-
August 9, 2012 at 2:20 pm #7570Anonymous
The error is still the same.
-
August 10, 2012 at 8:07 am #7579Anonymous
The files are generated by the rna_denovo application.
I did not know that one has to make changes in the rosetta_database to make the clustering application work with RNA. Thank you for pointing that out. I do not have permission to make the changes myself on the cluster I am running on, but I will try to make sure that someone does it and then I will try again. This is most likely the reason why I am getting the errors when running cluster.cc.
I am waiting for Rosetta 3.4 to be installed so that I can also try the rna_cluster application. What will be the difference between using cluster.cc in 3.3 (with the suggested modifications for RNA) and rna_cluster.cc in 3.4?
-
August 10, 2012 at 12:23 pm #7581Anonymous
I made the changes to the database according to the instructions, but I still get the same error.
I do not understand whether you mean that the PDB file format is incorrect and that “ rG” should perhaps be replaced by “rG ”. I am using the silent output file or the extracted pdb files from rna_denovo as input to the clustering, and I get the same error for both.
-
August 10, 2012 at 9:54 pm #7584Anonymous
If the structures are coming from a Rosetta run, then in principle you shouldn’t have to do anything to read them in to another Rosetta program using the same database. (The suggestion for altering the alignment of the residue name was if you were reading in a PDB from an external source – those may or may not have things aligned the way Rosetta expects them to. But if the outputs came from Rosetta and weren’t subject to any modification, they should be okay.) There may be some small residue type set issues (e.g. fullatom vs. centroid), but otherwise it should probably work.
I’m not sure of what all the differences between the cluster and the rna_cluster application are – I’ve never used it, I just noticed it exists. It looks like it uses a slightly different algorithm. See what documentation we have at http://www.rosettacommons.org/manuals/archive/rosetta3.4_user_guide/d2/d82/rna_denovo.html
To be honest, I’m a little baffled why it isn’t working, although I’ll admit I have no real experience dealing with RNA. Could you copy the full tracer output (what gets printed to stdout/the console) of the cluster run to a file and attach it to a forum post? If possible, add the flag “-out:level 400” to the commandline to get the debug-level output.
-
August 13, 2012 at 2:19 pm #7588Anonymous
I have run the 3.3 cluster application using:
1. The silent file from the rna_denovo run as input. This gives the console output in the attached output_error_1.txt file.
Command used: cluster.linuxgccrelease -database /c3se/apps/Glenn/rosetta/3.3/rosetta_database/ -in:file:silent test.out -out:file:silent out.out -out:level 400
2. The pdb files extracted from the silent file as input. This gives the console output in the attached output_error_2.txt file.
Command used: cluster.linuxgccrelease -database /c3se/apps/Glenn/rosetta/3.3/rosetta_database/ -in:file:s *.pdb -out:file:silent out.out -out:level 400
-
August 13, 2012 at 3:14 pm #7590Anonymous
It’s not creating any RNA residue types at all. It’s only reading the centroid types. You said you made changes to the database; if you made the changes I think you made (I don’t know what changes you made) then you fixed the _fullatom_ types to include RNA. So, try these command lines with -in:file:fullatom.
-
August 14, 2012 at 8:53 am #7601Anonymous
I found that it was not only -in:file:fullatom that was required for running the clustering with RNA (in 3.3), but also -in:file:silent_struct_type rna.
Full command used: cluster.linuxgccrelease -database rosetta_database -in:file:silent test.out -in:file:silent_struct_type rna -in:file:fullatom -out:file:silent out.out -out:file:silent_struct_type rna
This worked for me and I got a silent output file (out_3.3.txt) and console output (output_3.3.txt). I was running this with the default clustering radius.
My concern now is how to interpret the output data. As far as I can understand the structure model S_000089 is the best scoring structure in the largest cluster and should be the most reliable structure. Is it possible to extract the cluster files c.*.*.pdb etc. somehow?
I have also got access to Rosetta 3.4 and I have now tried the rna_cluster application using the following command (with the silent output file from rna_denovo in 3.3):
rna_cluster.linuxgccrelease -database rosetta_database -in:file:silent test.out -out:file:silent out.out
This resulted in the output silent file out_3.4.txt and gave the console tracer output included in output_3.4.txt.
I find, however, this output to be harder to interpret than the output from 3.3. I can see that the S_000089 structure is still at the top, but the output does not state the details of the clustering as in 3.3. Am I missing some additional command so as to get a more clear output?
What clustering radius would you suggest for RNA? The default 2 Å?
-
August 14, 2012 at 12:49 pm #7602Anonymous
Another thing; is it possible to run rna_denovo in parallel? I have been trying to do so but all I get is that the application is run separately on each processor.
The command that I am using: mpiexec rna_denovo.mpi.linuxgccrelease -fasta RNA.fasta -native native.pdb -nstruct 10000 -out:file:silent test.out -cycles 10000 -minimize_rna -database rosetta_database
-
August 14, 2012 at 3:04 pm #7604Anonymous
Recall that Rosetta is not multithreaded, so the only benefits of MPI are organizational (all results in one directory), not speed. If each of the N processors created output_0001, then each created output_0002, and so on, then MPI was not working; since your command looks right, the application probably doesn’t support it. If instead you got a bunch of independent jobs creating different trajectories, then that’s how it’s supposed to work.
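So far as I can tell, rna_denovo in 3.3 (actually, everything in the rna module) does not run in MPI – at least I can’t find any hooks in the code. If you want to run on lots of processors, the easiest thing to do is to set up a script like this (a shell sketch; “rosetta” stands for the actual executable, and N, the base seed, and the inputs directory are placeholders):
N=8; BASE_SEED=1111111    # number of jobs and a fixed base seed (example values)
for i in $(seq 1 "$N"); do
    mkdir -p "$i"
    ln -sf "$PWD"/inputs/* "$i"/    # symlink the inputs (including the options file) into the job directory
    (cd "$i" && rosetta @options -constant_seed -jran $((BASE_SEED + i))) &    # distinct seed per job
done
wait    # block until every background job has finished
Notice the use of a rolling argument to jran to ensure different trajectories on each processor.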
If you *need* MPI due to sysadmin-imposed constraints, there is an unmaintained “mpilistwrapper” which lets you run a bunch of Rosetta jobs under a veneer of MPI, using non-MPI rosetta.
-
August 14, 2012 at 3:22 pm #7605Anonymous
“As far as I can understand the structure model S_000089 is the best scoring structure in the largest cluster and should be the most reliable structure. “
That is the interpretation of the most cluster-experienced person I could get to look at this.
“Is it possible to extract the cluster files c.*.*.pdb etc. somehow?”
Yes – score_jd2 -in:file:silent ???.out -in:file:silent_struct_type rna -in:file:tags “put a list of which pdbs you want here, based on their tag in the silent file”. score or extract_pdbs will probably also work.
“What clustering radius would you suggest for RNA? The default 2 Å?”
This is empirical. Change it if you don’t like the performance. The code’s author probably set the default to a good value.
“Am I missing some additional command so as to get a more clear output?”
None that I can find. I found a boolean option “auto_tune” but no documentation on what it does.
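The extraction command quoted earlier in this reply takes a list of model tags, which is easier to assemble programmatically when many models are wanted. This is a small sketch that only builds the command string from the flags given in the reply. The executable name is passed in as a parameter because the platform suffix differs between builds, and the second tag in the example is hypothetical. That -in:file:tags takes a space-separated list is assumed here, not confirmed in the thread.

```python
import shlex


def extraction_command(silent_file, tags, executable="score_jd2"):
    """Assemble the command line quoted in the reply for the given tags."""
    args = [
        executable,
        "-in:file:silent", silent_file,
        "-in:file:silent_struct_type", "rna",
        "-in:file:tags", *tags,
    ]
    return shlex.join(args)  # quotes any argument that needs it


# S_000089 is the model named in the thread; S_000012 is a made-up second tag.
command = extraction_command("out.out", ["S_000089", "S_000012"])
print(command)
```

The string can be pasted into a shell or a job script; nothing is executed by the sketch itself.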
-
September 6, 2012 at 10:56 am #7762Anonymous
I am trying to understand how to relate an rna_denovo run on the server to an rna_denovo application run on our Linux cluster. I have been running a few test jobs on the server, but I am not sure how to interpret the results.
First of all, I cannot understand the definition of “Cluster center models”. Are they the best scoring models in each cluster? It seems like the cluster center models C-01 to C-20 are identical to the top 20 lowest-energy structures M-1 to M-20. That makes no sense to me. In that case, how is it possible to know which cluster is the largest one, which should contain the “best structure”?
Which clustering method is used on the server: the 3.3 (cluster) or the 3.4 (rna_cluster) one?
-
September 10, 2012 at 2:50 pm #7773Anonymous
http://rosettaserver.graylab.jhu.edu/documentation/rna_denovo doesn’t seem to say. Do you have a log file from the run (the first few lines may say)? I asked Sergey (the server’s administrator).
The documentation I just linked (under “Interpreting Results”) says that it ranks clusters by energy of the best-scoring member of the cluster, and then returns the best-scoring member of the clusters – cluster size is apparently not part of what it’s using. (I’ve never used any of the code in question, so I’m more or less as lost as you). This is consistent with the ranking similarity you saw. It implies a lack of convergence, though – all the top-scoring models assort into different clusters.
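The ranking described above (clusters ordered by the energy of their best-scoring member, with that member returned as the representative) can be illustrated as follows. This is a sketch of the idea only, not the server's code. The model names, energies, and the singleton-cluster test used as a rough convergence check are assumptions made for the example.

```python
def rank_by_best_energy(clusters, energy):
    """Return (representative, cluster_size) pairs, one per cluster,
    ordered by the energy of each cluster's lowest-energy member."""
    reps = [(min(c, key=lambda t: energy[t]), len(c)) for c in clusters]
    return sorted(reps, key=lambda rep: energy[rep[0]])


def looks_unconverged(ranked, top_n):
    """Rough check: the top_n representatives all come from clusters of size 1."""
    return all(size == 1 for _, size in ranked[:top_n])


# Hypothetical energies for five models.
energy = {"M1": -30.0, "M2": -29.0, "M3": -28.0, "M4": -27.0, "M5": -26.0}

# Case 1: every model is its own cluster, so the representatives are simply
# the models in energy order, as observed for C-01..C-20 versus M-1..M-20.
scattered = rank_by_best_energy([["M3"], ["M1"], ["M2"]], energy)

# Case 2: the low-energy models fall into one shared cluster.
grouped = rank_by_best_energy([["M1", "M2", "M3"], ["M4", "M5"]], energy)

print(scattered, looks_unconverged(scattered, top_n=3))
print(grouped, looks_unconverged(grouped, top_n=2))
```

Under this ranking, cluster size never enters the ordering, which is why identical "cluster center" and "lowest energy" lists suggest that the top models did not group together.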
-
September 10, 2012 at 4:26 pm #7774Anonymous
It’s running off of a two-month-old version of developer trunk, so it is effectively running off of 3.4.
-
September 10, 2012 at 7:04 pm #7775Anonymous
Rhiju confirms that this means lack of convergence (which means probably all models are wrong).
-
September 12, 2012 at 7:24 am #7786Anonymous
So you are saying that there is no way I can use the results from the denovo runs on the server?
I also have the option of running Rosetta on our cluster, but no matter what cluster radius or clustering method (3.3 cluster or 3.4 rna_cluster) I use, I do not obtain any good results; that is, the best scoring model of the largest cluster is not at all a good model (very large RMSD to the native structure). I am thinking that perhaps I am doing something wrong during the process, but I cannot figure it out myself. Unfortunately, I do not know how to move forward with my study now.
A useful option in the clustering apps would be to extract the best-scoring model of each cluster while running the app, which currently has to be done manually from the information in the command line output (unless I missed an option). Is that something that could be possible in a future release?
-
September 19, 2012 at 11:08 am #7818Anonymous
Would you know of anyone with specific experience in rna_denovo and in interpreting the results of those runs who might be willing to assist me? I cannot get much out of the documentation and I do not know how to move on.
-