Rosetta CM – Ignore Sporadic Errors


    • #3035
      Anonymous

        Hey all,

        I'm using Rosetta CM based on this tutorial: https://www.rosettacommons.org/docs/latest/application_documentation/structure_prediction/RosettaCM. I run the jobs in batches of 1000, and most batches don’t complete all 1000 structures because of a “Cannot normalize xyzVector of length() zero” error; they complete anywhere between 60 and 300 before quitting. From some quick searches, it seems this is due to an unlucky random seed, so I was wondering whether there is any way to tell Rosetta to ignore the error and either restart that job or skip it.

        I tried “-in:file:skip_failed_simulations true” and “-jd2:failed_job_exception false”, testing them by creating a bad input PDB and running it through score_jd2, but neither seemed to do the trick. Any suggestions would be greatly appreciated.


        Thanks

      • #14260
        Anonymous

          Unfortunately, there isn’t a way to bypass that error.

          This is one of the more infuriating errors in Rosetta, and developers are looking into ways of getting around it, but at the moment all you can do is restart the job and hope to make further progress.

          • #14270
            Anonymous

              Thanks for the response.

              In that case, is there an option for the combine silent files executable that makes it renumber the structures completely instead of just appending _1, _2, etc.?

          • #14262
            Anonymous

              You can always give up on the job distributor. If you write a shell script (look at JD0 in the tools repository adjacent to your main repository), you can run each of your nstruct as a separate command line: instead of one Rosetta call at nstruct 1000, make 1000 Rosetta calls at nstruct 1 each. Be sure to track your random number seed if you do this.

              This is not a GOOD solution or a PRETTY solution but it will at least keep one run’s death from affecting the others downstream.
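
The one-call-per-structure idea could be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not a tested launcher: `build_commands` and `run_all` are hypothetical helper names, the base command is whatever you already run, and the options `-nstruct`, `-out:suffix`, `-run:constant_seed` and `-run:jran` are assumed to be the ones that set the structure count, the output label and the seed in your Rosetta build.

```python
import random
import subprocess


def build_commands(base_cmd, n_jobs, master_seed=12345):
    """Return (seed, argv) pairs, one single-structure Rosetta call per job."""
    rng = random.Random(master_seed)  # same master seed -> same seed list
    seeds = rng.sample(range(1, 2**31 - 1), n_jobs)  # distinct per-job seeds
    jobs = []
    for job_id, seed in enumerate(seeds, start=1):
        argv = list(base_cmd) + [
            "-nstruct", "1",                  # one structure per call
            "-out:suffix", f"_{job_id:04d}",  # unique label per call
            "-run:constant_seed",             # assumed flag: use a fixed seed
            "-run:jran", str(seed),           # assumed flag: the seed value
        ]
        jobs.append((seed, argv))
    return jobs


def run_all(jobs, runner=subprocess.run):
    """Run every job in turn; a non-zero exit in one job does not stop the rest."""
    failed_seeds = []
    for seed, argv in jobs:
        result = runner(argv)
        if result.returncode != 0:
            failed_seeds.append(seed)  # keep the seed so the crash can be reproduced
    return failed_seeds
```

Calling `run_all(build_commands(my_base_cmd, 1000))` would then return the seeds of the calls that died, which is the bookkeeping the advice about tracking seeds asks for.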


              • #14271
                Anonymous

                  Thanks for the suggestion, it sounds like a fun challenge. I haven’t looked extensively at the options for this one yet, but is there a way to tell Rosetta to append the output to an existing silent file instead of overwriting it or writing a whole new file?

                • #14273
                  Anonymous

                    I am 99% certain that if you give it a silent file path that is already present on disk it will just append. I’m also pretty sure it’s smart enough to check the indices and pick an output name that’s not already in use in the file. It will cause problems if the score fields change, though: the SCORE line with the score term names is printed only at the top, so if the terms change, the later scores become uninterpretable because the labels don’t get repeated. That should be a nonissue here.
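
The score-field caveat can be checked before appending or merging. The sketch below assumes the text silent-file layout described above, in which the first line starting with `SCORE:` is the header listing the score term names; `score_header` and `headers_match` are hypothetical helper names, not part of Rosetta.

```python
def score_header(silent_path):
    """Return the fields after 'SCORE:' on the first SCORE: line, or None."""
    with open(silent_path) as handle:
        for line in handle:
            if line.startswith("SCORE:"):
                # Assumed layout: the first SCORE: line names the score terms.
                return line.split()[1:]
    return None  # no SCORE: line found (e.g. an empty file)


def headers_match(path_a, path_b):
    """True only if both files have a header and the term names are identical."""
    header_a = score_header(path_a)
    header_b = score_header(path_b)
    return header_a is not None and header_a == header_b
```

If `headers_match` returns False for two files, their score columns should not be read against a single header.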

                  • #14278
                    Anonymous

                      It will definitely append results to a silent file that already exists.

                      However, to support decent restart behavior (so you don’t need to re-run your entire protocol if your cluster dies at 99999 structures out of 100000), it will check the silent file for the desired output prior to starting the job. If that particular output already exists, then it will skip that job.

                      In your case, this can help. If you write a wrapper script that restarts the job (with exactly the same command line) whenever it detects too few output structures, Rosetta should pick up right where it left off.
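
A wrapper of that kind might look like the following Python sketch. It relies on the restart behavior described above (already-written outputs are skipped) and on an assumed text silent-file layout in which each structure contributes one `SCORE:` line and the header `SCORE:` line ends with `description`. The names `count_structures` and `run_until_complete` are hypothetical, and `max_restarts` is an arbitrary safety limit.

```python
import os
import subprocess


def count_structures(silent_path):
    """Count per-structure SCORE: lines, skipping header lines."""
    if not os.path.exists(silent_path):
        return 0
    count = 0
    with open(silent_path) as handle:
        for line in handle:
            fields = line.split()
            # Assumed layout: header SCORE: lines end with "description".
            if fields and fields[0] == "SCORE:" and fields[-1] != "description":
                count += 1
    return count


def run_until_complete(argv, silent_path, target, max_restarts=50,
                       runner=subprocess.run):
    """Re-issue the identical command until `target` structures exist.

    Makes at most 1 + max_restarts calls, then returns the final count.
    """
    attempts = 0
    while count_structures(silent_path) < target and attempts <= max_restarts:
        runner(argv)  # a crash just ends this call; the loop checks the file again
        attempts += 1
    return count_structures(silent_path)
```

The restart limit matters: if a particular structure fails every time, an unbounded loop would never finish.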

                      The other way around this is to tell Rosetta to use a unique label for each output structure. You can do this with -out:prefix or -out:suffix. For example, adding something like `-out:suffix _${JOBID}` to the command line will allow you to launch different jobs, each going to uniquely labeled outputs. (This presumes you’re using a loop that sets the JOBID environment variable to a unique number for each run.) These can go into the same silent file, so long as there’s only one process writing to that silent file at a time. (Multiple non-MPI processes writing to the same silent file at the same time is a recipe for disaster.) Alternatively, you can have each of them write to a separate silent file, and then just concatenate the files later (with `cat`, or with the combine_silent application).
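
The separate-file variant could be sketched in Python as below. The sketch assumes `-out:suffix` and `-out:file:silent` are the options that set the output label and the silent file path; `job_arguments`, `concatenate` and the `run_<id>.out` naming scheme are hypothetical choices made for illustration.

```python
import os
import shutil


def job_arguments(base_cmd, job_id, out_dir="."):
    """Return (argv, silent_path): one job with its own label and silent file."""
    silent_path = os.path.join(out_dir, f"run_{job_id}.out")
    argv = list(base_cmd) + [
        "-out:suffix", f"_{job_id}",      # unique label for this job's structures
        "-out:file:silent", silent_path,  # one silent file per process
    ]
    return argv, silent_path


def concatenate(silent_paths, combined_path):
    """Byte-for-byte concatenation, equivalent to `cat a b ... > combined`."""
    with open(combined_path, "wb") as out:
        for path in silent_paths:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
```

Because each process writes only to its own file, the jobs can run in parallel without the simultaneous-write problem, and the merge happens once after they have all finished.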
