You may be interested in reading Getting Started. More recommendations can be found by scrolling to the bottom of this page, in the See Also section.
Leaver-Fay, A., et al., ROSETTA3: an object-oriented software suite for the simulation and design of macromolecules. Methods in enzymology, 2011. 487: p. 545.
Kaufmann, K.W., et al., Practically useful: what the Rosetta protein modeling suite can do for you. Biochemistry, 2010. 49(14): p. 2987-98.
Many servers exist to run various Rosetta protocols. Descriptions of these servers can be found on the Rosetta Servers page.
Rosetta applications (including RosettaScripts) are typically run through a terminal window. The command line is composed of two major parts. First, a path to an application executable is required, while the second part is a list of options for the particular Rosetta simulation. For example:
path_to/some_rosetta_app.linuxgccrelease -database path/to/rosetta/Rosetta/main/database other_flags
For a few examples, see the commands collection page.
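The two-part structure described above (executable path first, then options) can be sketched in Python. This is only an illustration: `build_command` is a hypothetical helper, not part of Rosetta, and the `-in:file:s myinput.pdb` option stands in for whatever other flags a run needs.

```python
import shlex

def build_command(executable, database, extra_flags=()):
    """Return an argument list: executable path first, then its options."""
    return [executable, "-database", database, *extra_flags]

# Paths are the placeholders used in the text, not real locations.
cmd = build_command(
    "path_to/some_rosetta_app.linuxgccrelease",
    "path/to/rosetta/Rosetta/main/database",
    ["-in:file:s", "myinput.pdb"],
)
print(shlex.join(cmd))  # the command as it would be typed in a terminal
```

Keeping the command as a list of separate arguments avoids quoting problems if it is later passed to `subprocess.run`.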
After Rosetta is compiled, links to binary executables are copied to the Rosetta/main/source/bin directory. (This is the bin/ directory off of the directory where you compiled the code.) Full paths to these executables need to be given when running Rosetta, unless this directory is added to the PATH variable in your shell profile (~/.bashrc (linux), ~/.bash_profile (mac), etc).
The Rosetta database contains important data files used by Rosetta during runs (for example, the definitions of what atoms are in alanine, atomic charges, Lennard-Jones radii, scorefunction weight files, ideal bond lengths and angles, rotamer libraries, etc.). Rosetta must in some way know the path to this directory.
If you have the Rosetta code and database directories laid out in the standard fashion (e.g. main/source/bin/ and main/database/), Rosetta can often automatically determine where the database directory should be. If this does not work for you, or if you relocate or symlink the executables and/or database, you may need to explicitly set the database, as described below. Explicitly setting the database path on the commandline or with an environment variable will take precedence over the autodetermined database path.
If you are using an older Rosetta build and the ROSETTA3_DB environment variable is not set (or your database has been moved from the typical relative install), you must specify the path to this database directory in the command line to run Rosetta simulations. For example:
rosetta.linuxgccrelease -database path/to/rosetta/main/database other_flags
As with all Rosetta options, this can also be provided with an options file.
Rosetta will automatically check the $ROSETTA3_DB environment variable. If this is present, the -database option need not be set. To set it temporarily, export the variable in your current shell session.
To set it permanently, set the variable in your shell's user settings file (which runs every time you open a terminal). For the default shell, bash, this is $HOME/.bashrc on Linux and $HOME/.bash_profile on Mac. Make sure to source this file (for example, source $HOME/.bashrc) or open a new tab so that the variable is set.
If the -database option is present on the commandline or in an options file, the value specified there will override any ROSETTA3_DB environment variable setting.
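The precedence described in this section (command line or options file first, then the ROSETTA3_DB environment variable, then the auto-detected location) can be summarized as a short Python sketch. The function `resolve_database` is hypothetical and only mirrors the order stated in the text; it is not how Rosetta itself is implemented.

```python
import os

def resolve_database(cli_value=None, env=None, autodetected=None):
    """Pick the database path using the precedence described in the text."""
    env = os.environ if env is None else env
    if cli_value:                      # -database on the command line or in an options file
        return cli_value
    if env.get("ROSETTA3_DB"):         # environment variable
        return env["ROSETTA3_DB"]
    return autodetected                # standard layout, found automatically (may be None)

# Illustrative values only.
print(resolve_database("mypath", {"ROSETTA3_DB": "/env/database"}, "/auto/database"))
print(resolve_database(None, {"ROSETTA3_DB": "/env/database"}, "/auto/database"))
print(resolve_database(None, {}, "/auto/database"))
```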
fixbb.macgccrelease -in:file:s myinput.pdb -database mypath
Options and their arguments are separated by whitespace. A single or double colon is used to disambiguate options via OptionGroups when there are multiple separate options with the same name. Multiple layers of colons may be needed.
Options can also be written in an options file (also called a flags file). In this file, put one option on each line, still using a single or double colon to specify the layers. An example options file appears below.
-database /home/yiliu/Programing/branches/Rosetta/main/database
-in:file:s 1l2y_centroid.pdb
-in:centroid_input
-score:weights centroid_des.wts
If this file were called “flags”, then it would be used like this (notice the @ symbol):
fixbb.macgccrelease @ flags
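To make the one-option-per-line layout concrete, here is a minimal Python sketch that splits the text of a flags file into option names and their arguments. `parse_flags` is a hypothetical helper written for illustration; Rosetta's own parser handles more cases than this.

```python
def parse_flags(text):
    """Split flags-file text into {option: [arguments]}, one option per line."""
    options = {}
    for line in text.splitlines():
        tokens = line.split()          # option and arguments are whitespace-separated
        if not tokens:
            continue                   # skip blank lines
        options[tokens[0]] = tokens[1:]
    return options

# The example options from the text, written one per line.
flags_text = """\
-database /home/yiliu/Programing/branches/Rosetta/main/database
-in:file:s 1l2y_centroid.pdb
-in:centroid_input
-score:weights centroid_des.wts
"""
parsed = parse_flags(flags_text)
for option, arguments in parsed.items():
    print(option, arguments)
```

Options such as -in:centroid_input take no argument, so they map to an empty list.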
Note that other options can still be set before or after the flags file is specified, and multiple flag files can be used, for example @ flags1 @ flags2 @ flags3. This essentially combines flags1 through flags3, each file overriding any options set in the previous ones. For setting multiple flag files through a batch run, see the -run:batches option described in the run options.
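The override behaviour with several flag files can be pictured as successive dictionary updates. The sketch below is an assumption-laden illustration: `combine_flags` is hypothetical, and the option values in flags1 through flags3 are invented for the example.

```python
def combine_flags(*flag_sets):
    """Merge option dictionaries in order; later sets override earlier ones."""
    combined = {}
    for flags in flag_sets:
        combined.update(flags)
    return combined

# Invented contents standing in for three flag files.
flags1 = {"-nstruct": ["1"], "-ex1": []}
flags2 = {"-nstruct": ["10"]}
flags3 = {"-in:file:s": ["myinput.pdb"]}

combined = combine_flags(flags1, flags2, flags3)
print(combined)
```

Here -nstruct ends up with the value from flags2, because flags2 is read after flags1.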
As of March 2018, Rosetta can now be run with a user configuration file.
This file is basically an options file that is loaded at the start of each Rosetta run.
To start with, go to your home directory and create a directory that will be home to any Rosetta configurations.
mkdir .rosetta && mkdir .rosetta/flags
Rosetta will now look in that directory each time it is run. If a file named common is found in $HOME/.rosetta/flags, it is loaded; if a file of the same name is in the current working directory, that one is used instead. You can set any number of flag configurations with the -fconfig option. By default (you do not need to pass this), Rosetta behaves as if -fconfig common were given.
The -fconfig option is also useful if you have sets of flags for different purposes, such as design, glycans, and antibodies, so you could do something like:
-fconfig common antibody
That would load both the common and antibody configurations (which, again, are flag files in the configuration directory described above).
If you have a common flag file that you wish to ignore for a particular run, you can skip loading it with a command-line option.
Finally, the options that are loaded from these files are output to the Rosetta log on startup.
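The lookup order for configuration files can be sketched as follows. This assumes one reading of the text above, namely that a file in the current working directory takes priority over one in $HOME/.rosetta/flags; `find_config` is a hypothetical helper, and the temporary directories exist only to make the example runnable.

```python
import tempfile
from pathlib import Path

def find_config(name, cwd, home):
    """Return the path of a named configuration, preferring the working directory."""
    local = Path(cwd) / name
    if local.is_file():
        return local
    user = Path(home) / ".rosetta" / "flags" / name
    if user.is_file():
        return user
    return None

with tempfile.TemporaryDirectory() as tmp:
    home = Path(tmp) / "home"
    cwd = Path(tmp) / "work"
    (home / ".rosetta" / "flags").mkdir(parents=True)
    cwd.mkdir()

    (home / ".rosetta" / "flags" / "common").write_text("-ex1\n")
    from_home = find_config("common", cwd, home)   # only the home copy exists

    (cwd / "common").write_text("-ex2\n")
    from_cwd = find_config("common", cwd, home)    # the local copy now wins

    missing = find_config("antibody", cwd, home)   # no such configuration anywhere

print(from_home.parent.name, from_cwd.parent.name, missing)
```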
Historically, each instance of Rosetta took advantage of only a single processor core. Parallel sampling was typically accomplished by launching many independent Rosetta processes. This allowed separate jobs to be carried out simultaneously, but there are many circumstances in which one may wish to complete a single Rosetta job more quickly using multiple cores. Multi-threading support has recently (as of 5 November 2019) been added to Rosetta. Most Rosetta modules do not yet support multi-threading, but some core algorithms have been parallelized. To take advantage of multi-threading, the following considerations are important:
Rosetta must be compiled with the extras=cxx11thread option appended to the scons command. This will produce Rosetta executables whose names include .cxx11thread. (for example, fixbb.cxx11thread.linuxgccrelease).
Rosetta applications compiled with threading support will by default launch one thread. To launch more, the number desired must be specified with the -multithreading:total_threads # flag, where # is the number of threads. A value of 0 will launch one thread for each core (or hardware thread, on hyper-threaded nodes) available on a node. On laptops and personal computers on which one is running only one instance of Rosetta, this is often ideal. Note that on large nodes with many cores, you may wish to launch many Rosetta processes, each launching a small number of threads. For example, you may wish to launch 16 Rosetta processes each with 4 threads on a 64-core node. (Note that if multi-threading is used in conjunction with MPI process-level parallelism, the -multithreading:total_threads flag limits the number of threads launched per process.)
Individual Rosetta modules that support multi-threading will by default try to use all available threads on a first-come-first-served basis. For example, let's suppose that module A calls module B, and both attempt to use threads. Let's also suppose that there are 16 threads in total. Module A will by default request that its work be distributed over all 16 threads, each of which can invoke module B. Module B will also request that its work be distributed over threads, but will find no threads free, and will therefore have to carry out its work in the calling thread. Since "inner" modules are given lower priority than "outer" modules, a user may manually limit the number of threads requested by a module with appropriate commandline flags, RosettaScripts options, or PyRosetta options (see below). In the example above, one could restrict module A to 4 threads, and module B to 4 threads. In this case, each of the 4 threads assigned to module A can invoke module B, and each of the 4 invocations of module B can be assigned 4 threads (for a total of 16). Note that a module is always assigned at least one thread (the requesting thread), and at most the lesser of the total thread count or the number requested.
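The bounds stated at the end of the paragraph above can be written down as a small Python sketch. Both functions are hypothetical illustrations of the rule in the text, not the logic of the RosettaThreadManager; in particular, the real number assigned also depends on which threads happen to be free.

```python
def assigned_upper_bound(requested, total_threads):
    """At least one thread; at most the lesser of the total and the number requested."""
    return max(1, min(requested, total_threads))

def nested_demand(outer_request, inner_request, total_threads):
    """Threads needed if every outer thread were given a full inner allotment."""
    outer = assigned_upper_bound(outer_request, total_threads)
    inner = assigned_upper_bound(inner_request, total_threads)
    return outer * inner

total = 16
# Module A and module B each restricted to 4 threads: the demand fits exactly.
print(nested_demand(4, 4, total), "threads needed of", total)
# Both modules requesting everything: demand far exceeds the pool, so inner
# calls would fall back to running in their calling thread.
print(nested_demand(16, 16, total), "threads needed of", total)
```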
Currently, the following modules and tasks are multi-threaded. The number of threads that they can request can be controlled as described in the following table:
| Module | Task | Commandline control | RosettaScripts control | PyRosetta control |
|---|---|---|---|---|
| Packer | Interaction graph pre-calculation. | -multithreading:interaction_graph_threads | RestrictInteractionGraphThreadsOperation task operation | RestrictInteractionGraphThreadsOperation task operation |
For developers, please see the page on the RosettaThreadManager for information about how to multi-thread your favourite Rosetta module.
The score12 scoring function is not currently threadsafe. At some point, we will fix whichever score term is currently creating problems.
While threads are a useful means of parallelizing the execution of blocks of code involving many small tasks that share memory, across a limited number of cores on a single node, process-level cross-communication can also be useful for job-level parallelism. Most Rosetta applications support job-level parallelism using MPI (the Message Passing Interface). MPI allows many processes to communicate with one another by passing messages. This is advantageous over entirely independent processes, since it allows load-balancing (processes that finish their work sooner can do the work that would otherwise be waiting in the queue of a slower process) and, in some cases, data reduction and analysis prior to output (see, for example, the simple_cycpep_predict application).
If the Rosetta MPI executables were compiled (using the extras=mpi option with scons, for example), then in the executable directory there will be an extra set of executables specifically for MPI, for example fixbb.mpi.linuxgccrelease. If these have not yet been compiled, please refer to the Setting Up Rosetta 3 page for more information. To run these executables, simply run them via mpiexec (or mpirun for older MPI implementations):
mpiexec -np 16 fixbb.mpi.linuxgccrelease -database /path/to/database @ flags
Although typically used on large computer clusters, MPI can be installed on multiprocessor Linux and Mac machines. If you have a shiny new 8 core desktop, you should be able to use MPI. There are many different flavors of MPI, but openmpi seems to work well on both Ubuntu and MacOSX.
Most applications are currently compatible with MPI through The Job Distributor. See the MPI JobDistributor section for fine control over how Rosetta will use MPI with your run.
A useful option to use when running Rosetta via MPI is -mpi_tracer_to_file path/to/log/dir. This will separate the output of each processor into separate files.
Here is an example of the general command I put in a bash script to run via Qsub using environment variables for cluster runs:
mpiexec -np $np --machinefile $HOME/dna.machinefile $program.mpi.linuxgccrelease -database $ROSETTA3/database -nstruct $nstruct -ex1 -add_orbitals -ex2 -use_input_sc -ignore_unrecognized_res @ $flag -mpi_tracer_to_file $HOME/rosetta_run_logs/debug/$debug_log
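For scripted cluster runs, the same kind of command can be assembled in Python rather than bash. This is a minimal sketch: `mpi_command` is a hypothetical helper, the paths are the placeholders used in the text, and the -nstruct value of 100 is an invented example.

```python
import shlex

def mpi_command(n_processes, executable, database, log_dir, extra_flags=()):
    """Assemble an mpiexec invocation with per-process log files."""
    return [
        "mpiexec", "-np", str(n_processes),
        executable,
        "-database", database,
        "-mpi_tracer_to_file", log_dir,
        *extra_flags,
    ]

mpi_cmd = mpi_command(
    16,
    "fixbb.mpi.linuxgccrelease",
    "/path/to/database",
    "path/to/log/dir",
    ["-nstruct", "100"],   # invented value for illustration
)
print(shlex.join(mpi_cmd))
```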
Note that MPI-based job distribution can be used in conjunction with multi-threading if both options are specified during compilation (extras=cxx11thread,mpi). In this case, it is important to limit the product of the number of processes per node and the number of threads per process to equal the number of cores per node: the default behaviour of the multi-threaded build is to launch one thread per node core per process, which would result in oversubscription if more than one process per node is launched. See the multi-threading section, above, for information on limiting the total threads per process.
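The arithmetic behind avoiding oversubscription is simple enough to sketch in Python, using the 64-core node with 16 processes mentioned in the multi-threading section. Both helper functions are hypothetical and only express the constraint that processes per node times threads per process should not exceed cores per node.

```python
def threads_per_process(cores_per_node, processes_per_node):
    """Largest thread count per process that keeps the node from being oversubscribed."""
    if not 1 <= processes_per_node <= cores_per_node:
        raise ValueError("processes_per_node must be between 1 and cores_per_node")
    return cores_per_node // processes_per_node

def is_oversubscribed(cores_per_node, processes_per_node, threads):
    """True if the processes together would run more threads than there are cores."""
    return processes_per_node * threads > cores_per_node

threads = threads_per_process(64, 16)
thread_flag = ["-multithreading:total_threads", str(threads)]
print(" ".join(thread_flag))
print(is_oversubscribed(64, 16, threads))   # 16 processes with the computed thread count
print(is_oversubscribed(64, 16, 64))        # 16 processes each using every core
```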
As a final note, it is highly recommended to enable the serialization extra any time that Rosetta is built with MPI support (extras=mpi,serialization). This is needed for certain types of inter-process communication.
Options in Rosetta are grouped by their functional and protocol usages. Each group has at least one layer, the parent layer. Most of the groups have one or more sub-layers holding multiple options. You can use a single or double colon to separate the layers. If the option name is unique, such as nstruct, you do not need to specify the groups; Rosetta will warn you if you omit them when there are in fact multiple options with the same name.
fixbb.linuxgccrelease -in:file:s myinput.pdb -out:file:o myoutput
fixbb.linuxgccrelease -in::file::s myinput.pdb -packing::ex1
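Since single and double colons separate layers in the same way, the two spellings name the same option. A minimal Python sketch of that equivalence follows; `normalize_option` and `split_layers` are hypothetical helpers, not Rosetta functions.

```python
def normalize_option(name):
    """Rewrite double-colon layer separators as single colons."""
    return name.replace("::", ":")

def split_layers(name):
    """Return the layers of an option, e.g. group, sub-group, option name."""
    return normalize_option(name).lstrip("-").split(":")

print(normalize_option("-in::file::s"))
print(split_layers("-in::file::s"))
print(split_layers("-packing::ex1"))
```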
All the option types are pre-defined, and you can figure out the type of parameters each option takes by reading the option types. The examples below use the Path, Boolean, Integer, and IntegerVector types.
For example: Option "database" is a Path type option, so it is followed by a path, as in -database path/to/rosetta/main/database.
Option "ex1" is a Boolean type option and is false by default, so you can activate it by passing -ex1.
Option "nstruct" is an Integer type option, so it is followed by an integer.
Option "backrub:pivot_residues" is an IntegerVector type option, so it is followed by a list of integers:
-backrub:pivot_residues 10 11 12 13
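The way an option's type determines how its arguments are read can be sketched in Python. The table `OPTION_TYPES` and the function `convert` are hypothetical and cover only the four options named above; treating a Boolean option as true whenever it is present is a simplification made for this example.

```python
from pathlib import PurePosixPath

# Hypothetical lookup table covering only the options discussed in the text.
OPTION_TYPES = {
    "database": "Path",
    "ex1": "Boolean",
    "nstruct": "Integer",
    "backrub:pivot_residues": "IntegerVector",
}

def convert(option, arguments):
    """Convert the string arguments of an option according to its type."""
    kind = OPTION_TYPES[option]
    if kind == "Boolean":
        return True                              # simplification: presence activates it
    if kind == "Integer":
        return int(arguments[0])
    if kind == "IntegerVector":
        return [int(value) for value in arguments]
    if kind == "Path":
        return PurePosixPath(arguments[0])
    raise ValueError(f"unhandled option type: {kind}")

print(convert("backrub:pivot_residues", ["10", "11", "12", "13"]))
print(convert("nstruct", ["5"]))                 # 5 is an invented value
print(convert("ex1", []))
print(convert("database", ["path/to/rosetta/main/database"]))
```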
There are a few good places to look for help.
If you pass
-help as a flag on the command line, Rosetta will spit out all existing options and then quit (ignoring other flags).
Be sure to check out the demos and protocol captures for help with specific apps. These are curated demonstrations of how to use a particular app, with options, general recommendations, input files, etc. These demos are especially helpful for protocols that use RosettaScripts.
Supplemental material of newer Rosetta papers should have the full command-line to use and all the options that were used to generate whatever data the paper is referring to. Though there may be some option-name-drift through time, these research articles are a great place to start.
If you still require help to run a particular Rosetta application or protocol, check out www.rosettacommons.org/forum for more information. The corresponding author of the application or protocol may be able to help as well.
-in:path:pdb directory/to/pdb/files is specified. See this page for more common input options.
Score files can be sorted with the sort command. You can sort on a particular column using the -kx option. See this page for more.
For example, sort -k5 my_score_file.sc would sort by the 5th column, or the 4th score term.
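The same column sort can be done in Python when a numeric comparison is wanted. Note that a plain sort -k5 compares text from the fifth field to the end of the line, so on the command line a numeric sort on that single field would be sort -n -k5,5. The sketch below is illustrative only: `sort_by_column` is a hypothetical helper, and the score lines are invented data, not real Rosetta output.

```python
def sort_by_column(lines, column):
    """Sort whitespace-separated lines numerically by a 1-based column index."""
    return sorted(lines, key=lambda line: float(line.split()[column - 1]))

# Invented score lines: a label, four score terms, and a model name.
score_lines = [
    "SCORE: -10.5 1.2 3.4 -7.0 model_a",
    "SCORE: -12.0 0.8 2.9 -9.5 model_b",
    "SCORE: -11.1 1.0 3.1 -8.2 model_c",
]

# Column 5 is the 4th score term, because column 1 holds the "SCORE:" label.
ranked = sort_by_column(score_lines, 5)
for line in ranked:
    print(line)
```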
Rosetta is a highly versatile piece of software, and both its options system and scripting system help give it this versatility. Many Rosetta applications share common options, especially in regard to input and output (as most share a common Job Distributor, JD2). It is a good idea to review some of these options and see how they can be of use to you.