Starting References

You may be interested in reading Getting Started. Similar recommendations are listed in the See Also section at the bottom of this page.

Leaver-Fay, A., et al., ROSETTA3: an object-oriented software suite for the simulation and design of macromolecules. Methods in enzymology, 2011. 487: p. 545.

Kaufmann, K.W., et al., Practically useful: what the Rosetta protein modeling suite can do for you. Biochemistry, 2010. 49(14): p. 2987-98.

Servers

Many servers exist to run various Rosetta protocols. Descriptions of these servers can be found on the Rosetta Servers page.

Command Line Example

Rosetta applications (including RosettaScripts) are typically run through a terminal window. The command line is composed of two major parts. First, a path to an application executable is required, while the second part is a list of options for the particular Rosetta simulation. For example:

path_to/some_rosetta_app.linuxgccrelease -database path/to/rosetta/Rosetta/main/database other_flags

For a few examples, see the commands collection page.

Location of Rosetta Executables

After Rosetta is compiled, links to binary executables are copied to the Rosetta/main/source/bin directory. (This is the bin/ directory off of the directory where you compiled the code.) Full paths to these executables need to be given when running Rosetta, unless this directory is added to the PATH variable in your shell profile (~/.bashrc on Linux, ~/.bash_profile on Mac, etc.):

export PATH=$PATH:/path/to/rosetta/bin
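As a minimal sketch of the PATH setup described above, the snippet below appends the executable directory only when it is not already listed, so it can be sourced repeatedly without growing PATH. The location stored in ROSETTA_BIN is a hypothetical placeholder, not a path that Rosetta requires.

```shell
# Hypothetical install location (an assumption); point this at your own build.
ROSETTA_BIN="${ROSETTA_BIN:-$HOME/Rosetta/main/source/bin}"

# Append the directory to PATH only if it is not already listed.
case ":$PATH:" in
  *":$ROSETTA_BIN:"*) ;;                    # already on PATH, nothing to do
  *) export PATH="$PATH:$ROSETTA_BIN" ;;    # otherwise add it at the end
esac
```

Placing these lines in the shell profile mentioned above would make the executables callable by name in every new terminal.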

Rosetta Database

The Rosetta database contains important data files used by Rosetta during runs (for example, the definitions of what atoms are in alanine, atomic charges, Lennard-Jones radii, scorefunction weight files, ideal bond lengths and angles, rotamer libraries, etc.). Rosetta must in some way know the path to this directory.

Autodetermination of database path

If you have the Rosetta code and database directories laid out in the standard fashion (e.g. main/source/bin/ and main/database/), Rosetta can often automatically determine where the database directory should be. If this does not work for you, or if you relocate or symlink the executables and/or database, you may need to explicitly set the database, as described below. Explicitly setting the database path on the commandline or with an environment variable will take precedence over the autodetermined database path.

Set DB for a single Rosetta run

If you are using an older Rosetta build and the ROSETTA3_DB environment variable is not set (or your database has been moved from the typical relative install), you must specify the path to this database directory in the command line to run Rosetta simulations. For example:

  • rosetta.linuxgccrelease -database path/to/rosetta/main/database other_flags

As with all Rosetta options, this can also be provided with an options file.

Set DB for multiple Rosetta runs

Rosetta will automatically check the $ROSETTA3_DB environment variable. If this is present, the -database option need not be set. To set it temporarily in your shell session:

  • export ROSETTA3_DB=path_to_rosetta_db

To set it permanently, add an export line (export ROSETTA3_DB=path_to_rosetta_db) to your shell's user settings file, which runs every time you open a terminal. For the default shell bash this is $HOME/.bashrc on Linux and $HOME/.bash_profile on Mac. Then source that file (for example, source $HOME/.bashrc) or open a new terminal so that the variable is set.

If the -database option is present on the commandline or in an options file, the value specified there will override any ROSETTA3_DB environment variable setting.
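The precedence described in this section (the -database option first, then ROSETTA3_DB, then the autodetermined path) can be sketched as a small shell function. This is only an illustration of the ordering: resolve_db, its arguments, and the paths are hypothetical and are not part of Rosetta.

```shell
# Illustration only: mimics the documented precedence; not Rosetta code.
resolve_db() {
  cli_db="$1"    # value given with -database (empty if the option is absent)
  auto_db="$2"   # path that would be autodetermined (empty if detection fails)
  if [ -n "$cli_db" ]; then
    printf '%s\n' "$cli_db"          # 1. explicit -database option
  elif [ -n "${ROSETTA3_DB:-}" ]; then
    printf '%s\n' "$ROSETTA3_DB"     # 2. environment variable
  else
    printf '%s\n' "$auto_db"         # 3. autodetermined location
  fi
}

export ROSETTA3_DB=/opt/rosetta/main/database        # hypothetical path
resolve_db "" /auto/main/database                    # environment variable wins
resolve_db /cli/main/database /auto/main/database    # -database wins
```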

Specifying Options

On the Command-Line

fixbb.macgccrelease -in:file:s myinput.pdb -database mypath

Options and their arguments are separated by whitespace. A single or double colon is used to qualify an option with its OptionGroups when there are multiple separate options with the same name. Multiple layers of colons may be needed.

On the Command-line via an Options File

Options can also be written in an options file (also called a flags file). In this file, put one option on each line, still using a single or double colon to specify the layers. An example options file appears below.

 -database /home/yiliu/Programing/branches/Rosetta/main/database
 -in:file:s 1l2y_centroid.pdb
 -in:centroid_input
 -score:weights centroid_des.wts

If this file were called “flags”, then it would be used like this (notice the @ symbol): fixbb.macgccrelease @flags

Note that other options can still be set before or after the flags file is specified, and multiple flags files can be used, for example @flags1 @flags2 @flags3. This essentially combines flags1 through flags3, with each file overriding any options set in the previous files. For setting multiple flag files through a batch run, see the -run:batches option described in the run options.
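As a rough sketch of the workflow above, the snippet below writes a small flags file (a subset of the example options file) and assembles a command line that references it with @. The scratch directory and the extra -nstruct option are illustrative choices; the command is printed rather than run, since running it needs a compiled Rosetta.

```shell
# Work in a scratch directory so nothing in your project is overwritten.
workdir=$(mktemp -d)

# One option per line, as in the example options file above.
cat > "$workdir/flags" <<'EOF'
-in:file:s 1l2y_centroid.pdb
-in:centroid_input
-score:weights centroid_des.wts
EOF

# Build the command; further options may follow the flags file.
cmd="fixbb.macgccrelease @$workdir/flags -nstruct 10"
echo "$cmd"   # printed rather than executed, since it needs a Rosetta build
```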

Common Options and Default User Configuration

As of March 2018, Rosetta can be run with a user configuration file. This file is basically an options file that is loaded at the start of each Rosetta run. To start with, go to your home directory and create a directory that will be home to any Rosetta configurations:

mkdir .rosetta && mkdir .rosetta/flags

Rosetta will now look in that directory each time it is run. If a file named common is found in $HOME/.rosetta/flags, it is loaded automatically; if a file with the same name is in the current working directory, that one is used instead. You can load any number of flag configurations with the -fconfig option. By default (you do not need to pass this), we have:

 -fconfig common

This -fconfig option is also useful if you have a set of flags for different purposes - like design, glycans, and antibodies, so you could do something like:

-fconfig common antibody 

That would load both the common and antibody configurations (which, again, are flag files in .rosetta/flags).

If you have a common flag file which you wish to ignore for a particular run, you can skip loading it with the option:

-no_fconfig  

Finally, the options that are loaded from these files are output to the Rosetta log on startup.
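A minimal sketch of setting up such configuration files follows. To keep it safe to try, it writes under CONFIG_HOME, a hypothetical stand-in for $HOME that defaults to a scratch directory; the flags placed in each file are illustrative choices taken from options mentioned elsewhere on this page.

```shell
# CONFIG_HOME is a hypothetical stand-in for $HOME so the sketch can be tried
# safely; set CONFIG_HOME="$HOME" to create the real configuration directory.
CONFIG_HOME="${CONFIG_HOME:-$(mktemp -d)}"
mkdir -p "$CONFIG_HOME/.rosetta/flags"

# Flags loaded on every run (the file must be named "common").
cat > "$CONFIG_HOME/.rosetta/flags/common" <<'EOF'
-ignore_unrecognized_res
EOF

# Extra flags loaded only on request, e.g. with: -fconfig common antibody
cat > "$CONFIG_HOME/.rosetta/flags/antibody" <<'EOF'
-ex1
-ex2
EOF

# List the configurations that now exist.
ls "$CONFIG_HOME/.rosetta/flags"
```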

Running Rosetta with multiple threads

Historically, each instance of Rosetta took advantage of only a single processor core. Parallel sampling was typically accomplished by launching many independent Rosetta processes. This allowed separate jobs to be carried out simultaneously, but there are many circumstances in which one may wish to complete a single Rosetta job more quickly using multiple cores. Multi-threading support has recently (as of 5 November 2019) been added to Rosetta. Most Rosetta modules do not yet support multi-threading, but some core algorithms have been parallelized. To take advantage of multi-threading, the following considerations are important:

  1. Rosetta must be compiled with the extras=cxx11thread option appended to the scons command. This will produce Rosetta executables with .cxx11thread. in their names (for example, fixbb.cxx11thread.linuxgccrelease).

  2. Rosetta applications compiled with threading support will by default launch one thread. To launch more, the number desired must be specified with the -multithreading:total_threads # flag. A value of 0 will launch one thread for each core (or hardware thread, on hyper-threaded nodes) available on a node. On laptops and personal computers on which one is running only one instance of Rosetta, this is often ideal. Note that on large nodes with many cores, you may wish to launch many Rosetta processes, each launching a small number of threads -- for example, you may wish to launch 16 Rosetta processes each with 4 threads on a 64-core node. (Note that if multi-threading is used in conjunction with MPI process-level parallelism, the -multithreading:total_threads flag limits the number of threads launched per process).

  3. Individual Rosetta modules that support multi-threading will by default try to use all available threads on a first-come-first-served basis. For example, let's suppose that module A calls module B, and both attempt to use threads. Let's also suppose that there are 16 threads in total. Module A will by default request that its work be distributed over all 16 threads, each of which can invoke module B. Module B will also request that its work be distributed over threads, but will find no threads free, and will therefore have to carry out its work in the calling thread. Since "inner" modules are given lower priority than "outer" modules, a user may manually limit the number of threads requested by a module with appropriate commandline flags, RosettaScripts options, or PyRosetta options (see below). In the example above, one could restrict module A to 4 threads, and module B to 4 threads. In this case, each of the 4 threads assigned to module A can invoke module B, and each of the 4 invocations of module B can be assigned 4 threads (for a total of 16). Note that a module is always assigned at least one thread (the requesting thread), and at most the lesser of the total thread count or the number requested.

  4. Currently, the following modules and tasks are multi-threaded. The number of threads that they can request can be controlled as described in the following table:

Module | Task | Commandline control | RosettaScripts control | PyRosetta control
Packer | Interaction graph pre-calculation | -multithreading:interaction_graph_threads | RestrictInteractionGraphThreadsOperation task operation | RestrictInteractionGraphThreadsOperation task operation

For developers, please see the page on the RosettaThreadManager for information about how to multi-thread your favourite Rosetta module.
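The arithmetic behind splitting a large node between processes and threads, as in the 64-core example in point 2 above, can be sketched in the shell. The node size and thread count are hypothetical values; only the -multithreading:total_threads flag comes from this page.

```shell
# Hypothetical node size and per-process thread count (the 64-core example above).
cores_per_node=64
threads_per_process=4

# Integer division gives the number of independent processes to launch.
processes=$(( cores_per_node / threads_per_process ))

echo "processes per node: $processes"
echo "flag for each process: -multithreading:total_threads $threads_per_process"
```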

Known multi-threading issues

  • The score12 scoring function is not currently threadsafe. At some point, we will fix whichever score term is currently creating problems.

Running Rosetta via MPI

Where threads are a useful means of parallelizing the execution of blocks of code involving many small tasks that share memory, across a limited number of cores on a single node, process-level cross-communication can also be useful for job-level parallelism. Most Rosetta applications support job-level parallelism using MPI (the Message Passing Interface). MPI allows many processes to communicate with one another by passing messages. This is advantageous over entirely independent processes, since it allows load-balancing (processes that finish their work sooner can do the work that would otherwise be waiting in the queue of a slower process) and, in some cases, data reduction and analysis prior to output (see, for example, the simple_cycpep_predict application). If the Rosetta MPI executables were compiled (using the extras=mpi option with scons, for example), then in the executable directory there will be an extra set of executables specifically for MPI, for example fixbb.mpi.linuxgccrelease . If these have not yet been compiled, please refer to the Setting Up Rosetta 3 page for more information. To run these executables, simply run them via mpiexec (or mpirun for older mpi implementations):

mpiexec -np 16 fixbb.mpi.linuxgccrelease -database /path/to/database @flags

Although typically used on large computer clusters, MPI can be installed on multiprocessor Linux and Mac machines. If you have a shiny new 8 core desktop, you should be able to use MPI. There are many different flavors of MPI, but openmpi seems to work well on both Ubuntu and MacOSX.

Most applications are currently compatible with MPI through The Job Distributor. See the MPI JobDistributor section for fine control over how Rosetta will use MPI with your run.

A useful option when running Rosetta via MPI is -mpi_tracer_to_file path/to/log/dir. This writes the output of each process to its own file.

Here is an example of the general command I put in a bash script to run via Qsub using environment variables for cluster runs:

mpiexec -np $np --machinefile $HOME/dna.machinefile $program.mpi.linuxgccrelease \
  -database $ROSETTA3/database -nstruct $nstruct -ex1 -add_orbitals -ex2 \
  -use_input_sc -ignore_unrecognized_res @$flag \
  -mpi_tracer_to_file $HOME/rosetta_run_logs/debug/$debug_log

Note that MPI-based job distribution can be used in conjunction with multi-threading if both options are specified during compilation (extras=cxx11thread,mpi). In this case, it is important to limit the product of the number of processes per node and the number of threads per process to equal the number of cores per node: the default behaviour of the multi-threaded build is to launch one thread per node core per process, which would result in oversubscription if more than one process per node is launched. See the multi-threading section, above, for information on limiting the total threads per process.

As a final note, it is highly recommended to enable the serialization extra any time that Rosetta is built with MPI support (extras=mpi,cxx11thread,serialization or extras=mpi,serialization). This is needed for certain types of inter-process communication.
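When combining MPI with multi-threading, the rule given above (processes per node times threads per process should not exceed cores per node) can be checked before submitting a job. The helper below is a hypothetical sketch, not a Rosetta or MPI tool, and the numbers are made up for illustration.

```shell
# check_subscription is a hypothetical helper, not a Rosetta or MPI tool.
# Arguments: processes per node, threads per process, cores per node.
check_subscription() {
  needed=$(( $1 * $2 ))
  if [ "$needed" -gt "$3" ]; then
    echo "oversubscribed: $needed threads on $3 cores"
    return 1
  fi
  echo "ok: $needed threads on $3 cores"
}

# 4 processes x 4 threads on a 16-core node fits exactly.
check_subscription 4 4 16

# 4 processes that each launch one thread per core (16) do not fit.
check_subscription 4 16 16 || true
```

A job script could call such a check and exit early instead of launching an oversubscribed run.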

Option Groups and Layers

Options in Rosetta are grouped by their functional and protocol usages. Each group has at least one layer, the parent layer. Most of the groups have one or more sub-layers holding multiple options. You can use a single or double colon to separate the layers. If the option name is unique, such as nstruct, you do not need to specify its groups. If you omit the groups but there are in fact multiple options with the same name, Rosetta will warn you.

For example:

fixbb.linuxgccrelease -in:file:s myinput.pdb -out:file:o myoutput
fixbb.linuxgccrelease -in::file::s myinput.pdb -packing::ex1

Option Types

All the option types are pre-defined, and you can figure out the type of parameters each option takes from its option type. Here is a list of Rosetta option types:

  1. Boolean, BooleanVector
  2. Integer, IntegerVector
  3. Real, RealVector
  4. String, StringVector
  5. File, FileVector
  6. Path, PathVector

For example, option "database" is a Path type option, so it is followed by a parameter in path format, as in

-database yourpath/Rosetta/main/database

Option "ex1" is a Boolean type option and set to be false by default, so you can activate it as

-packing:ex1

Option "nstruct" is an Integer type option, so you can use it as

-nstruct 10

Option "backrub:pivot_residues" is an IntegerVector type option, so

-backrub:pivot_residues 10 11 12 13

Getting help with options

There are a few good places to look for help.

  1. Both the general documentation and app-specific docs are extremely helpful.

  2. If you pass -help as a flag on the command line, Rosetta will spit out all existing options and then quit (ignoring other flags).

  3. Be sure to check out the demos and protocol captures for help with specific apps. These are curated demonstrations of how to use a particular app, with options, general recommendations, input files, etc. These demos are especially helpful for protocols that use RosettaScripts.

  4. Supplemental material of newer Rosetta papers should have the full command-line to use and all the options that were used to generate whatever data the paper is referring to. Though there may be some option-name-drift through time, these research articles are a great place to start.

  5. If you still require help to run a particular Rosetta application or protocol, check out www.rosettacommons.org/forum for more information. The corresponding author of the application or protocol may be able to help as well.

General tips for running Rosetta

  • Most applications use the -s and -l options to specify a single input PDB or a file that lists PDBs, commonly called a PDBLIST. The PDBList file should specify the full path to the PDB (one on each line), unless -in:path:pdb directory/to/pdb/files is specified. See this page for more common input options.

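  • If you have a score file output and want to find the lowest energy structure, use the sort command. You can sort on a particular column using the -kN option, where N is the column number; add -n so that the values are compared numerically rather than as text. See this page for more.
    • Sort by total score: sort -n -k2 my_score_file.sc, since the total score is typically the 2nd column, after the SCORE: label.
    • Sort by energy term: sort -n -k5 my_score_file.sc, which would sort by the 5th column, or the 4th score term.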

  • By default, Rosetta will ignore atoms from an input PDB whose occupancy is 0. If you are missing residues or atoms during a run, this is most likely the cause. To have Rosetta read these atoms anyway, pass the option -ignore_zero_occupancy false

  • By default, Rosetta will fail to load a PDB on residues/ligands it does not recognize, although parameters for these residue types may exist. This is due to the memory needed to understand all of these residue types and their potential chemical modifications (yes, ligands are a residue type. Rosetta is residue-centric). Rosetta probably has parameters for your particular residue type. To enable these residue types, see this page. To ignore these, pass the option -ignore_unrecognized_res

  • By default, Rosetta will not load waters from an input PDB. This is intentional, as most of the Rosetta applications do not deal with water molecules well and the default scorefunction uses implicit solvation. To have Rosetta read the common WAT type, pass the option -ignore_waters false
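The score-file sorting tip above can be tried on a small made-up file. The sketch assumes a single header line followed by SCORE: lines with the total score in the 2nd column and the model name in the last column; the values and model names are fabricated purely for illustration.

```shell
# Create a tiny, made-up score file in a scratch location.
score_file=$(mktemp)
cat > "$score_file" <<'EOF'
SCORE: total_score fa_atr fa_rep description
SCORE: -101.5 -250.0 30.2 model_0001
SCORE: -120.3 -260.1 28.7 model_0002
SCORE: -95.0 -240.9 35.4 model_0003
EOF

# Skip the header, sort numerically on column 2 (total score), keep the
# lowest-energy line, and print its last column (the model name).
# LC_ALL=C keeps "." as the decimal point regardless of the user's locale.
best=$(tail -n +2 "$score_file" | LC_ALL=C sort -n -k2,2 | head -n 1 | awk '{print $NF}')
echo "$best"   # prints model_0002
```

Changing -k2,2 to another column number ranks the models by a different score term in the same way.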

Common/Useful Rosetta Options

Rosetta is a highly versatile piece of software, and both its options system and scripting system help give it this versatility. Many Rosetta applications share common options, especially in regard to input and output (as most share a common Job Distributor, JD2). It is a good idea to review some of these options and see how they can be of use to you.

See Also