As more and more code is written in Python (both PyRosetta protocols and other accessory scripts), it will be good to keep our code consistent and of high quality. This page presents a list of guidelines and conventions for writing Python code in the Rosetta community.
This page is modeled after the C++ coding conventions; for the list of those conventions, see: Coding Conventions.
If not otherwise stated, the Python style guidelines presented in PEP 8 should be followed.
All Python code should have the following general file layout:
If the Python code is a script, include the following line, including the path to Python:
Note that, if present, this line must be the very first line in the file, before the copyright header, docstring, or additional comments.
The Rosetta Commons copyright header is required for every source code file in Rosetta. Do not make modifications. The header you should use for all .py files is:
# (c) Copyright Rosetta Commons Member Institutions.
# (c) This file is part of the Rosetta software suite and is made available under license.
# (c) The Rosetta software is developed by the contributing members of the Rosetta Commons.
# (c) For more information, see http://www.rosettacommons.org. Questions about this can be
# (c) addressed to University of Washington CoMotion, email: email@example.com.
Immediately below the copyright notice block comments should go the "docstring" for the .py file. (See more on documentation below, including how Doxygen reads Python comments.) This text should be opened and closed by a triplet of double quotes (""").
Include headers such as "Brief:", "Params:", "Output:", "Example:", "Remarks:", "Author:", etc.
"""Brief: This PyRosetta script does blah.
Params: ./blah.py .pdb
Example: ./blah.py foo.pdb 1000
Remarks: Blah blah blah, blah, blah.
Author: Jason W. Labonte
"""
import statements come after the header and before any constants.
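Import only one module per line:
import rosetta
import rosetta.protocols.rigid
not:
import rosetta, rosetta.protocols.rigid
It is fine to import several classes or methods from the same module on one line:
from rosetta import Pose, ScoreFunction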
For really long Rosetta namespaces, use namespace aliases, e.g.:
import rosetta.protocols.loops.loop_closure.kinematic_closure as KIC_protocols
Do not import * from any module. With large libraries, such as Rosetta, this is a waste of time and memory.
Group import statements in the following order, with comment headings (e.g., # Python standard library) and a blank line between each group:
rosetta.init() belongs in the main body of your script, not in the imports section. (See Main Body below.)
Module-wide constants and variables should be defined next, after the imports.
Constants should be named in ALL_CAPS_WITH_UNDERSCORES. (See Naming Conventions below.)
Avoid using the global statement anywhere in your code. All constants should be treated as read-only.
Do not define your own mathematical constants. Use the ones found in the Python math module.
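A minimal sketch of a constants section that follows these rules. The constant and function names are hypothetical and chosen only for illustration.

```python
import math

# Module-wide constants: ALL_CAPS_WITH_UNDERSCORES, treated as read-only.
MAX_CYCLES = 1000  # hypothetical constant
DEGREES_PER_RADIAN = 180.0 / math.pi  # built from math.pi, not a hand-typed value


def radians_to_degrees(angle):
    """Return angle (given in radians) converted to degrees."""
    return angle * DEGREES_PER_RADIAN
```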
Classes and exposed methods should come next in the code.
Add two blank lines between each class and exposed method.
Add one blank line between each class method.
Docstrings for classes and methods are indented below the class/method declaration as part of the definition.
Group non-public methods together, followed by shared methods. Non-public methods should also be prefixed with (at least) a single underscore. (See Naming Conventions below.)
For methods, if there are too many arguments in the declaration to fit on a single line, align the wrapped arguments with the first character of the first argument:
def take_pose_and_apply_foo_to_its_bar_residue_n_times(pose, foo,
                                                       bar, n):
    """Apply foo to the bar residue of pose n times."""
If your Python code is intended to be used as a script as well as a module, put
if __name__ == "__main__":
at the end of the file.
If your Python code is intended to be a module only, do not include rosetta.init(). It should be in the main body of the calling script.
If your Python code is intended to be a script only, do not include the if __name__ == "__main__": check.
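The layout for code that serves as both a script and a module might look like the sketch below. count_lines() and main() are hypothetical names, and the rosetta.init() call is only indicated in a docstring so that the sketch runs without PyRosetta.

```python
"""Brief: Hypothetical script that counts the lines of text passed to it."""


def count_lines(text):
    """Return the number of lines in text."""
    return len(text.splitlines())


def main():
    """Run the script body; a PyRosetta script would call rosetta.init() here."""
    print(count_lines("first line\nsecond line"))


# Runs only when the file is executed as a script, not when it is imported.
if __name__ == "__main__":
    main()
```

Importing this file from another module makes count_lines() available without triggering main().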
As in Rosetta 3 C++ code, use the following naming conventions:
CamelCase for class names (and therefore, exception names).
End exception names with Error or Warning, as appropriate.
box_car with underscores separating words for variable and method names.
(Prefix names with is_ for functions returning a boolean.)
PyJobDistributor.native_pose = pose.)
box_car with underscores for namespaces & directories, i.e., modules & packages.
box_car with underscores even for modules containing only a single class. (This differs from the C++ convention (e.g., Pose.cc) and is because the filenames themselves become the namespace in Python.)
It is OK to use capital letters within variable and method names for words that are acronyms or abbreviations, e.g.,
Likewise, it is OK to use an underscore to separate an acronym or abbreviation from other words in class names, e.g.,
In addition, the following conventions are unique to Python code:
ALL_CAPS with underscores for constants.
_box_car with a leading underscore for non-public methods and
_CamelCase with a leading underscore for non-public classes.
(Names with a leading underscore are not imported by from module import *.)
box_car_ with a trailing underscore only to avoid conflicts with Python keywords, e.g.,
Do not use __box_car__ with leading and trailing double underscores; that is reserved for special Python keywords, e.g.,
Avoid one-letter variable names. A descriptive name is almost always better.
Common exceptions are x, y, z, i, j, k.
Use self as the name of the first argument for class methods.
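The naming conventions above can be sketched in one small module. Every name here (MAX_LIGAND_ATOMS, LigandError, TerpeneLigand, and its methods) is hypothetical and exists only to show the spelling of each kind of name.

```python
MAX_LIGAND_ATOMS = 100  # constant: ALL_CAPS with underscores


class LigandError(Exception):
    """Exception raised for problems with a ligand (CamelCase name)."""


class TerpeneLigand(object):
    """This class contains terpene-specific ligand data."""

    def __init__(self, n_carbons):
        """Store the number of carbon atoms in this ligand."""
        self.n_carbons = n_carbons  # box_car variable name

    def is_sesquiterpene(self):
        """Return True if this ligand is a sesquiterpene."""
        # A sesquiterpene is built from three five-carbon isoprene units.
        return self._count_isoprene_units() == 3

    def _count_isoprene_units(self):
        """Return the number of isoprene units (non-public helper)."""
        return self.n_carbons // 5
```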
Python automatically passes objects into methods by reference and not by value. Thus, there usually is not a need to return an object that was passed and then modified by the method.
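As a small illustration, with a plain list standing in for a Rosetta object and a hypothetical function name, a method can modify its argument in place and the caller sees the change without anything being returned:

```python
def append_cap_residue(sequence):
    """Append a hypothetical capping residue to sequence in place."""
    sequence.append("NME")
    # No return statement is needed: the caller's list is the same object.


residues = ["ALA", "GLY"]
append_cap_residue(residues)
print(residues)  # ['ALA', 'GLY', 'NME']
```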
As in C++, conditional checks should happen inside the called method rather than in the calling method when possible. This helps keep things a bit more modular and also ensures that your method has no bad side effects if someone calls it but forgets to check for the essential condition. For example:
my_method() begins with its own check, if not condition_exists:, so that callers do not need to test the condition first.
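A sketch of this pattern, using a hypothetical normalize() function: the essential condition (a non-zero total) is checked inside the called method, so a caller that forgets to check cannot trigger a division by zero.

```python
def normalize(weights):
    """Scale weights in place so that they sum to 1.0.

    The check for a non-zero total lives here, inside the called method,
    so every caller is protected.
    """
    total = sum(weights)
    if not total:
        return  # nothing sensible to do; leave weights untouched
    for i, weight in enumerate(weights):
        weights[i] = weight / total


weights = [1.0, 3.0]
normalize(weights)  # weights is now [0.25, 0.75]
empty = []
normalize(empty)  # safe: no ZeroDivisionError is raised
```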
Avoid multiple inheritance.
All custom exceptions should inherit from Exception. Do not use string exceptions. (They were removed in Python 2.6. See Exception Handling below.)
Check the type of arguments passed to a Python class method if that method could be called from both Python and C++.
Objects passed from C++ may arrive as access pointers (APs), which must be dereferenced (with the get() method) for Python to use them.
To avoid this issue, check the type of any object arguments passed; if they are APs, call get(). For example:
def apply(self, pose):
    if isinstance(pose, rosetta.core.pose.PoseAP):
        pose = pose.get()
(For convenience, a "dummy" get() method has been added to Pose that returns the instance of the Pose, so that one can use pose = pose.get() without checking its type first. However, it would be impractical to add a get() method to every class in Rosetta that one might wish to use in PyRosetta, so check the type!)
Be careful not to say pose = native_pose when you really mean pose.assign(native_pose). The former only makes pose a second name for the same object; the latter makes a deep copy.
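The difference can be seen with plain Python objects. In this sketch a nested list stands in for a Pose, and copy.deepcopy() is used only as an analogy for pose.assign(); it is not how PyRosetta copies a Pose.

```python
import copy

native_coords = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]]  # stand-in for a Pose

alias = native_coords  # the same object under a second name
duplicate = copy.deepcopy(native_coords)  # an independent copy

alias[0][0] = 9.9
print(native_coords[0][0])  # 9.9: the "original" changed too
print(duplicate[0][0])  # 0.0: the deep copy is unaffected
```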
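For comparisons to singletons (None, a static class) use is and is not rather than == and !=.
However, do not write if is_option_on is True:; write if is_option_on:. Likewise, write if not is_option_on: rather than comparing to False.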
Only use if variable: for booleans; if variable is not None: is safer. (Some objects, e.g., empty containers, can be false in a boolean sense.)
Use if not list: in place of if len(list) == 0:.
Do not write if x < 10 and x > 5:; simplify this to if 5 < x < 10:.
Use != instead of <> for consistency.
Use if my_string.startswith("foo"): and if my_string.endswith("bar"): instead of if my_string[:3] == "foo": and if my_string[-3:] == "bar":. Besides the fact that the former is way easier to read, it's safer.
Use if isinstance(object, rosetta.core.pose.Pose): to check types; do not use if type(object) is rosetta.core.pose.Pose:.
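The preferred forms of these checks can be collected in one sketch. describe() and its arguments are hypothetical and exist only to exercise each idiom.

```python
def describe(value, filenames, x, name):
    """Return a list of notes, using the preferred conditional idioms."""
    notes = []
    if value is not None:  # singleton comparison with "is not"
        notes.append("value set")
    if not filenames:  # empty-sequence test instead of len() == 0
        notes.append("no files")
    if 5 < x < 10:  # chained comparison
        notes.append("x in range")
    if name.startswith("foo") and name.endswith("bar"):
        notes.append("foo...bar")
    if isinstance(x, int):  # type check that also accepts subclasses
        notes.append("x is an int")
    return notes
```

Note that describe(0, ...) still reports "value set": 0 is false in a boolean sense but is not None, which is why the explicit comparison is safer.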
Use argparse instead of optparse: optparse was deprecated and replaced in Python 2.7, and argparse does all the same things.
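A minimal argparse sketch for a script like the blah.py example near the top of this page. The argument names (pdb_filename, n_cycles, --verbose) are assumptions made for illustration, not part of any real Rosetta script.

```python
import argparse


def parse_arguments(argv=None):
    """Parse the (hypothetical) arguments of a script like blah.py."""
    parser = argparse.ArgumentParser(
        description="This PyRosetta script does blah.")
    parser.add_argument("pdb_filename", help="input .pdb file")
    parser.add_argument("n_cycles", type=int, help="number of cycles")
    parser.add_argument("--verbose", action="store_true",
                        help="print extra output")
    # With argv=None, argparse reads the real command line (sys.argv[1:]).
    return parser.parse_args(argv)


args = parse_arguments(["foo.pdb", "1000"])
print(args.pdb_filename, args.n_cycles, args.verbose)
```

Passing an explicit list to parse_arguments() makes the parser easy to exercise without touching the real command line.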
Use try/except blocks to test for errors that could happen in normal program operation. Errors should generate useful reports that include the values of relevant variables.
Custom exceptions should inherit from Exception. (See Classes above.)
Use raise HugeF_ingError("Oh, crap!"), instead of raise HugeF_ingError, "Oh, crap!". (The first form makes it easier to wrap long error messages. Plus, the second form is going away in Python 3.0.)
Do not use raise "Oh, crap!"; that already went away in Python 2.6.
Keep the number of lines tested in a try block to the bare minimum. This makes it easier to isolate the actual problem.
Avoid naked except clauses. They make it more difficult to isolate the actual problem.
If you must catch everything, use except Exception:, which is better than except: because it will only catch exceptions from actual program errors; naked excepts will also catch keyboard and system errors.
To catch more than one exception type, use a tuple, except (HugeF_ingError, SneakyError):, not except HugeF_ingError, SneakyError:.
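The exception-handling guidelines above can be sketched together as follows. MissingResidueError and get_residue_name() are hypothetical names; the point is the shape: a custom exception derived from Exception, a message that reports the relevant values, a minimal try block, and a tuple of specific exception types.

```python
class MissingResidueError(Exception):
    """Raised when a requested residue is not present (hypothetical)."""


def get_residue_name(residue_names, position):
    """Return the name at 1-based position, or raise MissingResidueError."""
    if not 1 <= position <= len(residue_names):
        # Include the values of the relevant variables in the report.
        raise MissingResidueError(
            "No residue at position %d; valid positions are 1-%d."
            % (position, len(residue_names)))
    return residue_names[position - 1]


try:
    name = get_residue_name(["ALA", "GLY"], 5)  # only the risky call
except (MissingResidueError, TypeError) as error:
    name = None
    message = str(error)
```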
If you need to instantiate a particular Rosetta vector1_ container for use in a Rosetta function, if possible, use PyRosetta's Vector1() constructor function, which takes a list as input and determines which vector1_ is needed.
For example, instead of:
list_of_filenames = utility.vector1_string()
list_of_filenames.extend(["file1.fasta", "file2.fasta", "file3.fasta"])
use:
list_of_filenames = Vector1(["file1.fasta", "file2.fasta", "file3.fasta"])
Use pose_from_sequence() instead of make_pose_from_sequence(). (The former has ω angles set to 180° automatically.)
Use a += n, not a = a + n. Likewise, use a -= n, a *= n, and a /= n.
Use list comprehensions. They are beautiful.
For example, instead of:
cubes = []
for x in range(10):
    cubes.append(x**3)
or:
cubes = map(lambda x: x**3, range(10))
use:
cubes = [x**3 for x in range(10)]
(Lambda functions are super cool, but the second example is far less readable than the third; if you have a lambda inside a map, you should be using a list comprehension instead.)
You can also use if to limit the elements in your list:
even_cubes = [x**3 for x in range(10) if x**3 % 2 == 0]
Use with when opening files. Context management is safer, because the file is automatically closed, even if an error occurs.
For example, instead of:
file = open("NOEs.cst")
constraints = file.readlines()
use:
with open("NOEs.cst") as file:
    constraints = file.readlines()
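A self-contained sketch of the same idea. It writes a small, made-up file to a temporary directory first (the file contents are placeholders, not real constraint syntax) and then confirms that the file object is closed once the with block ends.

```python
import os
import tempfile

# Write a small placeholder file so the example can run anywhere.
path = os.path.join(tempfile.mkdtemp(), "NOEs.cst")
with open(path, "w") as cst_file:
    cst_file.write("constraint 1\nconstraint 2\n")

with open(path) as cst_file:
    constraints = cst_file.readlines()

print(cst_file.closed)  # True: closed automatically on leaving the block
```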
Doxygen can autodocument Python code in addition to C++ code, but in the case of Python, it simply reads the __doc__ attribute of every module, class, and method. This is why it is crucial to put all class and method docstrings indented below the declaration line. Otherwise, they will not be stored in the proper __doc__ variable. In Python,...
class MyClass():
    """This is my class.
    It is great.

    """
...is equivalent to...
class MyClass():
    __doc__ = "This is my class.\nIt is great.\n\n"
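A short sketch of how Python stores docstrings, which is what a documentation tool reads. The names are hypothetical; the behavior shown (a string placed directly under the declaration becomes __doc__, while a comment does not) is standard Python.

```python
class MyClass(object):
    """This is my class."""

    def my_method(self):
        """Return None; this method exists only to carry a docstring."""
        return None


def undocumented():
    # Only a comment here, so there is no docstring.
    return None


print(MyClass.__doc__)
print(MyClass.my_method.__doc__)
print(undocumented.__doc__)  # None
```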
Always use double quotes for docstrings. While Python recognizes both """ and ''', some text editors only recognize the former.
All classes and methods must contain a docstring with at least a brief statement of what the class or method is or should do, respectively.
For one-line docstrings, put the opening and closing """ on the same line.
"""This class contains terpene-specific ligand data."""
"""Return True if this ligand is a sesquiterpene."""
Docstrings for scripts should be written to serve also as the script's help/usage message.
Docstrings for modules should list all public classes and methods that are exported by the module.
Document a class's constructor in the __init__() method's docstring, not in the class's docstring.
Comments should usually be complete sentences.
Use block comments to talk about the code immediately following the comments.
Use inline comments (sparingly) to clarify confusing lines.
As mentioned at the top of this page, when in doubt, follow PEP 8.
Use 4 spaces to indent, not tabs.
Indent 4 spaces per nested level.
Indent 4 additional spaces when method arguments must be wrapped beyond the first line.
Example 1:
def take_pose_and_apply_foo_to_its_bar_residue_n_times(
        pose, foo, bar, n):
    """Applies foo to the bar residue of pose n times."""
    pass
Example 2:
def take_pose_and_apply_foo_to_its_bar_residue_n_times(pose, foo,
                                                       bar, n):
    """Applies foo to the bar residue of pose n times."""
    pass
While there are numerous different styles for using spaces in argument lists and expressions in C++, PEP 8 recommends the following (and provides many examples):
Do not put spaces around parentheses, brackets, or curly braces.
Do not put spaces before commas, colons, or semicolons.
Put one space around operators.
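Do not put spaces around = if used as part of a keyword argument or default value assignment in a method call or declaration, respectively.
Spaces may be omitted around higher-priority operators to show precedence, e.g., y = a*x**2 + b*x + c.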
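These spacing rules might look as follows in practice. quadratic() and the coefficient values are hypothetical; the spacing choices follow the PEP 8 recommendations listed above.

```python
def quadratic(x, a=1.0, b=0.0, c=0.0):  # no spaces around "=" in defaults
    """Return a*x**2 + b*x + c."""
    y = a*x**2 + b*x + c  # tighter spacing shows operator precedence
    return y


coefficients = {"a": 2.0, "b": 3.0}  # no space before ":" or ","
value = quadratic(2.0, c=1.0, **coefficients)  # no spaces around keyword "="
```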
Limit lines to a maximum of 79 characters.
If you must use a backslash (\) to wrap lines, do so after any operators.
Do not put semicolons at the end of lines.
Do not use semicolons to combine multiple lines.
[to be added later… ~ labonte 20:56, 31 Aug 2012 (PDT)]