Less silly way to get number of iterations of Relax

This topic has 2 replies, 2 voices, and was last updated 3 years, 5 months ago by Anonymous.

Viewing 2 reply threads

Author

Posts

July 20, 2021 at 2:00 pm #3811

Anonymous

Currently to get the number of iterations a mover, such as FastRelax does, I enable the dump_trajectory scoretype and count the models in the PDBs (code below).

This seems very very weird way of doing this, so I was wondering if there is a less silly way of doing this?

Also, is there a direct way to open a multimodel file say to get the scores of each? (Rather than extracting the models, writing to file, reading them into PyRosetta and scoring them).



# ============ Methods ======================================

DumpTrajectoryEnergy = pyrosetta.rosetta.core.energy_methods.DumpTrajectoryEnergy

#formerly: DumpTrajectoryEnergy = pyrosetta.rosetta.core.scoring.util_methods.DumpTrajectoryEnergy

def enable_trajectory(scorefxn: pyrosetta.ScoreFunction,

prefix:str='pose',

stride:int=100,

gz:bool=False) -> DumpTrajectoryEnergy:

dt_st = pyrosetta.rosetta.core.scoring.ScoreType.dump_trajectory

scorefxn.set_weight(dt_st, 1)

#

make the traj dump less intense

# settable by cmd line options... maybe better

old_traj_energy = [k for k in scorefxn.all_methods() if isinstance(k, DumpTrajectoryEnergy)][0]

scorefxn.all_methods().remove(old_traj_energy)

emo = pyrosetta.rosetta.core.scoring.methods.EnergyMethodOptions()

if gz:

# emo.dump_trajectory_gz(True)

raise RuntimeError('this just outputs a text file with gz suffix... (?!)')

emo.dump_trajectory_prefix(prefix)

emo.dump_trajectory_stride(stride)

new_traj_energy = DumpTrajectoryEnergy(emo)

scorefxn.all_methods().append(new_traj_energy)

return new_traj_energy



import os, gzip, re, pathlib, datetime

import pandas as pd



def get_traj_interations(prefix:str) -> pd.DataFrame:

raw_trajs = []

for file in os.listdir():

if re.match(prefix, file):

with  open(file,'r') as fh:

file_content=fh.read()

modstamp = pathlib.Path(file).stat().st_mtime

modtime = datetime.datetime.fromtimestamp(modstamp)

models = re.findall('MODEL', file_content)

raw_trajs.append(dict(filename=file,

time=modtime,

n_models=len(models))

)

trajs = pd.DataFrame(raw_trajs).sort_values('time')

cat_df = uncontrajs.filename.str.extract(r'_([^_]+).pdb')

trajs =  cat_df[0] + '_' + cat_df.groupby([0]).cumcount().add(1).astype(str)

return trajs



# ============ Run ======================================

experiment_name = 'test123'

scorefxn = pyrosetta.get_fa_scorefxn()

new_traj_energy = enable_trajectory(scorefxn=scorefxn,

prefix=experiment_name,

stride=10_000)

relax = pyrosetta.rosetta.protocols.relax.FastRelax(scorefxn, 3)

relax.apply(pose)

trajs = get_traj_interations(experiment_name)

trajs.head()

	filename	time	n_models	short_name
43	test_unconstrained__20210720141946221_score.pdb	2021-07-20 14:19:46.429851	4	score_1
27	test_unconstrained__20210720141946966_pack.pdb	2021-07-20 14:19:48.854521	28	pack_1
67	test_unconstrained__20210720141949017_score.pdb	2021-07-20 14:19:49.172959	4	score_2
30	test_unconstrained__20210720141949412_min.pdb	2021-07-20 14:19:52.569970	8	min_1
95	test_unconstrained__20210720141952687_score.pdb	2021-07-20 14:19:52.815413	4	score_3

July 20, 2021 at 3:25 pm #15978
Anonymous
I’d back up for a minute and ask what you’re looking to use this information for. Usually you just count the number of apply()’s of whatever top-level mover you’re using as the number of iterations, and treat what things happen internally as something of a black box (modulo whatever settings are exposed by the configuration of the top-level mover). If you have particular reasons to dig deeper into the internals, you have to carefully define the bounds of what you’re interested in monitoring, because depending on what you’re doing, you might count as “changes” things which aren’t actually the steps/iterations you’re interested in, or you may miss some of the desired steps which are done “externally” or separately.

Regarding getting scores from a multi-model file, if you can’t see scoring information in text form in the file, Rosetta probably hasn’t dumped it out. (You can dump out structure information without dumping scoring information, depending on how you do it.) If that’s the case, you’d need to rescore the structure to get the score.
July 20, 2021 at 8:44 pm #15980
Anonymous
Well, in the case I was interested today I wanted to see how many repacking interations a FastRelax run was doing with and without a particular set of constraints as there was a large running time difference and I was concerned that it was that affecting the run —whereas the distribution of repacking events did change across the gradient of the cycle, there was nothing to worry about and the time difference was probably just the overhead from all the PyRosetta-defined constraints. But in general, I would argue that testing something to make sure it is behaving as hoped is better than just assuming it is.

But actually in FastRelax, one can set the max_inter getter/setter of FastRelax but the apply method does not say how many were done nor, from what I can tell, is there is no counter in pose.energies() or similar or in the log… but I am guessing there maybe one given than one can set the max_iter.

About the multi-model file, I mean simply reading it into memory. The models in a trajectory PDB are not scored, so yes, I know I need to rescore them. If one reads a multimodel PDB file, for example the PDB:1L2Y (tryptophan minicage NMR) it loads the models as separate chains into a single pose, which needs to be split up. I.e. it does not load it as a vector1_core_pose_Pose of poses or similar.
```
original = pyrosetta.toolbox.rcsb.pose_from_rcsb('1L2Y')

poses = original.split_by_chain()
```
It seems rather dangerous and potentially complicated —multichain multimodel etc. Hence why I was wondering if there was a I/O specifically for multimodel PDBs.

Author

Posts

Viewing 2 reply threads

You must be logged in to reply to this topic.