Relax multiple PDB files with MPI, jd2, and a pdblist on TACC Stampede2

    • #3045
      Anonymous

        Hello,

        I’m having trouble relaxing multiple PDB files via MPI and a pdblist on the TACC Stampede2 cluster. Apparently, only the first file in the pdblist is relaxed, regardless of which PDB file is listed first. My pdblist has Unix line endings, so that should be fine. Do I need to add some flag to relax multiple files in batch using MPI? What am I missing?
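
        In case it matters, here is how I sanity-checked the list file with standard Unix tools (assuming GNU coreutils; pdblist is my list file):

        # Show invisible characters: each Unix newline prints as a trailing $,
        # while DOS/Windows line endings would show up as ^M$
        cat -A pdblist

        # Command substitution strips trailing newlines, so empty output from
        # "tail -c 1" means the file's last byte is a newline
        [ -z "$(tail -c 1 pdblist)" ] && echo "pdblist ends with a newline"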

        Here’s the SLURM file I’m using:


        #!/bin/bash
        #SBATCH -n 5             # total MPI tasks in this job
        #SBATCH -N 1             # total number of nodes, 68 cores each
        #SBATCH -p normal        # queue
        #SBATCH -t 00:10:00      # run time (hh:mm:ss)
        #SBATCH -o RelaxLog      # name of the scoring log file
        #SBATCH -e ErrorLog      # name of the error log file

        module load intel/17.0.4
        module load impi/17.0.3
        module load rosetta

        ibrun relax.cxx11mpi.linuxiccrelease -in:file:l pdblist -in:file:fullatom -relax:quick -nstruct 2 -out:suffix _relaxed -out:path:pdb Output_PBDs -out:path:score Output_Scores -database=$HOME/rosetta_database -mpi_tracer_to_file Output


        I ran a comparable (non-MPI) command on my personal computer, and the -in:file:l pdblist flag works fine there (all PDB files in the list are relaxed). Here is that command:


        Rosetta/main/source/bin/relax.linuxgccrelease -in:file:l pdblist -in:file:fullatom -relax:quick -nstruct 2 -out:suffix _relaxed -out:path:pdb Output_PBDs -out:path:score Output_Scores -database Rosetta/main/database/


        So, I’m thinking that there’s something wrong with the way the JobDistributor reads the pdblist. Here’s the output of the master node when I try to run the SLURM file on TACC Stampede2 (and only the first PDB file in the pdblist is relaxed):


        core.init: (0) Rosetta version unknown:exported from http://www.rosettacommons.org
        core.init: (0) command: /home1/apps/intel17/impi17_0/rosetta/3.8/bin/relax.cxx11mpi.linuxiccrelease -in:file:l pdblist -in:file:fullatom -relax:quick -nstruct 2 -out:suffix _relaxed -out:path:pdb Output_PBDs -out:path:score Output_Scores -database=/home1/01748/pl6218/rosetta_database -mpi_tracer_to_file Output
        core.init: (0) 'RNG device' seed mode, using '/dev/urandom', seed=1855461808 seed_offset=0 real_seed=1855461808
        core.init.random: (0) RandomGenerator:init: Normal mode, seed=1855461808 RG_type=mt19937
        core.scoring.ScoreFunctionFactory: (0) SCOREFUNCTION: talaris2014
        core.scoring.etable: (0) Starting energy table calculation
        core.scoring.etable: (0) smooth_etable: changing atr/rep split to bottom of energy well
        core.scoring.etable: (0) smooth_etable: spline smoothing lj etables (maxdis = 6)
        core.scoring.etable: (0) smooth_etable: spline smoothing solvation etables (max_dis = 6)
        core.scoring.etable: (0) Finished calculating energy tables.
        basic.io.database: (0) Database file opened: scoring/score_functions/hbonds/sp2_elec_params/HBPoly1D.csv
        basic.io.database: (0) Database file opened: scoring/score_functions/hbonds/sp2_elec_params/HBFadeIntervals.csv
        basic.io.database: (0) Database file opened: scoring/score_functions/hbonds/sp2_elec_params/HBEval.csv
        basic.io.database: (0) Database file opened: scoring/score_functions/rama/Rama_smooth_dyn.dat_ss_6.4
        basic.io.database: (0) Database file opened: scoring/score_functions/P_AA_pp/P_AA
        basic.io.database: (0) Database file opened: scoring/score_functions/P_AA_pp/P_AA_n
        basic.io.database: (0) Database file opened: scoring/score_functions/P_AA_pp/P_AA_pp
        protocols.relax.FastRelax: (0) ================== Using default script ==================
        protocols.jd2.PDBJobInputter: (0) Instantiate PDBJobInputter
        protocols.jd2.PDBJobInputter: (0) PDBJobInputter::fill_jobs
        protocols.jd2.PDBJobInputter: (0) pushed 5hqi_ignorechain.pdb nstruct indices 1 - 2
        protocols.evaluation.ChiWellRmsdEvaluatorCreator: (0) Evaluation Creator active ...
        protocols.jd2.MPIWorkPoolJobDistributor: (0) Master Node: Getting next job to assign from list id 1 of 2
        protocols.jd2.MPIWorkPoolJobDistributor: (0) Master Node: Waiting for job requests...
        protocols.jd2.MPIWorkPoolJobDistributor: (0) Master Node: Received message from 6 with tag 10
        protocols.jd2.MPIWorkPoolJobDistributor: (0) Master Node: Sending new job id 1 to node 6 with tag 10
        protocols.jd2.MPIWorkPoolJobDistributor: (0) Master Node: Getting next job to assign from list id 2 of 2
        protocols.jd2.MPIWorkPoolJobDistributor: (0) Master Node: Waiting for job requests...
        protocols.jd2.MPIWorkPoolJobDistributor: (0) Master Node: Received message from 7 with tag 10
        protocols.jd2.MPIWorkPoolJobDistributor: (0) Master Node: Sending new job id 2 to node 7 with tag 10
        protocols.jd2.MPIWorkPoolJobDistributor: (0) Master Node: No more jobs to assign, setting next job id to zero
        protocols.jd2.MPIWorkPoolJobDistributor: (0) Master Node: Finished handing out jobs
        protocols.jd2.MPIWorkPoolJobDistributor: (0) Master Node: Waiting for 9 slaves to finish jobs
        protocols.jd2.MPIWorkPoolJobDistributor: (0) Master Node: Received message from 8 with tag 10
        protocols.jd2.MPIWorkPoolJobDistributor: (0) Master Node: Sending spin down signal to node 8
        protocols.jd2.MPIWorkPoolJobDistributor: (0) Master Node: Waiting for 8 slaves to finish jobs
        protocols.jd2.MPIWorkPoolJobDistributor: (0) Master Node: Received message from 9 with tag 10
        protocols.jd2.MPIWorkPoolJobDistributor: (0) Master Node: Sending spin down signal to node 9
        protocols.jd2.MPIWorkPoolJobDistributor: (0) Master Node: Waiting for 7 slaves to finish jobs
        protocols.jd2.MPIWorkPoolJobDistributor: (0) Master Node: Received message from 1 with tag 10
        protocols.jd2.MPIWorkPoolJobDistributor: (0) Master Node: Sending spin down signal to node 1
        protocols.jd2.MPIWorkPoolJobDistributor: (0) Master Node: Waiting for 6 slaves to finish jobs
        protocols.jd2.MPIWorkPoolJobDistributor: (0) Master Node: Received message from 2 with tag 10
        protocols.jd2.MPIWorkPoolJobDistributor: (0) Master Node: Sending spin down signal to node 2
        protocols.jd2.MPIWorkPoolJobDistributor: (0) Master Node: Waiting for 5 slaves to finish jobs
        protocols.jd2.MPIWorkPoolJobDistributor: (0) Master Node: Received message from 3 with tag 10
        protocols.jd2.MPIWorkPoolJobDistributor: (0) Master Node: Sending spin down signal to node 3
        protocols.jd2.MPIWorkPoolJobDistributor: (0) Master Node: Waiting for 4 slaves to finish jobs
        protocols.jd2.MPIWorkPoolJobDistributor: (0) Master Node: Received message from 4 with tag 10
        protocols.jd2.MPIWorkPoolJobDistributor: (0) Master Node: Sending spin down signal to node 4
        protocols.jd2.MPIWorkPoolJobDistributor: (0) Master Node: Waiting for 3 slaves to finish jobs
        protocols.jd2.MPIWorkPoolJobDistributor: (0) Master Node: Received message from 5 with tag 10
        protocols.jd2.MPIWorkPoolJobDistributor: (0) Master Node: Sending spin down signal to node 5
        protocols.jd2.MPIWorkPoolJobDistributor: (0) Master Node: Waiting for 2 slaves to finish jobs
        protocols.jd2.MPIWorkPoolJobDistributor: (0) Master Node: Received message from 7 with tag 30
        protocols.jd2.MPIWorkPoolJobDistributor: (0) Master Node: Received job success message for job id 2 from node 7 blocking till output is done
        protocols.jd2.MPIWorkPoolJobDistributor: (0) Master Node: Received job output finish message for job id 2 from node 7
        protocols.jd2.MPIWorkPoolJobDistributor: (0) Master set job 2 as completed/deletable.
        protocols.jd2.MPIWorkPoolJobDistributor: (0) Master Node: Waiting for 2 slaves to finish jobs
        protocols.jd2.MPIWorkPoolJobDistributor: (0) Master Node: Received message from 7 with tag 10
        protocols.jd2.MPIWorkPoolJobDistributor: (0) Master Node: Sending spin down signal to node 7
        protocols.jd2.MPIWorkPoolJobDistributor: (0) Master Node: Waiting for 1 slaves to finish jobs
        protocols.jd2.MPIWorkPoolJobDistributor: (0) Master Node: Received message from 6 with tag 30
        protocols.jd2.MPIWorkPoolJobDistributor: (0) Master Node: Received job success message for job id 1 from node 6 blocking till output is done
        protocols.jd2.MPIWorkPoolJobDistributor: (0) Master Node: Received job output finish message for job id 1 from node 6
        protocols.jd2.MPIWorkPoolJobDistributor: (0) Master set job 1 as completed/deletable.
        protocols.jd2.MPIWorkPoolJobDistributor: (0) Master Node: Waiting for 1 slaves to finish jobs
        protocols.jd2.MPIWorkPoolJobDistributor: (0) Master Node: Received message from 6 with tag 10
        protocols.jd2.MPIWorkPoolJobDistributor: (0) Master Node: Sending spin down signal to node 6
        protocols.jd2.MPIWorkPoolJobDistributor: (0) Master Node: Finished sending spin down signals to slaves
        protocols::checkpoint: (0) Deleting checkpoints of FastRelax

        There were two files in that pdblist, so with -nstruct 2 I expected four jobs, but the JobDistributor pushed only the first file: there is a single "pushed 5hqi_ignorechain.pdb nstruct indices 1 - 2" line, and the "list id 1 of 2" / "2 of 2" jobs are just the two nstruct copies of that one structure. There were no error messages, and the error log was blank. Everything reports success, yet only the first file in the pdblist was relaxed.

        Any ideas???

        As a side note, I can successfully run the relax protocol on TACC Stampede2 if I just change my SLURM file from "-in:file:l pdblist" to "-in:file:s 5hqi_ignorechain.pdb", so I know it works fine with a single PDB file.
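
        As far as I know, -in:file:s also accepts several file names at once, so as a stopgap I could list the structures explicitly instead of using the pdblist. A sketch, with second_structure.pdb as a hypothetical placeholder for the other entries:

        # List every input explicitly instead of going through the pdblist;
        # second_structure.pdb stands in for the remaining files
        ibrun relax.cxx11mpi.linuxiccrelease -in:file:s 5hqi_ignorechain.pdb second_structure.pdb -in:file:fullatom -relax:quick -nstruct 2 -out:suffix _relaxed -out:path:pdb Output_PBDs -out:path:score Output_Scores -database=$HOME/rosetta_database -mpi_tracer_to_file Output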

        Thanks very much for your help!

        Sincerely,

        AJ

      • #14509
        Anonymous

          In case anybody is wondering, here’s the solution to this problem: Add a blank line after the last file in the pdblist.

          So the pdblist file should look like this:


          Structure1.pdb
          Structure2.pdb
          Structure3.pdb



          Note the blank line after Structure3.pdb. Apparently the MPI job distributor stops reading AT the last line rather than AFTER it, so without that trailing blank line the end of the list is silently dropped.
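
          If you generate the pdblist from a script, you can guarantee the trailing blank line automatically. A minimal sketch with standard shell tools, assuming the PDB files sit in the current directory and the list is named pdblist:

          # printf writes one newline-terminated file name per line
          printf '%s\n' *.pdb > pdblist

          # append the extra blank line the MPI job distributor apparently needs
          echo >> pdblist

          Afterwards, "cat -A pdblist" should show a trailing $ on every entry plus one bare $ at the end.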

          Thanks!
