GROMACS#
GROMACS is a free and open-source software suite for high-performance molecular dynamics and output analysis. In this example, we will follow the example of mdtutorials.com which downloads the Lysozyme protein from the RCSB protein data bank and shows how to use GROMACS to setup a simulation of its dynamics when solvated in a box of water and ions. For brevity’s sake, we will follow the tutorial up until the equilibration stage. The script shown below can be downloaded here.
Warning
This example purely serves to illustrate how to use aiida-shell
.
There is no guarantee that GROMACS is used correctly or in the most efficient way.
The first step is to download the protein:
#!/usr/bin/env runaiida
"""Simulation of lysozyme protein dynamics using GROMACS."""
import urllib.request
from aiida import engine, orm
from aiida_shell import launch_shell_job
# Download the lysozyme protein structure.
with urllib.request.urlopen('https://files.rcsb.org/download/1AKI.pdb') as handle:
lysozyme_pdb = orm.SinglefileData(handle, filename='lysozyme.pdb')
We use the urllib
module of the standard library to open the file and wrap it in AiiDA’s SinglefileData
data type,
which allows it to be stored in the provenance graph. The first step is to perform some simple pre-processing. The
downloaded file defines the protein as embedded in crystalline water, which we don’t need for our purposes and therefore
we remove it. Water molecules are marked by the HOH
residue in the file, so we simply write a function that loops over
the lines in the PDB file and removes any line that contains HOH
.
@engine.calcfunction
def remove_water_from_pdb(protein: orm.SinglefileData) -> orm.SinglefileData:
"""Remove water molecules from a PDB file."""
lines = protein.get_content().split('\n')
lines_without_water = [line for line in lines if 'HOH' not in line]
return orm.SinglefileData.from_string(
'\n'.join(lines_without_water),
filename=protein.filename,
)
# Remove crystalline water from PDB by skipping any lines that contain `HOH` residue.
lysozyme = remove_water_from_pdb(lysozyme_pdb)
Calling the remote_water_from_pdb
with the original lysozyme_pdb
returns the modified lysozyme
file (which is also
an instance of SinglefileData
).
Note
We mark the function remove_water_from_pdb
with AiiDA’s calcfunction
decorator. This ensures the act of modifying
the originally downloaded protein file is captured in the provenance graph.
Next, we run GROMACS’ gmx pdb2gmx
command to convert the file from the .pdb
to GROMACS’ native .gro
format. If we
were to execute this directly on the command line, it would look like:
gmx pdb2gmx -f lysozyme.pdb -o output.gro -water spce -ff oplsaa
When wrapping this in aiida-shell
, the main question is how we pass the protein PDB file to the command, as that is
stored in AiiDA’s provenance graph and is not a file on disk. We simply replace the file name with the {protein}
placeholder and then add the lysozyme SinglefileData
nodes in the nodes
dictionary. The launch_shell_job
will
automatically copy the content of the SinglefileData
to the working directory where the command is executed and
substitute the {protein}
lysozyme with the filename to which the content was copied:
results_pdb2gmx, node_pdb2gmx = launch_shell_job(
'gmx',
arguments='pdb2gmx -f {protein} -o output.gro -water spce -ff oplsaa',
nodes={
'protein': lysozyme,
},
outputs=['output.gro', 'topol.top', 'posre.itp'],
metadata={'options': {'redirect_stderr': True}},
)
Note
The nodes
argument takes a dictionary with an arbitrary number of SinglefileData
nodes whose content is copied to
the working directory where the command is executed. If the key of the node appears as a placeholder in the arguments
argument, it is replaced with its filename.
The gmx pdb2gmx
is expected to generate three outputs of interest:
output.gro
: The protein structure in.gro
formattopol.top
: The topology fileposre.itp
: The include topology file with position restraints
To capture these output files, they are declared using the outputs
argument. aiida-shell
will automatically wrap
these output files in a SinglefileData
and store them as an output of the command in the provenance graph.
The next step is to create the simulation box around the protein using gmx editconf
:
results_editconf, node_editconf = launch_shell_job(
'gmx',
arguments='editconf -f {gro} -o output.gro -c -d 1.0 -bt cubic',
nodes={
'gro': results_pdb2gmx['output_gro'],
},
outputs=['output.gro'],
metadata={'options': {'redirect_stderr': True}},
)
Tip
GROMACS by default writes output to the stderr
file descriptor. This is a bit unususal, as normally this is reserved
for errors. aiida-shell
by default detects if the executed command wrote anything to stderr
and if so, assigns the
exit code 400
to the process,
indicating that it has failed. Since the command did not actually fail, this can be misleading. As a workaround, we set
the metadata option redirect_stderr
to True
which will instruct aiida-shell
to redirect all output written to
stderr
to the stdout
descriptor instead.
The gmx editconf
command takes a GRO file defining the protein structure. In this example, this file was created by
the previous gmx pdb2gmx
command. We can retrieve this output file from the results dictionary and directly assign it
to the nodes
dictionary.
Now that the protein is placed in a simulation box, we can solvate the system in water:
results_solvate, node_solvate = launch_shell_job(
'gmx',
arguments='solvate -cp {gro} -cs spc216.gro -o output.gro -p {top}',
nodes={
'gro': results_editconf['output_gro'],
'top': results_pdb2gmx['topol_top'],
},
outputs=['output.gro', 'topol.top'],
metadata={'options': {'redirect_stderr': True}},
)
This command requires, in addition to the .gro
file produced in the previous step, the .top
topology file created by
the gmx pdb2gmx
command.
The next step is to insert ions into the solution to neutralize the system. GROMACS provides the gmx genion
command
for this, which runs actual molecular dynamics to optimize the positions of the added ions. It requires a
.tpr
file which defines the starting
structure and conditions of a simulation. Such a file is created using the gmx grompp
pre-processing utility which
takes an .mdp
file defining the
parameters of the simulation:
ions_mpd = orm.SinglefileData.from_string(
"""
integrator = steep
emtol = 1000.0
emstep = 0.01
nsteps = 50000
nstlist = 1
cutoff-scheme = Verlet
ns_type = grid
coulombtype = cutoff
rcoulomb = 1.0
rvdw = 1.0
pbc = xyz
ld_seed = 1
gen_seed = 1
""",
filename='ions.mdp',
)
# Run `gmx grompp` to pre-process the parameters for ion insertion.
results_grompp_genion, node_grompp_ion = launch_shell_job(
'gmx',
arguments='grompp -f {mdp} -c {gro} -p {top} -o output.tpr',
nodes={
'mdp': ions_mpd,
'gro': results_solvate['output_gro'],
'top': results_solvate['topol_top'],
},
outputs=['output.tpr'],
metadata={'options': {'redirect_stderr': True}},
)
With the generated .tpr
file in hand, we can now call gmx genion
to insert the ions into the system:
results_genion, node_genion = launch_shell_job(
'gmx',
arguments='genion -s {tpr} -o {gro} -p {top} -pname NA -nname CL -neutral',
nodes={
'gro': results_solvate['output_gro'],
'top': results_solvate['topol_top'],
'tpr': results_grompp_genion['output_tpr'],
'stdin': orm.SinglefileData.from_string('SOL'),
},
outputs=['output.gro', 'topol.top'],
metadata={'options': {'redirect_stderr': True, 'filename_stdin': 'stdin'}},
)
Note
The gmx genion
needs to know which type of residues should be replaced by the ions. Unfortunately, it does not allow
this to be specified with a command line option, but rather will prompt for it, providing a number of options. We want
to select the option SOL
(solvent molecules) for this, but if we run the command normally through aiida-shell
, the
command will hang, waiting for eternity for the prompt to be answered. To forward the prompt answer to aiida-shell
,
we need to communicate it through the stdin
descriptor. We add the desired response, SOL
in this case, as a
SinglefileData
node to the nodes
dictionary. The key that is used, stdin
in this example, should be specified
using the filename_stdin
metadata option.
The system is now properly neutralized and we can proceed to minimizing its energy. This is done using GROMACS’ command
to run molecular dynamics gmx mdrun
. Just like gmx genion
this requires a .tpr
file, which we prepare using
gmx grompp
:
em_mdp = orm.SinglefileData.from_string(
"""
integrator = steep
emtol = 1000.0
emstep = 0.01
nsteps = 50000
nstlist = 1
cutoff-scheme = Verlet
ns_type = grid
coulombtype = PME
rcoulomb = 1.0
rvdw = 1.0
pbc = xyz
ld_seed = 1
gen_seed = 1
""",
filename='em.mdp',
)
# Run `gmx grompp` to pre-process the parameters for energy minimization.
results_grompp_em, node_grompp_em = launch_shell_job(
'gmx',
arguments='grompp -f {mdp} -c {gro} -p {top} -o output.tpr',
nodes={
'mdp': em_mdp,
'gro': results_genion['output_gro'],
'top': results_genion['topol_top'],
},
outputs=['output.tpr'],
metadata={'options': {'redirect_stderr': True}},
)
With the .tpr
created, we run gmx mdrun
to run the energy minimization:
results_em, node_em = launch_shell_job(
'gmx',
arguments='mdrun -v -deffnm output -s {tpr}',
nodes={
'gro': results_genion['output_gro'],
'top': results_genion['topol_top'],
'tpr': results_grompp_em['output_tpr'],
},
outputs=['output.edr'],
metadata={'options': {'redirect_stderr': True}},
)
As the final step in this example, we will performance some analysis on the energy minimization. We use gmx energy
to
extract the system’s potential energy during the simulation:
results_energy, node_energy = launch_shell_job(
'gmx',
arguments='energy -f {edr} -o potential.xvg',
nodes={
'edr': results_em['output_edr'],
'stdin': orm.SinglefileData.from_string('10\n0'),
},
outputs=['potential.xvg'],
metadata={'options': {'redirect_stderr': True, 'filename_stdin': 'stdin'}},
)
Note
The gmx energy
command prompts to determine which quantity to project. In this example, we first pass 10
which
corresponds to the potential energy, followed by 0
which finalizes the selection and completes the prompt.
To visualize the extracted data, we define the create_plot
function (once again decorated with calcfunction
to
preserve provenance) and pass the potential.xvg
output file generated by gmx energy
:
@engine.calcfunction
def create_plot(xvg: orm.SinglefileData) -> orm.SinglefileData:
"""Plot the data of a XVG output file."""
import io
import matplotlib.pyplot as plt
import numpy as np
with xvg.as_path() as filepath:
data = np.loadtxt(filepath, comments=['#', '@']).T
plt.plot(*data)
plt.xlabel('Energy minimization step')
plt.ylabel('Potential energy [kJ/mol]')
stream = io.BytesIO()
plt.savefig(stream, format='png', bbox_inches='tight', dpi=180)
stream.seek(0)
return orm.SinglefileData(stream, filename='potential.png')
# Create a plot from the extracted potential energy of the system
plot = create_plot(results_energy['potential_xvg'])
print(plot.pk)
The plot is saved to a stream in memory which is then passed to a SinglefileData
node to store it in AiiDA’s
provenance graph. The final result will look something like the following:
Tip
To extract the plot from the SinglefileData
node, you can use the command verdi node repo cat <PK> > potential_energy.png
, where you replace <PK>
with the pk of the SinglefileData
node returned by the create_plot
function.
To conclude this example, below is the provenance graph that was created by running the example script: