GROMACS#
GROMACS is a free and open-source software suite for high-performance molecular dynamics and output analysis. In this example, we will follow the example of mdtutorials.com which downloads the Lysozyme protein from the RCSB protein data bank and shows how to use GROMACS to setup a simulation of its dynamics when solvated in a box of water and ions. For brevity’s sake, we will follow the tutorial up until the equilibration stage. The script shown below can be downloaded here.
Warning
This example purely serves to illustrate how to use aiida-shell.
There is no guarantee that GROMACS is used correctly or in the most efficient way.
The first step is to download the protein:
#!/usr/bin/env runaiida
"""Simulation of lysozyme protein dynamics using GROMACS."""
import urllib.request
from aiida import engine, orm
from aiida_shell import launch_shell_job
# Download the lysozyme protein structure.
with urllib.request.urlopen('https://files.rcsb.org/download/1AKI.pdb') as handle:
lysozyme_pdb = orm.SinglefileData(handle, filename='lysozyme.pdb')
We use the urllib module of the standard library to open the file and wrap it in AiiDA’s SinglefileData data type,
which allows it to be stored in the provenance graph. The first step is to perform some simple pre-processing. The
downloaded file defines the protein as embedded in crystalline water, which we don’t need for our purposes and therefore
we remove it. Water molecules are marked by the HOH residue in the file, so we simply write a function that loops over
the lines in the PDB file and removes any line that contains HOH.
@engine.calcfunction
def remove_water_from_pdb(protein: orm.SinglefileData) -> orm.SinglefileData:
"""Remove water molecules from a PDB file."""
lines = protein.get_content().split('\n')
lines_without_water = [line for line in lines if 'HOH' not in line]
return orm.SinglefileData.from_string(
'\n'.join(lines_without_water),
filename=protein.filename,
)
# Remove crystalline water from PDB by skipping any lines that contain `HOH` residue.
lysozyme = remove_water_from_pdb(lysozyme_pdb)
Calling the remote_water_from_pdb with the original lysozyme_pdb returns the modified lysozyme file (which is also
an instance of SinglefileData).
Note
We mark the function remove_water_from_pdb with AiiDA’s calcfunction decorator. This ensures the act of modifying
the originally downloaded protein file is captured in the provenance graph.
Next, we run GROMACS’ gmx pdb2gmx command to convert the file from the .pdb to GROMACS’ native .gro format. If we
were to execute this directly on the command line, it would look like:
gmx pdb2gmx -f lysozyme.pdb -o output.gro -water spce -ff oplsaa
When wrapping this in aiida-shell, the main question is how we pass the protein PDB file to the command, as that is
stored in AiiDA’s provenance graph and is not a file on disk. We simply replace the file name with the {protein}
placeholder and then add the lysozyme SinglefileData nodes in the nodes dictionary. The launch_shell_job will
automatically copy the content of the SinglefileData to the working directory where the command is executed and
substitute the {protein} lysozyme with the filename to which the content was copied:
gmx pdb2gmx to convert the PDB to GROMACS .gro format.#results_pdb2gmx, node_pdb2gmx = launch_shell_job(
'gmx',
arguments='pdb2gmx -f {protein} -o output.gro -water spce -ff oplsaa',
nodes={
'protein': lysozyme,
},
outputs=['output.gro', 'topol.top', 'posre.itp'],
metadata={'options': {'redirect_stderr': True}},
)
Note
The nodes argument takes a dictionary with an arbitrary number of SinglefileData nodes whose content is copied to
the working directory where the command is executed. If the key of the node appears as a placeholder in the arguments
argument, it is replaced with its filename.
The gmx pdb2gmx is expected to generate three outputs of interest:
output.gro: The protein structure in.groformattopol.top: The topology fileposre.itp: The include topology file with position restraints
To capture these output files, they are declared using the outputs argument. aiida-shell will automatically wrap
these output files in a SinglefileData and store them as an output of the command in the provenance graph.
The next step is to create the simulation box around the protein using gmx editconf:
gmx editconf to generate a cubic box around the protein.#results_editconf, node_editconf = launch_shell_job(
'gmx',
arguments='editconf -f {gro} -o output.gro -c -d 1.0 -bt cubic',
nodes={
'gro': results_pdb2gmx['output_gro'],
},
outputs=['output.gro'],
metadata={'options': {'redirect_stderr': True}},
)
Tip
GROMACS by default writes output to the stderr file descriptor. This is a bit unususal, as normally this is reserved
for errors. aiida-shell by default detects if the executed command wrote anything to stderr and if so, assigns the
exit code 400 to the process,
indicating that it has failed. Since the command did not actually fail, this can be misleading. As a workaround, we set
the metadata option redirect_stderr to True which will instruct aiida-shell to redirect all output written to
stderr to the stdout descriptor instead.
The gmx editconf command takes a GRO file defining the protein structure. In this example, this file was created by
the previous gmx pdb2gmx command. We can retrieve this output file from the results dictionary and directly assign it
to the nodes dictionary.
Now that the protein is placed in a simulation box, we can solvate the system in water:
gmx solvate to solvate the protein in water.#results_solvate, node_solvate = launch_shell_job(
'gmx',
arguments='solvate -cp {gro} -cs spc216.gro -o output.gro -p {top}',
nodes={
'gro': results_editconf['output_gro'],
'top': results_pdb2gmx['topol_top'],
},
outputs=['output.gro', 'topol.top'],
metadata={'options': {'redirect_stderr': True}},
)
This command requires, in addition to the .gro file produced in the previous step, the .top topology file created by
the gmx pdb2gmx command.
The next step is to insert ions into the solution to neutralize the system. GROMACS provides the gmx genion command
for this, which runs actual molecular dynamics to optimize the positions of the added ions. It requires a
.tpr file which defines the starting
structure and conditions of a simulation. Such a file is created using the gmx grompp pre-processing utility which
takes an .mdp file defining the
parameters of the simulation:
ions_mpd = orm.SinglefileData.from_string(
"""
integrator = steep
emtol = 1000.0
emstep = 0.01
nsteps = 50000
nstlist = 1
cutoff-scheme = Verlet
ns_type = grid
coulombtype = cutoff
rcoulomb = 1.0
rvdw = 1.0
pbc = xyz
ld_seed = 1
gen_seed = 1
""",
filename='ions.mdp',
)
# Run `gmx grompp` to pre-process the parameters for ion insertion.
results_grompp_genion, node_grompp_ion = launch_shell_job(
'gmx',
arguments='grompp -f {mdp} -c {gro} -p {top} -o output.tpr',
nodes={
'mdp': ions_mpd,
'gro': results_solvate['output_gro'],
'top': results_solvate['topol_top'],
},
outputs=['output.tpr'],
metadata={'options': {'redirect_stderr': True}},
)
With the generated .tpr file in hand, we can now call gmx genion to insert the ions into the system:
gmx genion to neutralize the system with counter ions.#results_genion, node_genion = launch_shell_job(
'gmx',
arguments='genion -s {tpr} -o {gro} -p {top} -pname NA -nname CL -neutral',
nodes={
'gro': results_solvate['output_gro'],
'top': results_solvate['topol_top'],
'tpr': results_grompp_genion['output_tpr'],
'stdin': orm.SinglefileData.from_string('SOL'),
},
outputs=['output.gro', 'topol.top'],
metadata={'options': {'redirect_stderr': True, 'filename_stdin': 'stdin'}},
)
Note
The gmx genion needs to know which type of residues should be replaced by the ions. Unfortunately, it does not allow
this to be specified with a command line option, but rather will prompt for it, providing a number of options. We want
to select the option SOL (solvent molecules) for this, but if we run the command normally through aiida-shell, the
command will hang, waiting for eternity for the prompt to be answered. To forward the prompt answer to aiida-shell,
we need to communicate it through the stdin descriptor. We add the desired response, SOL in this case, as a
SinglefileData node to the nodes dictionary. The key that is used, stdin in this example, should be specified
using the filename_stdin metadata option.
The system is now properly neutralized and we can proceed to minimizing its energy. This is done using GROMACS’ command
to run molecular dynamics gmx mdrun. Just like gmx genion this requires a .tpr file, which we prepare using
gmx grompp:
em_mdp = orm.SinglefileData.from_string(
"""
integrator = steep
emtol = 1000.0
emstep = 0.01
nsteps = 50000
nstlist = 1
cutoff-scheme = Verlet
ns_type = grid
coulombtype = PME
rcoulomb = 1.0
rvdw = 1.0
pbc = xyz
ld_seed = 1
gen_seed = 1
""",
filename='em.mdp',
)
# Run `gmx grompp` to pre-process the parameters for energy minimization.
results_grompp_em, node_grompp_em = launch_shell_job(
'gmx',
arguments='grompp -f {mdp} -c {gro} -p {top} -o output.tpr',
nodes={
'mdp': em_mdp,
'gro': results_genion['output_gro'],
'top': results_genion['topol_top'],
},
outputs=['output.tpr'],
metadata={'options': {'redirect_stderr': True}},
)
With the .tpr created, we run gmx mdrun to run the energy minimization:
gmx mdrun to run the energy minimization.#results_em, node_em = launch_shell_job(
'gmx',
arguments='mdrun -v -deffnm output -s {tpr}',
nodes={
'gro': results_genion['output_gro'],
'top': results_genion['topol_top'],
'tpr': results_grompp_em['output_tpr'],
},
outputs=['output.edr'],
metadata={'options': {'redirect_stderr': True}},
)
As the final step in this example, we will performance some analysis on the energy minimization. We use gmx energy to
extract the system’s potential energy during the simulation:
gmx energy to extract the potential energy during the energy minimization.#results_energy, node_energy = launch_shell_job(
'gmx',
arguments='energy -f {edr} -o potential.xvg',
nodes={
'edr': results_em['output_edr'],
'stdin': orm.SinglefileData.from_string('10\n0'),
},
outputs=['potential.xvg'],
metadata={'options': {'redirect_stderr': True, 'filename_stdin': 'stdin'}},
)
Note
The gmx energy command prompts to determine which quantity to project. In this example, we first pass 10 which
corresponds to the potential energy, followed by 0 which finalizes the selection and completes the prompt.
To visualize the extracted data, we define the create_plot function (once again decorated with calcfunction to
preserve provenance) and pass the potential.xvg output file generated by gmx energy:
@engine.calcfunction
def create_plot(xvg: orm.SinglefileData) -> orm.SinglefileData:
"""Plot the data of a XVG output file."""
import io
import matplotlib.pyplot as plt
import numpy as np
with xvg.as_path() as filepath:
data = np.loadtxt(filepath, comments=['#', '@']).T
plt.plot(*data)
plt.xlabel('Energy minimization step')
plt.ylabel('Potential energy [kJ/mol]')
stream = io.BytesIO()
plt.savefig(stream, format='png', bbox_inches='tight', dpi=180)
stream.seek(0)
return orm.SinglefileData(stream, filename='potential.png')
# Create a plot from the extracted potential energy of the system
plot = create_plot(results_energy['potential_xvg'])
print(plot.pk)
The plot is saved to a stream in memory which is then passed to a SinglefileData node to store it in AiiDA’s
provenance graph. The final result will look something like the following:
The evolution of the potential energy of the system during the energy minimization.#
Tip
To extract the plot from the SinglefileData node, you can use the command verdi node repo cat <PK> > potential_energy.png, where you replace <PK> with the pk of the SinglefileData node returned by the create_plot
function.
To conclude this example, below is the provenance graph that was created by running the example script:
The provenance grpah was generated using verdi node graph generate <PK>, where <PK> was replaced with the pk of the
SinglefileData node returned by the create_plot function.#