GROMACS#

GROMACS is a free and open-source software suite for high-performance molecular dynamics and output analysis. In this example, we will follow the example of mdtutorials.com which downloads the Lysozyme protein from the RCSB protein data bank and shows how to use GROMACS to setup a simulation of its dynamics when solvated in a box of water and ions. For brevity’s sake, we will follow the tutorial up until the equilibration stage. The script shown below can be downloaded here.

Warning

This example purely serves to illustrate how to use aiida-shell. There is no guarantee that GROMACS is used correctly or in the most efficient way.

The first step is to download the protein:

#!/usr/bin/env runaiida
"""Simulation of lysozyme protein dynamics using GROMACS."""
import urllib.request

from aiida import engine, orm
from aiida_shell import launch_shell_job

# Download the lysozyme protein structure.
with urllib.request.urlopen('https://files.rcsb.org/download/1AKI.pdb') as handle:
    lysozyme_pdb = orm.SinglefileData(handle, filename='lysozyme.pdb')

We use the urllib module of the standard library to open the file and wrap it in AiiDA’s SinglefileData data type, which allows it to be stored in the provenance graph. The first step is to perform some simple pre-processing. The downloaded file defines the protein as embedded in crystalline water, which we don’t need for our purposes and therefore we remove it. Water molecules are marked by the HOH residue in the file, so we simply write a function that loops over the lines in the PDB file and removes any line that contains HOH.

@engine.calcfunction
def remove_water_from_pdb(protein: orm.SinglefileData) -> orm.SinglefileData:
    """Remove water molecules from a PDB file."""
    lines = protein.get_content().split('\n')
    lines_without_water = [line for line in lines if 'HOH' not in line]
    return orm.SinglefileData.from_string(
        '\n'.join(lines_without_water),
        filename=protein.filename,
    )


# Remove crystalline water from PDB by skipping any lines that contain `HOH` residue.
lysozyme = remove_water_from_pdb(lysozyme_pdb)

Calling the remote_water_from_pdb with the original lysozyme_pdb returns the modified lysozyme file (which is also an instance of SinglefileData).

Note

We mark the function remove_water_from_pdb with AiiDA’s calcfunction decorator. This ensures the act of modifying the originally downloaded protein file is captured in the provenance graph.

Next, we run GROMACS’ gmx pdb2gmx command to convert the file from the .pdb to GROMACS’ native .gro format. If we were to execute this directly on the command line, it would look like:

gmx pdb2gmx -f lysozyme.pdb -o output.gro -water spce -ff oplsaa

When wrapping this in aiida-shell, the main question is how we pass the protein PDB file to the command, as that is stored in AiiDA’s provenance graph and is not a file on disk. We simply replace the file name with the {protein} placeholder and then add the lysozyme SinglefileData nodes in the nodes dictionary. The launch_shell_job will automatically copy the content of the SinglefileData to the working directory where the command is executed and substitute the {protein} lysozyme with the filename to which the content was copied:

Run gmx pdb2gmx to convert the PDB to GROMACS .gro format.#

results_pdb2gmx, node_pdb2gmx = launch_shell_job(
    'gmx',
    arguments='pdb2gmx -f {protein} -o output.gro -water spce -ff oplsaa',
    nodes={
        'protein': lysozyme,
    },
    outputs=['output.gro', 'topol.top', 'posre.itp'],
    metadata={'options': {'redirect_stderr': True}},
)

Note

The nodes argument takes a dictionary with an arbitrary number of SinglefileData nodes whose content is copied to the working directory where the command is executed. If the key of the node appears as a placeholder in the arguments argument, it is replaced with its filename.

The gmx pdb2gmx is expected to generate three outputs of interest:

output.gro: The protein structure in .gro format
topol.top: The topology file
posre.itp: The include topology file with position restraints

To capture these output files, they are declared using the outputs argument. aiida-shell will automatically wrap these output files in a SinglefileData and store them as an output of the command in the provenance graph.

The next step is to create the simulation box around the protein using gmx editconf:

Run gmx editconf to generate a cubic box around the protein.#

results_editconf, node_editconf = launch_shell_job(
    'gmx',
    arguments='editconf -f {gro} -o output.gro -c -d 1.0 -bt cubic',
    nodes={
        'gro': results_pdb2gmx['output_gro'],
    },
    outputs=['output.gro'],
    metadata={'options': {'redirect_stderr': True}},
)

Tip

GROMACS by default writes output to the stderr file descriptor. This is a bit unususal, as normally this is reserved for errors. aiida-shell by default detects if the executed command wrote anything to stderr and if so, assigns the exit code 400 to the process, indicating that it has failed. Since the command did not actually fail, this can be misleading. As a workaround, we set the metadata option redirect_stderr to True which will instruct aiida-shell to redirect all output written to stderr to the stdout descriptor instead.

The gmx editconf command takes a GRO file defining the protein structure. In this example, this file was created by the previous gmx pdb2gmx command. We can retrieve this output file from the results dictionary and directly assign it to the nodes dictionary.

Now that the protein is placed in a simulation box, we can solvate the system in water:

Run gmx solvate to solvate the protein in water.#

results_solvate, node_solvate = launch_shell_job(
    'gmx',
    arguments='solvate -cp {gro} -cs spc216.gro -o output.gro -p {top}',
    nodes={
        'gro': results_editconf['output_gro'],
        'top': results_pdb2gmx['topol_top'],
    },
    outputs=['output.gro', 'topol.top'],
    metadata={'options': {'redirect_stderr': True}},
)

This command requires, in addition to the .gro file produced in the previous step, the .top topology file created by the gmx pdb2gmx command.

The next step is to insert ions into the solution to neutralize the system. GROMACS provides the gmx genion command for this, which runs actual molecular dynamics to optimize the positions of the added ions. It requires a .tpr file which defines the starting structure and conditions of a simulation. Such a file is created using the gmx grompp pre-processing utility which takes an .mdp file defining the parameters of the simulation:

Define the input parameters for ion insertion.#

ions_mpd = orm.SinglefileData.from_string(
    """
    integrator      = steep
    emtol           = 1000.0
    emstep          = 0.01
    nsteps          = 50000
    nstlist         = 1
    cutoff-scheme   = Verlet
    ns_type         = grid
    coulombtype     = cutoff
    rcoulomb        = 1.0
    rvdw            = 1.0
    pbc             = xyz
    ld_seed         = 1
    gen_seed        = 1
    """,
    filename='ions.mdp',
)

# Run `gmx grompp` to pre-process the parameters for ion insertion.
results_grompp_genion, node_grompp_ion = launch_shell_job(
    'gmx',
    arguments='grompp -f {mdp} -c {gro} -p {top} -o output.tpr',
    nodes={
        'mdp': ions_mpd,
        'gro': results_solvate['output_gro'],
        'top': results_solvate['topol_top'],
    },
    outputs=['output.tpr'],
    metadata={'options': {'redirect_stderr': True}},
)

With the generated .tpr file in hand, we can now call gmx genion to insert the ions into the system:

Run gmx genion to neutralize the system with counter ions.#

results_genion, node_genion = launch_shell_job(
    'gmx',
    arguments='genion -s {tpr} -o {gro} -p {top} -pname NA -nname CL -neutral',
    nodes={
        'gro': results_solvate['output_gro'],
        'top': results_solvate['topol_top'],
        'tpr': results_grompp_genion['output_tpr'],
        'stdin': orm.SinglefileData.from_string('SOL'),
    },
    outputs=['output.gro', 'topol.top'],
    metadata={'options': {'redirect_stderr': True, 'filename_stdin': 'stdin'}},
)

Note

The gmx genion needs to know which type of residues should be replaced by the ions. Unfortunately, it does not allow this to be specified with a command line option, but rather will prompt for it, providing a number of options. We want to select the option SOL (solvent molecules) for this, but if we run the command normally through aiida-shell, the command will hang, waiting for eternity for the prompt to be answered. To forward the prompt answer to aiida-shell, we need to communicate it through the stdin descriptor. We add the desired response, SOL in this case, as a SinglefileData node to the nodes dictionary. The key that is used, stdin in this example, should be specified using the filename_stdin metadata option.

The system is now properly neutralized and we can proceed to minimizing its energy. This is done using GROMACS’ command to run molecular dynamics gmx mdrun. Just like gmx genion this requires a .tpr file, which we prepare using gmx grompp:

Define the input parameters for energy minimization.#

em_mdp = orm.SinglefileData.from_string(
    """
    integrator      = steep
    emtol           = 1000.0
    emstep          = 0.01
    nsteps          = 50000
    nstlist         = 1
    cutoff-scheme   = Verlet
    ns_type         = grid
    coulombtype     = PME
    rcoulomb        = 1.0
    rvdw            = 1.0
    pbc             = xyz
    ld_seed         = 1
    gen_seed        = 1
    """,
    filename='em.mdp',
)

# Run `gmx grompp` to pre-process the parameters for energy minimization.
results_grompp_em, node_grompp_em = launch_shell_job(
    'gmx',
    arguments='grompp -f {mdp} -c {gro} -p {top} -o output.tpr',
    nodes={
        'mdp': em_mdp,
        'gro': results_genion['output_gro'],
        'top': results_genion['topol_top'],
    },
    outputs=['output.tpr'],
    metadata={'options': {'redirect_stderr': True}},
)

With the .tpr created, we run gmx mdrun to run the energy minimization:

Run gmx mdrun to run the energy minimization.#

results_em, node_em = launch_shell_job(
    'gmx',
    arguments='mdrun -v -deffnm output -s {tpr}',
    nodes={
        'gro': results_genion['output_gro'],
        'top': results_genion['topol_top'],
        'tpr': results_grompp_em['output_tpr'],
    },
    outputs=['output.edr'],
    metadata={'options': {'redirect_stderr': True}},
)

As the final step in this example, we will performance some analysis on the energy minimization. We use gmx energy to extract the system’s potential energy during the simulation:

Run gmx energy to extract the potential energy during the energy minimization.#

results_energy, node_energy = launch_shell_job(
    'gmx',
    arguments='energy -f {edr} -o potential.xvg',
    nodes={
        'edr': results_em['output_edr'],
        'stdin': orm.SinglefileData.from_string('10\n0'),
    },
    outputs=['potential.xvg'],
    metadata={'options': {'redirect_stderr': True, 'filename_stdin': 'stdin'}},
)

Note

The gmx energy command prompts to determine which quantity to project. In this example, we first pass 10 which corresponds to the potential energy, followed by 0 which finalizes the selection and completes the prompt.

To visualize the extracted data, we define the create_plot function (once again decorated with calcfunction to preserve provenance) and pass the potential.xvg output file generated by gmx energy:

@engine.calcfunction
def create_plot(xvg: orm.SinglefileData) -> orm.SinglefileData:
    """Plot the data of a XVG output file."""
    import io

    import matplotlib.pyplot as plt
    import numpy as np

    with xvg.as_path() as filepath:
        data = np.loadtxt(filepath, comments=['#', '@']).T

    plt.plot(*data)
    plt.xlabel('Energy minimization step')
    plt.ylabel('Potential energy [kJ/mol]')

    stream = io.BytesIO()
    plt.savefig(stream, format='png', bbox_inches='tight', dpi=180)
    stream.seek(0)

    return orm.SinglefileData(stream, filename='potential.png')


# Create a plot from the extracted potential energy of the system
plot = create_plot(results_energy['potential_xvg'])
print(plot.pk)

The plot is saved to a stream in memory which is then passed to a SinglefileData node to store it in AiiDA’s provenance graph. The final result will look something like the following:

Tip

To extract the plot from the SinglefileData node, you can use the command verdi node repo cat <PK> > potential_energy.png, where you replace <PK> with the pk of the SinglefileData node returned by the create_plot function.

To conclude this example, below is the provenance graph that was created by running the example script: