Getting Started

This page details how to perform free energy calculations between a molecular mechanics force field (in this case the open-forcefield is used) and a neural network potential (ANI-2x).

Installation

We recommend setting up a new python conda environment with python=3.9 and installing the packages defined here using mamba. This package can be installed using: pip install git+https://github.com/wiederm/endstate_correction.git.

How to use this package

We have prepared two scripts that should help to use this package, both are located in endstate_correction/scripts. We will start by describing the use of the generate_endstate_samples.py script and then discuss the perform_correction.py script.

A typical Non-equilibrium (NEQ) switching workflow

Generate the equilibrium distribution \(\pi(x, \lambda=0)\)

In order to perform a NEQ work protocol, we need samples drawn from the equilibrium distribution from which we initialize our annealing simulations. If samples are not already available, the generate_endstate_samples.py script provides and easy way to obtain these samples.

In the following we will use a molecule from the HiPen dataset with the SMILES string “”Cn1cc(Cl)c(/C=N/O)n1” as a test system.

The scripts starts by defining some control parameters such as n_samples (number of samples that should be generated), n_steps_per_sample (integration steps between samples), base (where to store the samples).

We then start setting the simulation object

forcefield = ForceField("openff_unconstrained-2.0.0.offxml")
env = "vacuum"

potential = MLPotential("ani2x")
molecule = Molecule.from_smiles(smiles, hydrogens_are_explicit=False) # generate molecule from smile
molecule.generate_conformers(n_conformers=1) # generate a single conforamtion

topology = molecule.to_topology()
system = forcefield.create_openmm_system(topology)
# define region that should be treated with the nnp
ml_atoms = [atom.molecule_particle_index for atom in topology.atoms]
# set up the integrator and the platform (here we are assuming you have a CUDA GPU)
integrator = LangevinIntegrator(temperature, collision_rate, stepsize)
platform = Platform.getPlatformByName("CUDA")
# we are creating a mixed system using openmm-ml
topology = topology.to_openmm()
ml_system = potential.createMixedSystem(topology, system, ml_atoms, interpolate=True)
sim = Simulation(topology, ml_system, integrator, platform=platform)

Note that we explicitly define the atoms that should be perturbed from the reference to the target potential using

ml_atoms = [atom.molecule_particle_index for atom in topology.atoms]

If you want to perform bidirectional FEP or NEQ you need to sample at \(\pi(x, \lambda=0)\) and \(\pi(x, \lambda=1)\). This can be controlled by setting the number using the variable nr_lambda_states. The default value set in the generate_endstate_samples.py script is nr_lambda_states=2, generating samples from both endstate equilibrium distributions.

nr_lambda_states = 2  # samples equilibrium distribution at both endstates
lambs = np.linspace(0, 1, nr_lambda_states)

Perform unidirectional NEQ from \(\pi(x, \lambda=0)\)

The endstate correction can be performed using the script perform_correction.py. The script will calculate the free energy estimate using the samples generated with the generate_endstate_samples.py script. Subsequently, the relevant sections of the perform_correction.py script are explained — but they should work for for the testsystem without any modifications.

To perform a specific endstate correction we need to define a protocol (some standard protocols are shown here) with:

neq_protocol = NEQProtocol(
    sim=sim,
    reference_samples=mm_samples,
    nr_of_switches=100,
    neq_switching_length=1_000,
    save_endstates=False,
    save_trajs=False
)

This protocol is then passed to the actual function performing the protocol: perform_endstate_correction(neq_protocol). The particular code above defines unidirectional NEQ switching using 100 switches and a switching length of 1 ps. The direciton of the switching simulation is controlled by the sampels that are provided: if reference_samples are provided, switching is performed from the reference to the target level of theory, if target_samples are provided, switching is performed from the target level to the reference level. If both samples are provided, bidirectional NEQ switching is performed (for an example see below). There is the possibility to save the endstate of each switch if save_endstates is set to True. Additionally, the switching trajectory of each switch can be saved if save_trajs=True. Both options only make sense for a NEQ protocol.

Perform bidirectional NEQ from \(\pi(x, \lambda=0)\) and \(\pi(x, \lambda=1)\)

The endstate correction can be performed using the script perform_correction.py and the following protocol.

neq_protocol = NEQProtocol(
    sim=sim,
    reference_samples=mm_samples,
    target_samples=nnp_samples,
    nr_of_switches=100,
    neq_switching_length=1_000,
)

This protocol is then passed to the actual function performing the protocol: perform_endstate_correction(neq_protocol).

Perform unidirectional FEP from \(\pi(x, \lambda=0)\)

The protocol has to be adopted slightly:

fep_protocol = FEPProtocol(
    sim=sim,
    reference_samples=mm_samples,
    nr_of_switches=1_000,
)

This protocol is then passed to the actual function performing the protocol: perform_endstate_correction(fep_protocol).

Perform unidirectional SMC from \(\pi(x, \lambda=0)\)

To perform a unidirectional Sequential Monte-Carlo (SMC) protocol, the following adaptations have to be made:

smc_protocol = SMCProtocol(
    sim=sim,
    reference_samples=mm_samples,
    nr_of_walkers=100,
    protocol_length=1_000
    nr_of_resampling_steps=100,
    save_endstates=True
)

Here, nr_of_walkers indicates the number of walkers selected from the reference distribution, which are propagated for 1ps (protocol_length). nr_of nr_of_resampling_steps states that the walkers are resampled 100 times. With the option save_endstates samples at the end of the protocol can be saved.

Perform multiple protocols

It is possible to combine several protocols into one with AllProtocol.

protocol = AllProtocol(
    fep_protocol=fep_protocol,
    neq_protocol=neq_protocol,
    smc_protocol=smc_protocol
)

This protocol is then passed to perform_endstate_correction() to perform all endstate corrections.

Analyse results of an unidirection NEQ protocol

To analyse the results generated by r = perform_endstate_correction() pass the return value to plot_endstate_correction_results(system_name, r, "results_neq_unidirectional.png") and results will be plotted and printed. It should be noted, that perform_endstate_correction() returns an instance of AllResults(), which, depending on the passed protocol, contains results of an FEP, NEQ switching and SMC protocol (instances of the FEPResults(), NEQResults() and SMCResults() class, respectively). It is also possible to add equilibrium sampling results to the instance of AllResults() retrospectively (EquResults()).

Available protocols

unidirectional NEQ protocol (reference to target)

neq_protocol = NEQProtocol(
    sim=sim,
    reference_samples=mm_samples,
    nr_of_switches=100,
    neq_switching_length=1_000,
)

bidirectional NEQ protocol (save switching trajectory and endstate of each switch)

neq_protocol = NEQProtocol(
    sim=sim,
    reference_samples=mm_samples,
    target_samples=nnp_samples,
    nr_of_switches=100,
    neq_switching_length=1_000,
    save_endstates=True,
    save_trajs=True,
)

unidirectional FEP protocol (reference to target)

fep_protocol = FEPProtocol(
    sim=sim,
    reference_samples=mm_samples,
    nr_of_switches=1_000,
)

bidirectional FEP protocol

fep_protocol = FEPProtocol(
    sim=sim,
    reference_samples=mm_samples,
    target_samples=nnp_samples,
    nr_of_switches=1_000,
)

unidirectional SMC protocol (reference to target)

smc_protocol = SMCProtocol(
    sim=sim,
    reference_samples=mm_samples,
    nr_of_walkers=100,
    nr_of_resampling_steps=1_000,
)