{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Minimizing the relative entropy\n",
    "\n",
    "This example will determine the interaction parameters of a hard-sphere-like mixture that interacts through the Weeks-Chandler-Andersen (WCA) potential:\n",
    "\n",
    "$$\n",
    "    u_{ij}(r) = 4 \\varepsilon \\left[ \\left(\\frac{\\sigma_{ij}}{r} \\right)^{12} - \\left(\\frac{\\sigma_{ij}}{r}\\right)^6 + \\frac{1}{4} \\right], \\quad r \\le 2^{1/6} \\sigma_{ij}.\n",
    "$$\n",
    "\n",
    "The WCA potential is the purely repulsive half of the Lennard-Jones potential, truncated and shifted at its minimum. Here, we will fix $\\varepsilon = 1$ as the energy for all interactions, while we will try to adjust the size parameters $\\sigma_{ij}$ between two particles of types $i$ and $j$. There are 3 parameters $\\sigma_{11}$, $\\sigma_{12}$, and $\\sigma_{22}$.\n",
    "\n",
    "Start off by importing `relentless`!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import relentless"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Model\n",
    "\n",
    "We will work with a mixture that has two components 1 & 2 in the \"dilute\" limit of statistical mechanics, where the pair correlation function,\n",
    "\n",
    "$$\n",
    "    g_{ij}(r) = e^{-\\beta u_{ij}(r)},\n",
    "$$\n",
    "\n",
    "is known exactly. Here, $\\beta = 1/(k_{\\rm B} T)$, $k_{\\rm B}$ is the Boltzmann constant, and $T$ is the temperature."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Target ensemble\n",
    "\n",
    "Let's define a `target` thermodynamic state containing 10 particles of component 1 ($N_1 = 10$) and 5 particles of component 2 ($N_2 = 5$) in a cubic box with edge length $L = 10$. We will also set the nominal temperature as $T = 1.0$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "target = relentless.model.Ensemble(\n",
    "    T=1.0, V=relentless.model.extent.Cube(L=10.0), N={\"1\": 10, \"2\": 5}\n",
    ")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " The goal of relative entropy minimization is to determine interaction parameters that best reproduce a target ensemble, specifically the molecular structures. These are specified using the pair correlation functions. We have precomputed the $g_{11}$, $g_{12}$, and $g_{22}$ we are targeting for the mixture, and we have saved them to files `rdf_11.dat`, `rdf_12.dat`, and `rdf_22.dat`. You can download them [here](https://github.com/mphowardlab/relentless/tree/main/doc/source/guide/examples/binary_hard_spheres)! The first column are the values of $r$, and the second column are the values of $g_{ij}$. We load these and specify them as the `RDF` (radial distribution function) for the `target` ensemble:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "for i, j in target.rdf:\n",
    "    rdf = relentless.mpi.world.loadtxt(f\"rdf_{i}{j}.dat\")\n",
    "    target.rdf[i, j] = relentless.model.ensemble.RDF(rdf[:, 0], rdf[:, 1])"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that here we are using the `pairs` of types in the ensemble to programmatically loop through the files and set all of the RDFs. You could also do this manually if you like (e.g., `target.rdf[\"1\", \"1\"]`), but that could get a bit tedious!\n",
    "\n",
    "We also note that we are using the `loadtxt` method supplied by `relentless` for compatibility with MPI. There is a wrapper around `numpy.loadtxt` and there is very little overhead to using this even if you aren't using MPI support, so we recommend you get in the habit of using it in case you want to port your script to a supercomputer."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Design variables\n",
    "\n",
    "Next we are going to define the variables of our model that we want to parametrize. These are called `IndependentVariables` in `relentless`. They must start from an initial value, and they can have physical bounds.\n",
    "\n",
    "We know that for two hard spheres having diameters $d_i$ and $d_j$, we typically choose $\\sigma_{ij} = (d_i + d_j)/2$ to prevent overlap in the WCA potential. (Note that the WCA potential does not generate a true hard sphere, but this is actually better for relative entropy optimization because the WCA potential is differentiable.)\n",
    "\n",
    "Hence, our model really has two parameters $d_1$ and $d_2$, then we can specify $\\sigma_{11} = d_1$, $\\sigma_{22} = d_2$, and $\\sigma_{12} = (d_1+d_2)/2$. We make two design variables $d_1$ and $d_2$ and constrain them to be nonnegative and less than 5. We start them at slightly different values (this is just a guess!):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "d = {\n",
    "    \"1\": relentless.model.variable.IndependentVariable(\n",
    "        value=1.5, name=\"d_1\", low=0, high=5),\n",
    "    \"2\": relentless.model.variable.IndependentVariable(\n",
    "        value=2.5, name=\"d_2\", low=0, high=5)\n",
    "}"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then, we use these variables to parametrize the WCA potential. To do this, we create a `LennardJones` potential, which has the same functional form, with 2 types then specify self and cross pair coefficients according to the WCA form:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "wca = relentless.model.potential.LennardJones(types=(\"1\", \"2\"))\n",
    "# \"self\" interactions\n",
    "wca.coeff[\"1\", \"1\"].update(\n",
    "    {\n",
    "        \"epsilon\": 1.0,\n",
    "        \"sigma\": d[\"1\"],\n",
    "        \"rmax\": 2.**(1./6.) * d[\"1\"],\n",
    "        \"shift\": True,\n",
    "    }\n",
    ")\n",
    "wca.coeff[\"2\", \"2\"].update(\n",
    "    {\n",
    "        \"epsilon\": 1.0,\n",
    "        \"sigma\": d[\"2\"],\n",
    "        \"rmax\": 2.**(1./6.) * d[\"2\"],\n",
    "        \"shift\": True,\n",
    "    }\n",
    ")\n",
    "\n",
    "# \"cross\" interaction\n",
    "sigma_12 = (d[\"1\"]+d[\"2\"])/2\n",
    "wca.coeff[\"1\", \"2\"].update(\n",
    "    {\n",
    "        \"epsilon\": 1.0,\n",
    "        \"sigma\": sigma_12,\n",
    "        \"rmax\": 2.**(1./6.) * sigma_12,\n",
    "        \"shift\": True,\n",
    "    }\n",
    ")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that the pair coefficient matrix is symmetric, so we only have to specify the `(\"1\", \"2\")` pair and not also the `(\"2\", \"1\")` pair. In the process of doing this, we also performed basic arithmetic on `d_1` and `d_2` to get `sigma_12` and set all of the `rmax` parameters. These are called `DependentVariables` in relentless, and they will be automatically handled by the code for simple operations."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Simulation\n",
    "\n",
    "Now that the model is specified, we can run a simulation. As in previous examples, we first set up the simulation operations:\n",
    "\n",
    "1. Initialize the particles randomly.\n",
    "2. Run a short equilibration period with Langevin dynamics.\n",
    "3. Run a production period with Langevin dynamics. We attach an ensemble analyzer that will be used in the relative entropy minimization later. The thermodynamic properties and RDF will be computed at regular intervals, and these intervals should be chosen based on what would be appropriate for sampling in a simulation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "init = relentless.simulate.InitializeRandomly(\n",
    "    seed=42,\n",
    "    T=target.T,\n",
    "    V=target.V,\n",
    "    N=target.N,\n",
    "    diameters=d,\n",
    ")\n",
    "\n",
    "eq = relentless.simulate.RunLangevinDynamics(\n",
    "    steps=2e4,\n",
    "    timestep=0.005,\n",
    "    T=target.T,\n",
    "    friction=0.1,\n",
    "    seed=2,\n",
    ")\n",
    "\n",
    "ens_avg = relentless.simulate.EnsembleAverage(\n",
    "    filename=None,\n",
    "    every=20,\n",
    "    rdf={\"every\": 2000, \"stop\": 6., \"num\": 120},\n",
    ")\n",
    "\n",
    "prod = relentless.simulate.RunLangevinDynamics(\n",
    "    steps=2e5,\n",
    "    timestep=0.005,\n",
    "    T=target.T,\n",
    "    friction=0.1,\n",
    "    seed=3,\n",
    "    analyzers=ens_avg,\n",
    ")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We will use the `Dilute` simulator to run these operations, which does not really run most of them since it can straightforwardly compute its own ensemble. Hence, it can be used as a quick test of a script or to determine optimization parameters."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "sim = relentless.simulate.Dilute(init, operations=[eq, prod])"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We also create the potential tabulator for the simulation. We make sure that the stopping point is large enough to accommodate the largest possible cutoff that the WCA potential can adopt."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "pot = relentless.simulate.Potentials()\n",
    "pot.pair = relentless.simulate.PairPotentialTabulator(\n",
    "    wca, start=1e-6, stop=2.**(1./6.) * 5., num=1000, neighbor_buffer=0.5\n",
    ")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Relative entropy optimization\n",
    "\n",
    "Now that the model and simulation are specified, we are ready to optimize the relative entropy! First, we create the relative entropy objective function. It takes four arguments: the target ensemble, the simulation, the potentials to simulate, and the analysis operation that returns an ensemble average:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "s_rel = relentless.optimize.RelativeEntropy(target, sim, pot, ens_avg)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we can optimize the relative entropy with respect to the parameters of our model. We will use standard steepest descent here, and we will stop the optimization once the norm of the gradient is less than a threshold. This type of optimization is not necessarily our first choice (we often prefer the fixed-step size variant of steepest descent with line search!), but it is simple to setup for illustrative purposes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Converged? True\n",
      "d_1 = 1.0, d_2 = 3.0\n"
     ]
    }
   ],
   "source": [
    "vars = [d[\"1\"], d[\"2\"]]\n",
    "optimizer = relentless.optimize.FixedStepDescent(\n",
    "    stop=relentless.optimize.GradientTest(1e-4, vars),\n",
    "    max_iter=100,\n",
    "    step_size=1e-2,\n",
    ")\n",
    "converged = optimizer.optimize(s_rel, vars)\n",
    "print(f\"Converged? {converged}\")\n",
    "print(\"d_1 = {:.1f}, d_2 = {:.1f}\".format(d[\"1\"].value, d[\"2\"].value))"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Depending on your problem, you may need to play around with the various parameters until you get a satisfactory answer. Here, the output of `optimize` shows that the process worked. The optimized values of the design variables are $d_1 = 1.0$ and $d_2 = 3.0$. These match the ones we used to create the RDFs!"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "relentless",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.0"
  },
  "vscode": {
   "interpreter": {
    "hash": "2febe0fb9ed24084c5c332c6c413685d57788be390aba416bfd9a056c817d298"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}