PBT Workflow
============

PBT is an asynchronous optimization algorithm for jointly optimizing a
population of models and their hyperparameters while effectively using a
fixed computational budget. Like a simple parallel grid search, PBT begins by
randomly sampling selected hyperparameters and initial weights, and training
multiple models in parallel using these hyperparameters and weights. However,
unlike a parallel search, each training run periodically and asynchronously
runs an *evaluate* method when a model is considered *ready*, comparing its
performance against that of other models. If it is underperforming, PBT uses
two additional methods to improve performance: *exploit* and *explore*.
Exploit leverages the work of the population as a whole by replacing an
underperforming model with a better one, i.e., by replacing a model’s current
weights with those of the better performing model. Explore attempts to find
new, better performing hyperparameters by perturbing those of the better
performing model. Training then continues with the new weights and the new
hyperparameters. Evaluate, exploit, and explore are performed asynchronously
and independently by each model for some specified number of steps. In this
way the hyperparameters are optimized online, and computational resources are
focused on better performing hyperparameters and weights, quickly discarding
unpromising solutions.

This PBT example is written in Python using the MPI for Python (mpi4py)
package. It consists of model-agnostic framework code for creating PBT
workflows (``python/pbt.py``) and an example workflow (``python/tc1_pbt.py``).
This example workflow trains a variant of our tc1 benchmark (``models/tc1``).
In this example, a tc1 model run is considered underperforming if its
validation loss is in the lower 20% of the population, at which time it will
perform an exploit and explore. During exploit, a model loads the weights of
a model randomly selected from the top 20%.
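The truncation-based exploit/explore step just described can be sketched in
plain Python. This is an illustrative sketch only; the names (``pbt_step``,
the ``score``/``weights``/``lr`` keys) are hypothetical and not the
workflow's actual API, which is described in the sections below.

.. code:: python

   import random

   # Hypothetical sketch of one PBT step for a single population member.
   # Each member is represented here as a dict; lower score is better
   # (e.g. validation loss).
   def pbt_step(member, population, cutoff=0.2, perturb=(0.8, 1.2)):
       ranked = sorted(population, key=lambda m: m['score'])
       n_cut = max(1, int(len(ranked) * cutoff))
       bottom = ranked[-n_cut:]   # worst performers
       top = ranked[:n_cut]       # best performers
       if member in bottom:
           better = random.choice(top)
           # exploit: copy a better model's weights and hyperparameters
           member['weights'] = dict(better['weights'])
           member['lr'] = better['lr']
           # explore: perturb the copied learning rate
           member['lr'] *= random.uniform(*perturb)
       return member

In the actual workflow this logic is split across the datastore, a Keras
callback, and file-based weight serialization, as explained below.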
(Loading and storing of weights is file-based: weights are serialized every
epoch and then loaded as necessary.) During explore, a model perturbs the
learning rate of the selected better performing model, and then continues
training with the new weights and learning rate.

Requirements
------------

- This workflow: git@github.com:ECP-CANDLE/Supervisor.git. Clone and cd to
  ``workflows/pbt`` (the directory containing this README).
- Python: the PBT workflow has been tested under Python 2.7.
- MPI for Python (mpi4py): http://mpi4py.scipy.org/docs/
- Keras: https://keras.io
- CANDLE Benchmark Code: git@github.com:ECP-CANDLE/Benchmarks.git. Clone and
  switch to the frameworks branch.
- TC1 benchmark data:

  - ftp://ftp.mcs.anl.gov/pub/candle/public/benchmarks/Pilot1/type-class/type_18_300_test.csv
  - ftp://ftp.mcs.anl.gov/pub/candle/public/benchmarks/Pilot1/type-class/type_18_300_train.csv

``type_18_300_train.csv`` and ``type_18_300_test.csv`` should be copied into
``X/Benchmarks/Data/Pilot1``, where X is wherever you cloned the Benchmarks
repository. For example, from within X/Benchmarks::

    mkdir -p Data/Pilot1
    cd Data/Pilot1
    wget ftp://ftp.mcs.anl.gov/pub/candle/public/benchmarks/Pilot1/type-class/type_18_300_test.csv
    wget ftp://ftp.mcs.anl.gov/pub/candle/public/benchmarks/Pilot1/type-class/type_18_300_train.csv

Running the Workflow
--------------------

The PBT workflow is an MPI application that, when given N processes, runs
N - 1 tc1 models and uses the remaining process to run a datastore into which
the models can put and get model performance data. The workflow can be run
using the scripts in the ``scripts`` directory. Two scripts are provided:
``local_run_pbt.sh`` and ``sbatch_run_pbt.sh``. The former can be used to run
on a local desktop or laptop. The latter can be used to submit the PBT
workflow on HPC resources that use the Slurm scheduler. In either case, the
main application file is ``python/tc1_pbt.py``.
When run, the PBT workflow will create an experiments directory in which the
output will be written. The output consists of a ``weights`` directory into
which each tc1 instance writes its model weights every epoch, and an
``output.csv`` file that records the accuracy, loss, learning rate,
validation accuracy, and validation loss for each model (identified by MPI
rank) each epoch. Additionally, each tc1 model run will execute within its
own ``run_N`` instance directory (e.g. ``run_1``, ``run_2``, and so forth)
within the output directory.

local_run_pbt.sh
~~~~~~~~~~~~~~~~

``local_run_pbt.sh`` takes 3 arguments:

1. The number of processes to use
2. An experiment id
3. The path to a PBT parameter file (see below) that defines the tc1
   hyperparameters

The experiment id is used as the name of the experiments directory into which
the model output will be written, as mentioned above. For example, given the
location of the ``scripts`` directory as ``workflows/pbt/scripts`` and an
experiment id of ``r1``, the experiments directory will be
``workflows/pbt/experiments/r1``.

sbatch_run_pbt.sh
~~~~~~~~~~~~~~~~~

``sbatch_run_pbt.sh`` takes 2 arguments:

1. An experiment id
2. The path to a PBT parameter file (see below) that defines the tc1
   hyperparameters

The experiment id is again used as the name of the experiments directory into
which the model output will be written, as mentioned above. For example,
given the location of the ``scripts`` directory as ``workflows/pbt/scripts``
and an experiment id of ``r1``, the experiments directory will be
``workflows/pbt/experiments/r1``. ``sbatch_run_pbt.sh`` ultimately calls
``sbatch`` to submit the job defined in ``scripts/pbt.sbatch``. That file can
be copied and edited as appropriate, setting the queue, walltime, python,
etc. for your HPC machine. It is currently configured for NERSC’s Cori
system.
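The per-epoch ``output.csv`` described above can be post-processed with a few
lines of Python, e.g. to find the model with the lowest final validation
loss. The exact column names and ordering in ``output.csv`` are an assumption
here; adjust them to match your generated file.

.. code:: python

   import csv
   import io

   # Toy stand-in for the generated output.csv; the column layout below
   # is assumed, not taken from the workflow's actual output.
   rows = io.StringIO(
       "epoch,rank,acc,loss,lr,val_acc,val_loss\n"
       "4,1,0.81,0.52,0.001,0.78,0.61\n"
       "4,2,0.86,0.44,0.004,0.83,0.49\n"
   )

   # Keep the last recorded validation loss per rank
   final = {}
   for row in csv.DictReader(rows):
       final[row["rank"]] = float(row["val_loss"])

   # Rank with the lowest final validation loss
   best_rank = min(final, key=final.get)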
Hyperparameter Configuration File
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The PBT workflow uses a json format file for defining the hyperparameter
space used by the PBT algorithm. The PBT workflow includes 2 sample
hyperparameter configuration files for the tc1 model.

- ``data/tc1_params_full.json``: runs the full tc1 model, including the
  default convolution layer and no feature subsampling.
- ``data/tc1_params_small.json``: runs a faster version of the tc1 model by
  omitting the convolution layer and subsampling the features.

The hyperparameter configuration file has a json format consisting of a list
of json dictionaries, each of which defines a hyperparameter. Each dictionary
has the following required keys:

- ``name``: the name of the hyperparameter (e.g. epochs)
- ``type``: determines how the models are initialized from the named
  parameter - one of ``constant``, ``int``, ``float``, ``logical``, or
  ``categorical``.

  - ``constant``: all the tc1 models are initialized with the specified value
  - ``int``: each tc1 model is initialized with an int randomly drawn from
    the range defined by ``lower`` and ``upper`` bounds
  - ``float``: each tc1 model is initialized with a float randomly drawn from
    the range defined by ``lower`` and ``upper`` bounds
  - ``logical``: each tc1 model is initialized with a random boolean.
  - ``categorical``: each tc1 model is initialized with an element chosen at
    random from the list of elements in ``values``.

The following keys are required depending on the value of the ``type`` key.

If the ``type`` is ``constant``:

- ``value``: the constant value

If the ``type`` is ``int`` or ``float``:

- ``lower``: the lower bound of the range to randomly draw from
- ``upper``: the upper bound of the range to randomly draw from

If the ``type`` is ``categorical``:

- ``values``: the list of elements to randomly choose from
- ``element_type``: the type of the elements to choose from.
The ``element_type`` must be one of ``int``, ``float``, ``string``, or
``logical``.

A sample hyperparameter definition file:

.. code:: javascript

   [
     {
       "name": "epochs",
       "type": "constant",
       "value": 5
     },
     {
       "name": "activation",
       "type": "categorical",
       "element_type": "string",
       "values": ["softmax", "elu", "softplus", "softsign", "relu", "tanh",
                  "sigmoid", "hard_sigmoid", "linear"]
     },
     {
       "name": "batch_size",
       "type": "categorical",
       "element_type": "int",
       "values": [32, 64]
     },
     {
       "name": "lr",
       "type": "float",
       "lower": 0.0001,
       "upper": 0.01
     }
   ]

Note that any other keys are ignored by the workflow but can be used to add
additional information about the hyperparameter. For example, the sample
files contain a ``comment`` entry with additional information about that
hyperparameter.

Workflow Explained
------------------

The workflow consists of 3 parts: the DNN tc1 model in ``models/tc1``, the
PBT python code in ``python/pbt.py``, and the python code that runs the tc1
model using PBT, ``python/tc1_pbt.py``.

tc1
~~~

The tc1 model is a lightly modified version of the CANDLE tc1 benchmark. The
code has been updated so that an external Keras callback can be passed
through ``models/tc1/tc1_runner.run()`` and attached to the model. The PBT
algorithm is run via this callback.

``python/pbt.py``
~~~~~~~~~~~~~~~~~

``pbt.py`` provides the model-agnostic framework code for implementing a PBT
workflow. It has 4 main components.

1. A PBTMetaDataStore class. This implements an in-memory datastore for the
   model run performance and hyperparameter data. It also manages a locking
   scheme for model weight file IO in order to prevent issues with concurrent
   file access.
2. A PBTClient class. This allows an individual instance of a model to
   communicate with the PBTMetaDataStore, sending it performance data,
   querying performance data for a better performing model, and requesting
   read and write locks for reading other models’ weights and writing its
   own. The PBTClient and PBTMetaDataStore communicate via MPI.
3. A PBTCallback class. This is a Keras callback that, given model-specific
   *ready*, *exploit*, and *explore* implementations, will pass its current
   performance data to the data store and write its model’s weights every
   epoch. Then, when *ready*, it will perform an *evaluate* to find a better
   performing model. Assuming one is found, an *exploit* and *explore* will
   be performed to update its model’s weights and hyperparameters
   appropriately. A PBTCallback uses a PBTClient to communicate with a
   PBTMetaDataStore.
4. A PBTWorker interface. This interface defines the API for PBT’s *ready*,
   *exploit*, and *explore* steps. Client code implements this interface,
   supplying implementations appropriate to that particular workflow.

``python/tc1_pbt.py``
~~~~~~~~~~~~~~~~~~~~~

``tc1_pbt.py`` implements PBT for the tc1 model using the classes and
functions in ``pbt.py``. In ``tc1_pbt.py``, rank 0 first generates and
distributes the hyperparameters to the models running on the other ranks.
The ga_utils package is used to read the hyperparameter definition file (see
above) and generate a set of hyperparameters for each model. Once the
hyperparameters are distributed, a PBTMetaDataStore is started, also on rank
0. PBTMetaDataStore’s constructor is passed the path of the output directory
where the ``output.csv`` file will be written, together with the path to a
log file in which user-customizable log messages are written.
PBTMetaDataStore also takes a reference to an *evaluate* function that is
used to evaluate a model’s current performance and select a better performing
model. That function must have the following arguments: a list of
dictionaries that contains the metadata for all the models, and a *score*
against which model performance is determined. Exactly what the score
represents (e.g. the validation loss) is domain specific and is provided in
the ``PBTWorker.pack_data`` method described below.
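An *evaluate* function with the signature just described might look like the
following sketch. The function name and the ``score`` metadata key are
assumptions for illustration; the workflow's actual selection function,
``truncation_select``, is discussed next.

.. code:: python

   import random

   # Hypothetical evaluate function: `all_data` is the list of per-model
   # metadata dicts from the datastore, and `score` is the calling model's
   # current score (lower is better, e.g. validation loss).
   def select_better(all_data, score):
       better = [d for d in all_data if d['score'] < score]
       if not better:
           return {}          # empty dict: no exploit/explore should occur
       return random.choice(better)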
In ``tc1_pbt.py``, ``truncation_select`` implements this *evaluate* function
and is passed to the PBTMetaDataStore. In ``truncation_select``, if the
specified score is in the top 80% of scores, then an empty dictionary is
returned. This empty dictionary indicates that a better performing model was
not found and thus *exploit* and *explore* should not occur. If the specified
score is in the bottom 20%, then the data for a model in the top 20% is
randomly selected and returned in a python dictionary. The data in this
dictionary, the rank of the better performing model and its relevant
hyperparameters, can then be used in *exploit* and *explore*.

With the PBTMetaDataStore initialized on rank 0, all the remaining processes
run the tc1 model. A PBTCallback is added to each one of these models. The
PBTCallback constructor requires an instance of a class that implements the
PBTWorker interface. A PBTCallback calls the 3 methods of a PBTWorker to:

1. Retrieve a model’s metadata and hyperparameters in order to put them in
   the PBTMetaDataStore, and specify which performance metric to use as the
   ‘score’ for model performance in an *evaluate* (both in
   ``PBTWorker.pack_data``).
2. Determine when a model is *ready* for a potential exploit and explore
   (``PBTWorker.ready``).
3. Perform the *exploit* and *explore* update (``PBTWorker.update``).

In the tc1 PBT workflow, ``tc1_pbt.TC1PBTWorker`` implements the
``PBTWorker`` interface. ``TC1PBTWorker.pack_data`` retrieves a model’s
current learning rate, and specifies the validation loss as the performance
score. ``TC1PBTWorker.ready`` specifies that the model is *ready* every 5
epochs. (5 is too soon to begin sharing weights, but it serves as an example
and does exercise the workflow code within a reasonable amount of time.)
``TC1PBTWorker.update`` updates the model with a better performing learning
rate after having perturbed it. Note that ``update`` does not need to load
the better performing model’s weights.
That is done automatically in PBTCallback.

In sum then, in a PBTCallback at the end of every epoch:

1. ``pack_data`` is called to put every model’s performance data and selected
   hyperparameters into the PBTMetaDataStore.
2. ``ready`` is called to determine if a model is ready for an exploit /
   explore update.
3. If ``ready`` returns true, then the PBTCallback queries the
   PBTMetaDataStore for a better performing model using the supplied
   evaluate function (e.g. ``truncation_select``).
4. If the selection function returns data from a better performing model,
   then ``update`` is called to update the underperforming model with the
   better performing hyperparameters, and the PBTCallback loads the better
   performing model’s weights into the underperforming model.

Adapting the Workflow to a Different Model
------------------------------------------

``tc1_pbt.py`` can easily be adapted to work with a different model. The
following changes will need to be made:

- A new hyperparameter definition file. The rank 0 code that reads this file
  can be re-used.
- A new *evaluate* function. This can be passed to the PBTMetaDataStore
  constructor in place of ``truncation_select``.
- A new PBTWorker implementation, implementing ``ready``, ``pack_data``, and
  ``update`` as appropriate for the new model and workflow. This can be
  passed to the PBTCallback in place of ``TC1PBTWorker``.
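A new PBTWorker implementation might be sketched as follows. The method names
come from the interface described above, but the exact signatures and the
shape of the ``model`` argument are assumptions; consult ``pbt.py`` and
``TC1PBTWorker`` for the real API.

.. code:: python

   import random

   # Hypothetical PBTWorker implementation for a new model. Here `model`
   # is assumed to be a dict-like object exposing the current epoch,
   # learning rate, and validation loss.
   class MyPBTWorker(object):

       def ready(self, model):
           # Consider the model ready for exploit/explore every 10 epochs
           return model['epoch'] > 0 and model['epoch'] % 10 == 0

       def pack_data(self, model):
           # Metadata put into the PBTMetaDataStore; 'score' is the value
           # the evaluate function compares (here, validation loss)
           return {'lr': model['lr'], 'score': model['val_loss']}

       def update(self, model, data):
           # Explore: adopt and perturb the better model's learning rate.
           # (Loading the better model's weights is handled by PBTCallback.)
           model['lr'] = data['lr'] * random.uniform(0.8, 1.2)

An instance of such a class is passed to the PBTCallback constructor in place
of ``TC1PBTWorker``.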