Run mlrMBO based hyperparameter optimization on CANDLE Benchmarks
=================================================================

mlrMBO is an iterative optimizer written in R. Given a set of hyperparameters, it searches for their best values for the CANDLE “Benchmarks” available here: ``git@github.com:ECP-CANDLE/Benchmarks.git``.

Running
-------

1. cd into the ``~/Supervisor/workflows/mlrMBO/test`` directory.
2. Specify the ``MODEL_NAME`` in the ``test-1.sh`` file and the hyperparameters in ``cfg-prm-1.sh``.
3. Specify the number of processes, queue, etc., in the ``cfg-sys-1.sh`` file.
4. Launch the test by invoking ``./test-1.sh benchmark machine``, where the machine can be ``cori``, ``theta``, ``titan``, etc.
5. The benchmark will be run for the number of processes specified.
6. The final objective function value will be available in the experiments directory and is also printed.

User requirements
-----------------

What you need to install to run the workflow:

- This workflow: ``git@github.com:ECP-CANDLE/Supervisor.git``. Clone and ``cd`` to ``workflows/mlrMBO`` (the directory containing this README).
- The NT3 benchmark: ``git@github.com:ECP-CANDLE/Benchmarks.git``. Clone and switch to the ``frameworks`` branch.
- Benchmark data: see the individual benchmark's README for obtaining the initial data.

Calling sequence
----------------

Script call stack:

- user shell ->
- test-1.sh ->
- swift/workflow.sh -> (submits to compute nodes)
- swift/workflow.swift ->
- common/swift/obj_app.swift ->
- common/sh/model.sh ->
- common/python/model_runner.py ->
- the benchmark/model

Environment settings:

- upf-1.sh ->
- cfg-sys-1.sh ->
- common/sh/: the env and langs ``.sh`` files

Making Changes
--------------

Structure
~~~~~~~~~

The point of the script structure is that it is easy to copy and modify the ``test-*.sh`` and ``cfg-*.sh`` scripts, and these copies can be checked back into the repo for use by others. The ``test-*.sh`` and ``cfg-*.sh`` scripts should simply contain environment variables that control how ``workflow.sh`` and ``workflow.swift`` operate. ``test-1`` and ``cfg-{sys,prm}-1`` should be left unmodified for simple testing.

Calling a different objective function
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To call a different objective function:

1. Copy ``common/swift/obj_app.swift`` to a new directory and/or file name.
2. Edit the ``app`` function body to run your code and return the result.
3. Edit a ``test-*.sh`` script to set these environment variables:

   - ``OBJ_DIR``: Set this to the new directory, if changed. Otherwise, ``OBJ_DIR`` defaults to the absolute path to ``common/swift``.
   - ``OBJ_MODULE``: Set this to the Swift file name without the suffix, if changed. Otherwise, ``OBJ_MODULE`` defaults to ``obj_app``.

4. Run it!

A simple test for changing the objective function::

   $ cd mlrMBO/                       # This directory
   $ export OBJ_DIR=$PWD/test
   $ export OBJ_MODULE=test_obj_fail  # Cf. test/test_obj_fail.swift
   $ test/test-1.sh ___ dunedin       # Dummy argument for MODEL_NAME (unused)
   ...
   Swift: Assertion failed!: test-obj-fail.swift was successfully invoked!
   ...

This indicates that the code in ``test_obj_fail.swift`` was executed instead of ``obj_app.swift``.

Where to check for output
~~~~~~~~~~~~~~~~~~~~~~~~~

This includes error output.

When you run the test script, you will get a message about ``TURBINE_OUTPUT``. This is the main output directory for your run.

- On a local system, stdout/stderr for the workflow will go to your terminal.
- On a scheduled system, stdout/stderr for the workflow will go to ``TURBINE_OUTPUT/output.txt``.
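For example, on a scheduled system you can follow the workflow output as it is written. This is a minimal sketch: the actual ``TURBINE_OUTPUT`` path is printed by the test script, and the value shown here is hypothetical::

   # TURBINE_OUTPUT is reported by test-1.sh; this example path is hypothetical
   export TURBINE_OUTPUT=$HOME/Supervisor/workflows/mlrMBO/test/experiments/X001
   tail -f $TURBINE_OUTPUT/output.txt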
The stdout/stderr from the individual objective function (model) runs goes into files of the form ``TURBINE_OUTPUT/EXPID/run/RUNID/model.log``, where ``EXPID`` is the user-provided experiment ID and ``RUNID`` identifies one of the model runs generated by mlrMBO, one per parameter set. ``RUNID`` has the form ``R_I_J``, where ``R`` is the restart number, ``I`` is the iteration number, and ``J`` is the sample within the iteration.
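Assuming the layout above, standard shell tools are enough to locate and inspect the per-run logs. The experiment ID ``X001``, the run ID ``1_2_3``, and the ``val_loss`` key below are hypothetical examples, not fixed names::

   # List every model log produced for the (hypothetical) experiment X001
   find $TURBINE_OUTPUT/X001/run -name model.log

   # Inspect one run: restart 1, iteration 2, sample 3 (hypothetical RUNID)
   less $TURBINE_OUTPUT/X001/run/1_2_3/model.log

   # Scan all runs for a reported metric; the key name varies by benchmark
   grep -r "val_loss" $TURBINE_OUTPUT/X001/run --include=model.log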