Run mlrMBO based hyperparameter optimization on CANDLE Benchmarks
=================================================================

mlrMBO is an iterative optimizer written in R. Given a set of hyperparameters, it searches for their best values for the CANDLE “Benchmarks” available here: ``git@github.com:ECP-CANDLE/Benchmarks.git``.

Running
-------

1. cd into the ``~/Supervisor/workflows/mlrMBO/test`` directory.
2. Specify the ``MODEL_NAME`` in the ``test-1.sh`` file and the hyperparameters in ``cfg-prm-1.sh``.
3. Specify the number of processes, queue, etc., in the ``cfg-sys-1.sh`` file.
4. Launch the test by invoking ``./test-1.sh benchmark machine``, where the machine can be ``cori``, ``theta``, ``titan``, etc.
5. The benchmark will be run for the number of processes specified.
6. The final objective function value will be available in the experiments directory and is also printed.

User requirements
-----------------

What you need to install to run the workflow:

- This workflow: ``git@github.com:ECP-CANDLE/Supervisor.git``. Clone and ``cd`` to ``workflows/mlrMBO`` (the directory containing this README).
- The NT3 benchmark: ``git@github.com:ECP-CANDLE/Benchmarks.git``. Clone and switch to the ``frameworks`` branch.
- Benchmark data: see the individual benchmark's README for obtaining the initial data.

Calling sequence
----------------

Script call stack:

- user shell ->
- test-1.sh ->
- swift/workflow.sh -> (submits to compute nodes)
- swift/workflow.swift ->
- common/swift/obj_app.swift ->
- common/sh/model.sh ->
- common/python/model_runner.py ->
- the benchmark/model

Environment settings:

- upf-1.sh ->
- cfg-sys-1.sh ->
- common/sh/: the env and langs ``.sh`` files

Making Changes
--------------

Structure
~~~~~~~~~

The point of the script structure is that it is easy to copy and modify the ``test-*.sh`` and ``cfg-*.sh`` scripts, and these copies can be checked back into the repo for use by others. The ``test-*.sh`` and ``cfg-*.sh`` scripts should simply contain environment variables that control how ``workflow.sh`` and ``workflow.swift`` operate. ``test-1`` and ``cfg-{sys,prm}-1`` should be left unmodified for simple testing.

Calling a different objective function
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To call a different objective function:

1. Copy ``common/swift/obj_app.swift`` to a new directory and/or file name.
2. Edit the ``app`` function body to run your code and return the result.
3. Edit a ``test-*.sh`` script to set these environment variables:

   - ``OBJ_DIR``: Set this to the new directory, if changed. Otherwise, ``OBJ_DIR`` defaults to the absolute path to ``common/swift``.
   - ``OBJ_MODULE``: Set this to the Swift file name without the suffix, if changed. Otherwise, ``OBJ_MODULE`` defaults to ``obj_app``.

4. Run it!

A simple test for changing the objective function::

   $ cd mlrMBO/                       # This directory
   $ export OBJ_DIR=$PWD/test
   $ export OBJ_MODULE=test_obj_fail  # Cf. test/test_obj_fail.swift
   $ test/test-1.sh ___ dunedin       # Dummy argument for MODEL_NAME (unused)
   ...
   Swift: Assertion failed!: test-obj-fail.swift was successfully invoked!
   ...

This indicates that the code in ``test_obj_fail.swift`` was executed instead of ``obj_app.swift``.

Where to check for output
~~~~~~~~~~~~~~~~~~~~~~~~~

This includes error output.

When you run the test script, you will get a message about ``TURBINE_OUTPUT``. This is the main output directory for your run.

- On a local system, stdout/stderr for the workflow will go to your terminal.
- On a scheduled system, stdout/stderr for the workflow will go to ``TURBINE_OUTPUT/output.txt``.
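For example, on a scheduled system you can follow the workflow output as it is written. This is a minimal sketch: the actual ``TURBINE_OUTPUT`` path is printed by the test script, and the value shown here is hypothetical::

   # TURBINE_OUTPUT is reported by test-1.sh; this example path is hypothetical
   export TURBINE_OUTPUT=$HOME/Supervisor/workflows/mlrMBO/test/experiments/X001
   tail -f $TURBINE_OUTPUT/output.txt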
The stdout/stderr from the individual objective function (model) runs goes into files of the form ``TURBINE_OUTPUT/EXPID/run/RUNID/model.log``, where ``EXPID`` is the user-provided experiment ID and ``RUNID`` identifies one of the model runs generated by mlrMBO, one per parameter set. ``RUNID`` has the form ``R_I_J``, where ``R`` is the restart number, ``I`` is the iteration number, and ``J`` is the sample within the iteration.
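Assuming the layout above, standard shell tools are enough to locate and inspect the per-run logs. The experiment ID ``X001``, the run ID ``1_2_3``, and the ``val_loss`` key below are hypothetical examples, not fixed names::

   # List every model log produced for the (hypothetical) experiment X001
   find $TURBINE_OUTPUT/X001/run -name model.log

   # Inspect one run: restart 1, iteration 2, sample 3 (hypothetical RUNID)
   less $TURBINE_OUTPUT/X001/run/1_2_3/model.log

   # Scan all runs for a reported metric; the key name varies by benchmark
   grep -r "val_loss" $TURBINE_OUTPUT/X001/run --include=model.log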