This project is no longer maintained. However, please feel free to contact me if you have questions about the associated paper or believe parts of the source code may be of use to you.
This project incorporates the auto-sklearn toolkit (asl) into a solver runtime
prediction framework. The predictions are then passed to a second classification
model which yields a solution to the algorithm selection problem (as).
This project is written in Python 3 and can be installed with pip.
The project indirectly depends on the pyrfr package
and, thus, also requires SWIG-3.
pip3 install -r requirements.txt
pip3 install -r requirements.2.txt
N.B. If installing under Anaconda, use pip rather than pip3.
Example usage for OASC
All usage requires a YAML configuration file. It must include a
base_path key; all results are written to that location.
An example (complete) config file:
base_path: /prj/oasc2017
Short test run
process-oasc-scenario oasc.yaml /mldb/oasc_scenarios/train/Bado/ /mldb/oasc_scenarios/test/Bado/ --total-training-time 30 --num-cpus 3 --logging-level INFO --max-feature-steps 1
Command for OASC-like run
process-oasc-scenario oasc.yaml /path/to/oasc_scenarios/train/Bado/ /path/to/oasc_scenarios/test/Bado/ --total-training-time 600 --num-cpus 8 --logging-level INFO
This command uses the training set to learn an algorithm selection scheduler. It
then uses that scheduler to create solving schedules for the test set. The
--total-training-time parameter gives the approximate amount of time (in
seconds) allocated to auto-sklearn for each internal model.
This script also performs feature set selection and determines a presolving
schedule. The --max-feature-steps option can be given to limit the number of
feature steps considered in the search.
Typical problem sets for OASC take about an hour or two with 8 CPUs when 10 minutes are used for each model.
This script also accepts other optional parameters controlling the behavior of
auto-sklearn, BLAS, logging, etc. These were all kept at default values for
submissions.
The --help flag can be given to see all options and their default values.
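When running several scenarios, it can be convenient to assemble the command line programmatically. The following is a minimal sketch that only uses the flags shown above; the helper name build_oasc_command and its defaults are hypothetical and not part of this project.

```python
import shlex


def build_oasc_command(config, train_dir, test_dir, *,
                       total_training_time=None, num_cpus=8,
                       logging_level="INFO", max_feature_steps=None):
    """Assemble a process-oasc-scenario argument list.

    Hypothetical helper: only the flags documented in this README are used.
    Options left as None are omitted so the script's own defaults apply.
    """
    cmd = ["process-oasc-scenario", str(config), str(train_dir), str(test_dir)]
    if total_training_time is not None:
        # Approximate seconds given to auto-sklearn for each internal model.
        cmd += ["--total-training-time", str(total_training_time)]
    cmd += ["--num-cpus", str(num_cpus), "--logging-level", logging_level]
    if max_feature_steps is not None:
        # Limits the number of feature steps considered in the search.
        cmd += ["--max-feature-steps", str(max_feature_steps)]
    return cmd


cmd = build_oasc_command(
    "oasc.yaml",
    "/path/to/oasc_scenarios/train/Bado/",
    "/path/to/oasc_scenarios/test/Bado/",
    total_training_time=600,
)
# Print the command so it can be inspected before running it.
print(shlex.join(cmd))
# To launch the run: import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems when paths contain spaces.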
The schedule for the test instances is written to: <base_path>/schedule.asl.<scenario_name>.json.
The learned scheduler is written to: <base_path>/model.asl.scheduler.<scenario_name>.pkl.gz
process-oasc-scenario oasc.yaml /path/to/oasc_scenarios/train/Bado/ /path/to/oasc_scenarios/test/Bado/ --use-random-forests --num-cpus 8 --logging-level INFO
This command differs from the command above because it includes the
--use-random-forests flag. Rather than learn the internal models for the
scheduler using auto-sklearn, it instead uses standard sklearn random
forests (with 100 trees).
It ignores the auto-sklearn parameters (such as --total-training-time). It
tends to be somewhat faster since it avoids the Bayesian optimization. Still,
many of the steps (such as feature step selection) are the same for both, so the
typical time on OASC is similar to that mentioned above.
The schedule for the test instances is written to: <base_path>/schedule.rf.<scenario_name>.json.
The learned scheduler is written to: <base_path>/model.rf.scheduler.<scenario_name>.pkl.gz
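The output locations for both variants follow the same naming pattern, which can be sketched as below. The helper names output_paths and load_schedule are hypothetical, and the sketch assumes that the scenario name matches the scenario directory name (for example, Bado). The internal layout of the schedule JSON is not described here, so the loader simply returns whatever the file contains.

```python
import json
from pathlib import Path


def output_paths(base_path, scenario_name, kind="asl"):
    """Return the expected output files for one scenario.

    kind is "asl" for auto-sklearn runs or "rf" for runs that used
    --use-random-forests, mirroring the file names given in this README.
    """
    if kind not in ("asl", "rf"):
        raise ValueError(f"kind must be 'asl' or 'rf', got {kind!r}")
    base = Path(base_path)
    return {
        "schedule": base / f"schedule.{kind}.{scenario_name}.json",
        "scheduler": base / f"model.{kind}.scheduler.{scenario_name}.pkl.gz",
    }


def load_schedule(base_path, scenario_name, kind="asl"):
    """Load the schedule JSON for the test instances (structure not assumed)."""
    schedule_file = output_paths(base_path, scenario_name, kind)["schedule"]
    with open(schedule_file) as f:
        return json.load(f)


# Example: where a random forest run for the Bado scenario would write its files.
paths = output_paths("/prj/oasc2017", "Bado", kind="rf")
print(paths["schedule"])
print(paths["scheduler"])
```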
This project relies heavily on the automl_utils
module. Indeed, almost all of the logic for interacting with auto-sklearn is
wrapped in that module.