Multi-armed bandit-based hyper-heuristics for combinatorial optimization problems

Felipe Lagos, Jordi Pereira

Research output: Contribution to journalArticlepeer-review


There are significant research opportunities in the integration of Machine Learning (ML) methods and Combinatorial Optimization Problems (COPs). In this work, we focus on metaheuristics to solve COPs that have an important learning component. These algorithms must explore a solution space and learn from the information they obtain in order to find high-quality solutions. Among the metaheuristics, we study Hyper-Heuristics (HHs), algorithms that, given a number of low-level heuristics, iteratively select and apply heuristics to a solution. The HH we consider has a Markov model to produce sequences of low-level heuristics, which we combine with a Multi-Armed Bandit Problem (MAB)-based method to learn its parameters. This work proposes several improvements to the HH metaheuristic that yields a better learning for solving problem instances. Specifically, this is the first work in HHs to present Exponential Weights for Exploration and Exploitation (EXP3) as a learning method, an algorithm that is able to deal with adversarial settings. We also present a case study for the Vehicle Routing Problem with Time Windows (VRPTW), for which we include a list of low-level heuristics that have been proposed in the literature. We show that our algorithms can handle a large and diverse list of heuristics, illustrating that they can be easily configured to solve COPs of different nature. The computational results indicate that our algorithms are competitive methods for the VRPTW (2.16% gap on average with respect to the best known solutions), demonstrating the potential of these algorithms to solve COPs. Finally, we show how algorithms can even detect low-level heuristics that do not contribute to finding better solutions to the problem.

Original languageEnglish
Pages (from-to)70-91
Number of pages22
JournalEuropean Journal of Operational Research
Issue number1
StatePublished - 1 Jan 2024
Externally publishedYes


  • Combinatorial optimization
  • Hyper-heuristics
  • Machine learning
  • Metaheuristics


Dive into the research topics of 'Multi-armed bandit-based hyper-heuristics for combinatorial optimization problems'. Together they form a unique fingerprint.

Cite this