Exploiting Learned Policies and Learned Heuristics in Bounded-Suboptimal Search

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

1 Scopus citations

Abstract

Machine Learning (ML) has made significant progress to perform different tasks, such as image classification, speech recognition, and natural language processing, mainly driven by deep learning. Also, ML algorithms, through learning policies or heuristics estimates, have demonstrated potential for solving deterministic problems that would usually be solved using search techniques. Nevertheless, in solving a search problem with purely learning techniques, it is not possible to deliver guarantees regarding the quality of the solution. This research explores how a learned policy or heuristic can be integrated with a bounded-suboptimal search algorithm using Focal Search, sorting the FOCAL list using the concept of discrepancies to speed up the search. On the experimental side, we train a simple neural network as a learned policy and the DeepCubeA as a learned heuristic for the 15-puzzle domain. The results show that a learned policy or heuristic can reduce, at least, one order of magnitude, the expansions than WA* with the same bound and deliver better solution quality.

Original languageEnglish
Title of host publication14th International Symposium on Combinatorial Search, SoCS 2021
EditorsHang Ma, Ivan Serina
PublisherAssociation for the Advancement of Artificial Intelligence
Pages219-221
Number of pages3
ISBN (Electronic)9781713834557
DOIs
StatePublished - 2021
Externally publishedYes
Event14th International Symposium on Combinatorial Search, SoCS 2021 - Guangzhou, Virtual, China
Duration: 26 Jul 202130 Jul 2021

Publication series

Name14th International Symposium on Combinatorial Search, SoCS 2021

Conference

Conference14th International Symposium on Combinatorial Search, SoCS 2021
Country/TerritoryChina
CityGuangzhou, Virtual
Period26/07/2130/07/21

Fingerprint

Dive into the research topics of 'Exploiting Learned Policies and Learned Heuristics in Bounded-Suboptimal Search'. Together they form a unique fingerprint.

Cite this