TY - GEN
T1 - Exploiting Learned Policies and Learned Heuristics in Bounded-Suboptimal Search
AU - Greco, Matias
N1 - Publisher Copyright:
Copyright © 2021, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
PY - 2021
Y1 - 2021
N2 - Machine Learning (ML) has made significant progress on tasks such as image classification, speech recognition, and natural language processing, driven mainly by deep learning. ML algorithms, through learned policies or heuristic estimates, have also shown potential for solving deterministic problems that are usually solved with search techniques. Nevertheless, purely learning-based techniques cannot deliver guarantees on the quality of the solution. This research explores how a learned policy or heuristic can be integrated into a bounded-suboptimal search algorithm via Focal Search, sorting the FOCAL list using the concept of discrepancies to speed up the search. On the experimental side, we train a simple neural network as a learned policy and use DeepCubeA as a learned heuristic for the 15-puzzle domain. The results show that a learned policy or heuristic can reduce expansions by at least one order of magnitude compared to WA* with the same suboptimality bound, while delivering better solution quality.
AB - Machine Learning (ML) has made significant progress on tasks such as image classification, speech recognition, and natural language processing, driven mainly by deep learning. ML algorithms, through learned policies or heuristic estimates, have also shown potential for solving deterministic problems that are usually solved with search techniques. Nevertheless, purely learning-based techniques cannot deliver guarantees on the quality of the solution. This research explores how a learned policy or heuristic can be integrated into a bounded-suboptimal search algorithm via Focal Search, sorting the FOCAL list using the concept of discrepancies to speed up the search. On the experimental side, we train a simple neural network as a learned policy and use DeepCubeA as a learned heuristic for the 15-puzzle domain. The results show that a learned policy or heuristic can reduce expansions by at least one order of magnitude compared to WA* with the same suboptimality bound, while delivering better solution quality.
UR - https://www.scopus.com/pages/publications/85124604788
U2 - 10.1609/socs.v12i1.18589
DO - 10.1609/socs.v12i1.18589
M3 - Conference contribution
AN - SCOPUS:85124604788
T3 - 14th International Symposium on Combinatorial Search, SoCS 2021
SP - 219
EP - 221
BT - 14th International Symposium on Combinatorial Search, SoCS 2021
A2 - Ma, Hang
A2 - Serina, Ivan
PB - Association for the Advancement of Artificial Intelligence
T2 - 14th International Symposium on Combinatorial Search, SoCS 2021
Y2 - 26 July 2021 through 30 July 2021
ER -