Reinforcement learning for search tree size minimization in Constraint Programming: New results on scheduling benchmarks

Heinz V.; Vilím P.; Hanzálek Z.

dc.contributor.author	Heinz V.
dc.contributor.author	Vilím P.
dc.contributor.author	Hanzálek Z.
dc.date.accessioned	2025-10-14T08:54:54Z
dc.date.available	2025-10-14T08:54:54Z
dc.date.issued	2025
dc.identifier	V3S-384962
dc.identifier.citation	HEINZ, V., P. VILÍM, and Z. HANZÁLEK. Reinforcement learning for search tree size minimization in Constraint Programming: New results on scheduling benchmarks. Computers & Industrial Engineering. 2025, 209. ISSN 0360-8352. DOI 10.1016/j.cie.2025.111413.
dc.identifier.issn	0360-8352 (print)
dc.identifier.issn	1879-0550 (online)
dc.identifier.uri	http://hdl.handle.net/10467/127231
dc.description.abstract	Failure-Directed Search (FDS) is an important complete, generic search algorithm used in Constraint Programming (CP) to explore the search space efficiently, and it has proven particularly effective on scheduling problems. This paper analyzes the properties of FDS and shows that minimizing the size of its search tree, guided by ranked branching decisions, is closely related to the Multi-armed bandit (MAB) problem. Building on this insight, MAB reinforcement learning algorithms are applied to FDS, extended with problem-specific refinements and parameter tuning, and evaluated on the two most fundamental scheduling problems: the Job Shop Scheduling Problem (JSSP) and the Resource-Constrained Project Scheduling Problem (RCPSP). With the best extended MAB algorithm and configuration, the enhanced FDS is 1.7 times faster on the JSSP benchmarks and 2.5 times faster on the RCPSP benchmarks than the original FDS implementation in a new solver called OptalCP. It is also 3.5 times faster on the JSSP benchmarks and 2.1 times faster on the RCPSP benchmarks than the current state-of-the-art FDS algorithm in IBM CP Optimizer 22.1. Furthermore, with a time limit of only 900 s per instance, the enhanced FDS improved the best-known lower bounds of 78 of 84 open JSSP instances and 226 of 393 open RCPSP standard benchmark instances, and it closed a few of them completely.	eng
dc.format.mimetype	application/pdf
dc.language.iso	eng
dc.publisher	Elsevier
dc.relation.ispartof	Computers & Industrial Engineering
dc.rights	Creative Commons Attribution (CC BY) 4.0
dc.rights.uri	http://creativecommons.org/licenses/by/4.0/
dc.subject	Constraint Programming; Reinforcement learning; Discrete optimization; Scheduling; Tree search; Heuristics	eng
dc.title	Reinforcement learning for search tree size minimization in Constraint Programming: New results on scheduling benchmarks	eng
dc.type	journal article	eng
dc.identifier.doi	10.1016/j.cie.2025.111413
dc.relation.projectid	info:eu-repo/grantAgreement/Czech Science Foundation/GA/GA22-31670S/CZ/Scheduling Tests in Medical Laboratories: Reduction of Turn-Around Time/
dc.relation.projectid	info:eu-repo/grantAgreement/EC/OPJAK/CZ.02.01.01%2F00%2F22_008%2F0004590/CZ/Robotics and advanced industrial production/ROBOPROX
dc.rights.access	openAccess
dc.type.status	Peer-reviewed
dc.type.version	acceptedVersion
dc.identifier.scopus	2-s2.0-105014103087

Files in this item

Name: Heinz_Vilim_Hanzalek__Reinforc ...
Size: 765.7 KB
Format: PDF
Description: ACCEPTED ## OPEN ## Creative ...

This item appears in the following collections:

Publikační činnost ČVUT (CTU publication activity) [1547]


Except where otherwise noted, this item's license is Creative Commons Attribution (CC BY) 4.0.