Neuro-algorithmic Policies Enable Fast Combinatorial Generalization

Autonomous Learning Conference Paper 2021

Autonomous Learning

Marin Vlastelica Pogancic

Autonomous Learning

Michal Rolinek

Empirical Inference, Autonomous Learning

Georg Martius

Senior Research Scientist

Although model-based and model-free approaches to learning the control of systems have achieved impressive results on standard benchmarks, generalization to task variations is still lacking. Recent results suggest that generalization for standard architectures improves only after obtaining exhaustive amounts of data. We give evidence that generalization capabilities are in many cases bottlenecked by the inability to generalize on the combinatorial aspects of the problem. We show that, for a certain subclass of the MDP framework, this can be alleviated by a neuro-algorithmic policy architecture that embeds a time-dependent shortest path solver in a deep neural network. Trained end-to-end via blackbox-differentiation, this method leads to considerable improvement in generalization capabilities in the low-data regime.
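The abstract's idea of training through an embedded shortest path solver can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a static 3x3 grid of vertex costs with right/down moves only (the paper uses a time-dependent solver), a squared-error loss against a hypothetical expert path, and an interpolation strength `lam` chosen for illustration. The function names `shortest_path` and `blackbox_grad` are made up here. The backward step follows the usual blackbox-differentiation recipe: perturb the costs along the incoming gradient, call the solver again, and use the scaled difference of the two solutions as the gradient with respect to the costs.

```python
def shortest_path(costs):
    """Return a 0/1 indicator grid of the cheapest right/down path
    from the top-left to the bottom-right cell (dynamic programming)."""
    rows, cols = len(costs), len(costs[0])
    best = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            prev = []
            if r > 0:
                prev.append(best[r - 1][c])
            if c > 0:
                prev.append(best[r][c - 1])
            best[r][c] = costs[r][c] + (min(prev) if prev else 0.0)

    # Backtrack from the goal to recover which cells lie on the path.
    path = [[0.0] * cols for _ in range(rows)]
    r, c = rows - 1, cols - 1
    path[r][c] = 1.0
    while (r, c) != (0, 0):
        if r > 0 and (c == 0 or best[r - 1][c] <= best[r][c - 1]):
            r -= 1
        else:
            c -= 1
        path[r][c] = 1.0
    return path


def blackbox_grad(costs, path, grad_path, lam):
    """Gradient of the loss w.r.t. the costs, given dL/d(path).

    Second solver call on costs perturbed by lam * dL/d(path);
    the result is -(path - perturbed_path) / lam.
    """
    perturbed = [[w + lam * g for w, g in zip(w_row, g_row)]
                 for w_row, g_row in zip(costs, grad_path)]
    path_lam = shortest_path(perturbed)
    return [[-(y - y_l) / lam for y, y_l in zip(y_row, yl_row)]
            for y_row, yl_row in zip(path, path_lam)]


# Hypothetical example: costs as they might be predicted by a network.
costs = [[1.0, 1.0, 5.0],
         [5.0, 1.0, 1.0],
         [5.0, 5.0, 1.0]]
# Hypothetical expert path that goes down first instead of right.
target = [[1.0, 0.0, 0.0],
          [1.0, 1.0, 1.0],
          [0.0, 0.0, 1.0]]

path = shortest_path(costs)
# Gradient of 0.5 * sum((path - target) ** 2) with respect to the path.
grad_path = [[y - t for y, t in zip(y_row, t_row)]
             for y_row, t_row in zip(path, target)]
grad_costs = blackbox_grad(costs, path, grad_path, lam=10.0)
```

In this toy case the gradient is negative at the cell the solver chose wrongly and positive at the cell the expert used, so a gradient-descent step raises the cost of the former and lowers the cost of the latter. In an end-to-end setup, `grad_costs` would be passed further back into the network that predicted the costs.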

Author(s):	Marin Vlastelica and Michal Rolinek and Georg Martius
Book Title:	Proceedings of the 2021 International Conference on Machine Learning (ICML)
Year:	2021
Month:	July

Project(s):	Combinatorial Optimization as a Layer / Blackbox Differentiation
Bibtex Type:	Conference Paper (inproceedings)

Event Name:	The Thirty-eighth International Conference on Machine Learning (ICML)
Event Place:	Virtual

Electronic Archiving:	grant_archive

Links:	arXiv, Spotlight, PDF

BibTeX

@inproceedings{VlastelicaEtal2021:NeuroAlgorithmic,
  title = {Neuro-algorithmic Policies Enable Fast Combinatorial Generalization},
  booktitle = {Proceedings of the 2021 International Conference on Machine Learning (ICML)},
  abstract = {Although model-based and model-free approa\-ches to learning the control of systems have achieved impressive results on standard benchmarks, generalization to task variations is still lacking. Recent results suggest that generalization for standard architectures improves only after obtaining exhaustive amounts of data. We give evidence that generalization capabilities are in many cases bottlenecked by the inability to generalize on the combinatorial aspects of the problem. We show that, for a certain subclass of the MDP framework, this can be alleviated by a neuro-algorithmic policy architecture that embeds a time-dependent shortest path solver in a deep neural network. Trained end-to-end via blackbox-differentiation, this method leads to considerable improvement in generalization capabilities in the low-data regime.},
  month = jul,
  year = {2021},
  slug = {vlastelicaetal2021-neuroalgorithmic},
  author = {Vlastelica, Marin and Rolinek, Michal and Martius, Georg},
  month_numeric = {7}
}