Coherent Inference on Optimal Play in Game Trees

Institute Homepage

Institute Homepage DE Sign In

Back

Empirical Inference Probabilistic Numerics Conference Paper 2010

Probabilistic Numerics, Empirical Inference

Philipp Hennig

Affiliated Researcher

Round-based games are an instance of discrete planning problems. Some of the best contemporary game tree search algorithms use random roll-outs as data. Relying on a good policy, they learn on-policy values by propagating information upwards in the tree, but not between sibling nodes. Here, we present a generative model and a corresponding approximate message passing scheme for inference on the optimal, off-policy value of nodes in smooth AND/OR trees, given random roll-outs. The crucial insight is that the distribution of values in game trees is not completely arbitrary. We define a generative model of the on-policy values using a latent score for each state, representing the value under the random roll-out policy. Inference on the values under the optimal policy separates into an inductive, pre-data step and a deductive, post-data part. Both can be solved approximately with Expectation Propagation, allowing off-policy value inference for any node in the (exponentially big) tree in linear time.

Author(s):	Hennig, P. and Stern, D. and Graepel, T.
Book Title:	JMLR Workshop and Conference Proceedings Volume 9: AISTATS 2010
Pages:	326-333
Year:	2010
Month:	May
Day:	0
Editors:	Teh, Y.W. , M. Titterington
Publisher:	JMLR

Bibtex Type:	Conference Paper (inproceedings)

Address:	Cambridge, MA, USA
Event Name:	Thirteenth International Conference on Artificial Intelligence and Statistics
Event Place:	Chia Laguna Resort, Italy

Digital:	0
Electronic Archiving:	grant_archive

Links:	PDF Web

BibTex

@inproceedings{HennigSG2010,
  title = {Coherent Inference on Optimal Play in Game Trees},
  booktitle = {JMLR Workshop and Conference Proceedings Volume 9: AISTATS 2010},
  abstract = {Round-based games are an instance of discrete planning problems. Some of the best contemporary game tree search algorithms use random roll-outs as data. Relying on a good policy, they learn on-policy values by propagating information upwards in the tree, but not between sibling nodes. Here, we present a generative model and a corresponding approximate message passing scheme for inference on the optimal, off-policy value of nodes in smooth AND/OR trees, given random roll-outs. The crucial insight is that the distribution of values in game trees is not completely arbitrary. We define a generative model of the on-policy values using a latent score for each state, representing the value under the random roll-out policy. Inference on the values under the optimal policy separates into an inductive, pre-data step and a deductive, post-data part. Both can be solved approximately with Expectation Propagation, allowing off-policy value inference for any node in the (exponentially big) tree in linear time. },
  pages = {326-333},
  editors = {Teh, Y.W. , M. Titterington },
  publisher = {JMLR},
  address = {Cambridge, MA, USA},
  month = may,
  year = {2010},
  slug = {hennigsg2010},
  author = {Hennig, P. and Stern, D. and Graepel, T.},
  month_numeric = {5}
}

Research

Departments

Research Groups

People

Contact

Our Institute

Our History

Career

Doctoral Programs

Training

Service Units

Central Scientific Facilities

Workshops

Campus Services

Impact

Cooperation

Partners and Initiatives