Conference Paper 2020

Static and Dynamic Values of Computation in MCTS

Monte-Carlo Tree Search (MCTS) is one of the most widely used methods for planning, and has powered many recent advances in artificial intelligence. In MCTS, one typically performs computations (i.e., simulations) to collect statistics about the possible future consequences of actions, and then chooses accordingly. Many popular MCTS methods such as UCT and its variants decide which computations to perform by trading off exploration and exploitation. In this work, we take a more direct approach, and explicitly quantify the value of a computation based on its expected impact on the quality of the action eventually chosen. Our approach goes beyond the myopic limitations of existing computation-value-based methods in two senses: (I) we are able to account for the impact of non-immediate (i.e., future) computations (II) on non-immediate actions. We show that policies that greedily optimize computation values are optimal under certain assumptions and obtain results that are competitive with the state-of-the-art.
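For context on the contrast the abstract draws, here is a minimal illustrative sketch in generic notation (it is not the paper's exact static/dynamic definitions). UCT selects the next action to simulate from the root by

\[
a^{\ast} \;=\; \arg\max_{a}\; \Big( \bar{Q}(a) + c \sqrt{\tfrac{\ln N}{n_a}} \Big),
\]

where \(\bar{Q}(a)\) is the mean return of simulations through action \(a\), \(n_a\) their count, \(N = \sum_a n_a\), and \(c\) an exploration constant. A computation-value approach instead scores a candidate computation \(\omega\) (e.g., one further simulation) by its expected effect on the eventual decision, roughly

\[
\mathrm{VOC}(\omega) \;\approx\; \mathbb{E}_{\omega}\!\big[ \max_a \bar{Q}'(a) \big] \;-\; \max_a \bar{Q}(a),
\]

where \(\bar{Q}'\) denotes the action-value estimates after performing \(\omega\); the computation with the highest such value is then performed greedily.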

Author(s): Sezener, E and Dayan, P
Volume: 124
Pages: 31--40
Year: 2020
Series: {Proceedings of Machine Learning Research (PMLR)}
Publisher: Curran
Bibtex Type: Conference Paper (inproceedings)
Electronic Archiving: grant_archive

BibTeX

@inproceedings{item_3193509,
  title = {{Static and Dynamic Values of Computation in MCTS}},
  abstract = {{Monte-Carlo Tree Search (MCTS) is one of the most widely used methods for planning, and has powered many recent advances in artificial intelligence. In MCTS, one typically performs computations (i.e., simulations) to collect statistics about the possible future consequences of actions, and then chooses accordingly. Many popular MCTS methods such as UCT and its variants decide which computations to perform by trading off exploration and exploitation. In this work, we take a more direct approach, and explicitly quantify the value of a computation based on its expected impact on the quality of the action eventually chosen. Our approach goes beyond the \emph{myopic} limitations of existing computation-value-based methods in two senses: (I) we are able to account for the impact of non-immediate (i.e., future) computations (II) on non-immediate actions. We show that policies that greedily optimize computation values are optimal under certain assumptions and obtain results that are competitive with the state-of-the-art.}},
  volume = {124},
  pages = {31--40},
  series = {{Proceedings of Machine Learning Research (PMLR)}},
  publisher = {Curran},
  year = {2020},
  slug = {item_3193509},
  author = {Sezener, E and Dayan, P}
}