Conference Paper 2020

Static and Dynamic Values of Computation in MCTS

Monte-Carlo Tree Search (MCTS) is one of the most widely used methods for planning, and has powered many recent advances in artificial intelligence. In MCTS, one typically performs computations (i.e., simulations) to collect statistics about the possible future consequences of actions, and then chooses accordingly. Many popular MCTS methods such as UCT and its variants decide which computations to perform by trading off exploration and exploitation. In this work, we take a more direct approach, and explicitly quantify the value of a computation based on its expected impact on the quality of the action eventually chosen. Our approach goes beyond the myopic limitations of existing computation-value-based methods in two senses: (I) we are able to account for the impact of non-immediate (i.e., future) computations (II) on non-immediate actions. We show that policies that greedily optimize computation values are optimal under certain assumptions and obtain results that are competitive with the state-of-the-art.
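For context on the contrast the abstract draws, here is a minimal illustrative sketch in generic notation (it is not the paper's exact static/dynamic definitions). UCT selects the next action to simulate from the root by

\[
a^{\ast} \;=\; \arg\max_{a}\; \Big( \bar{Q}(a) + c \sqrt{\tfrac{\ln N}{n_a}} \Big),
\]

where \(\bar{Q}(a)\) is the mean return of simulations through action \(a\), \(n_a\) their count, \(N = \sum_a n_a\), and \(c\) an exploration constant. A computation-value approach instead scores a candidate computation \(\omega\) (e.g., one further simulation) by its expected effect on the eventual decision, roughly

\[
\mathrm{VOC}(\omega) \;\approx\; \mathbb{E}_{\omega}\!\big[ \max_a \bar{Q}'(a) \big] \;-\; \max_a \bar{Q}(a),
\]

where \(\bar{Q}'\) denotes the action-value estimates after performing \(\omega\); the computation with the highest such value is then performed greedily.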

Author(s): Sezener, E and Dayan, P
Volume: 124
Pages: 31--40
Year: 2020
Series: {Proceedings of Machine Learning Research (PMLR)}
Publisher: Curran
Bibtex Type: Conference Paper (inproceedings)
Electronic Archiving: grant_archive

BibTeX

@inproceedings{item_3193509,
  title = {{Static and Dynamic Values of Computation in MCTS}},
  abstract = {{Monte-Carlo Tree Search (MCTS) is one of the most widely used methods for planning, and has powered many recent advances in artificial intelligence. In MCTS, one typically performs computations (i.e., simulations) to collect statistics about the possible future consequences of actions, and then chooses accordingly. Many popular MCTS methods such as UCT and its variants decide which computations to perform by trading off exploration and exploitation. In this work, we take a more direct approach, and explicitly quantify the value of a computation based on its expected impact on the quality of the action eventually chosen. Our approach goes beyond the \emph{myopic} limitations of existing computation-value-based methods in two senses: (I) we are able to account for the impact of non-immediate (i.e., future) computations (II) on non-immediate actions. We show that policies that greedily optimize computation values are optimal under certain assumptions and obtain results that are competitive with the state-of-the-art.}},
  volume = {124},
  pages = {31--40},
  series = {{Proceedings of Machine Learning Research (PMLR)}},
  publisher = {Curran},
  year = {2020},
  slug = {item_3193509},
  author = {Sezener, E and Dayan, P}
}