Balancing Safety and Exploitability in Opponent Modeling

Empirical Inference Conference Paper 2011

Opponent modeling is a critical mechanism in repeated games. It allows a player to adapt its strategy in order to better respond to the presumed preferences of its opponents. We introduce a new modeling technique that adaptively balances exploitability and risk reduction. An opponent’s strategy is modeled with a set of possible strategies that contains the actual strategy with high probability. The algorithm is safe, as the expected payoff is above the minimax payoff with high probability, and it can exploit the opponents’ preferences once sufficient observations have been obtained. We apply the technique to normal-form games and to stochastic games with a finite number of stages. The performance of the proposed approach is first demonstrated on repeated rock-paper-scissors games. Subsequently, the approach is evaluated in a human-robot table-tennis setting where the robot player learns to prepare to return a served ball. By modeling the human players, the robot chooses a forehand, backhand or middle preparation pose before they serve. The learned strategies can exploit the opponent’s preferences, leading to a higher rate of successful returns.
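The idea in the abstract can be illustrated with a minimal Python sketch for repeated rock-paper-scissors. It is not the authors' algorithm. It assumes the set of possible opponent strategies is an L1 ball around the empirical action frequencies, with a radius taken from a standard L1 deviation bound for empirical distributions (the Weissman et al. form), and it only compares four hypothetical candidate responses (the uniform minimax strategy and the three pure actions) by their worst-case expected payoff over that set. The names `l1_radius`, `worst_case_value` and `safe_response` are made up for this example.

```python
import math

# Our payoff in rock-paper-scissors: PAYOFF[ours][theirs].
# Action order: rock, paper, scissors. Win = +1, loss = -1, tie = 0.
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]


def l1_radius(n, k, delta):
    """Assumed confidence radius: with probability at least 1 - delta the true
    distribution over k actions is within this L1 distance of the empirical one."""
    if n == 0:
        return 2.0  # no data: the set is the whole simplex
    return min(2.0, math.sqrt(2.0 * math.log((2 ** k - 2) / delta) / n))


def worst_case_value(u, p_hat, eps):
    """Minimum of sum(u[i] * q[i]) over distributions q with ||q - p_hat||_1 <= eps.
    Greedy solution: shift up to eps / 2 probability mass onto the opponent action
    that hurts us most, taking it from the actions that help us most."""
    q = list(p_hat)
    worst = min(range(len(u)), key=lambda i: u[i])
    moved = min(eps / 2.0, 1.0 - q[worst])
    q[worst] += moved
    remaining = moved
    for i in sorted(range(len(u)), key=lambda i: u[i], reverse=True):
        if i == worst:
            continue
        take = min(q[i], remaining)
        q[i] -= take
        remaining -= take
        if remaining <= 0.0:
            break
    return sum(ui * qi for ui, qi in zip(u, q))


def expected_payoffs(strategy):
    """Our expected payoff against each pure opponent action."""
    return [sum(strategy[a] * PAYOFF[a][b] for a in range(3)) for b in range(3)]


def safe_response(counts, delta=0.05):
    """Pick the candidate with the best worst-case payoff over the confidence set.
    Ties go to the minimax strategy because it is listed first."""
    n, k = sum(counts), len(counts)
    p_hat = [c / n for c in counts] if n else [1.0 / k] * k
    eps = l1_radius(n, k, delta)
    candidates = {
        "minimax": [1 / 3, 1 / 3, 1 / 3],
        "rock": [1.0, 0.0, 0.0],
        "paper": [0.0, 1.0, 0.0],
        "scissors": [0.0, 0.0, 1.0],
    }
    scored = {
        name: worst_case_value(expected_payoffs(s), p_hat, eps)
        for name, s in candidates.items()
    }
    best = max(scored, key=scored.get)
    return best, scored[best]


if __name__ == "__main__":
    print(safe_response([2, 1, 0]))        # few observations
    print(safe_response([700, 200, 100]))  # many observations, rock-heavy opponent
```

With only three observations the set is nearly the whole simplex, so every pure action has a negative worst case and the sketch falls back to the minimax strategy. After 1000 observations of a rock-heavy opponent the set is small enough that playing paper has a positive worst-case payoff. A fuller treatment would optimize over all mixed strategies, for example with a linear program, rather than over four candidates.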

Author(s):	Wang, Z. and Boularias, A. and Mülling, K. and Peters, J.
Book Title:	Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence (AAAI 2011)
Pages:	1515-1520
Year:	2011
Month:	August
Editors:	Burgard, W. and Roth, D.
Publisher:	AAAI Press

Bibtex Type:	Conference Paper (inproceedings)

Address:	Menlo Park, CA, USA
Event Place:	San Francisco, CA, USA

Electronic Archiving:	grant_archive
ISBN:	978-1-577-35507-6

BibTeX

@inproceedings{WangBMP2011,
  title = {Balancing Safety and Exploitability in Opponent Modeling},
  booktitle = {Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence (AAAI 2011)},
  abstract = {Opponent modeling is a critical mechanism in repeated games. It allows a player to adapt its strategy in order to better respond to the presumed preferences of its opponents. We introduce a new modeling technique that adaptively balances exploitability and risk reduction. An opponent’s strategy is modeled with a set of possible strategies that contains the actual strategy with high probability. The algorithm is safe, as the expected payoff is above the minimax payoff with high probability, and it can exploit the opponents’ preferences once sufficient observations have been obtained. We apply the technique to normal-form games and to stochastic games with a finite number of stages. The performance of the proposed approach is first demonstrated on repeated rock-paper-scissors games. Subsequently, the approach is evaluated in a human-robot table-tennis setting where the robot player learns to prepare to return a served ball. By modeling the human players, the robot chooses a forehand, backhand or middle preparation pose before they serve. The learned strategies can exploit the opponent’s preferences, leading to a higher rate of successful returns.},
  pages = {1515--1520},
  editor = {Burgard, W. and Roth, D.},
  publisher = {AAAI Press},
  address = {Menlo Park, CA, USA},
  month = aug,
  year = {2011},
  slug = {wangbmp2011},
  author = {Wang, Z. and Boularias, A. and M{\"u}lling, K. and Peters, J.},
  month_numeric = {8}
}
