Paper: Single-Agent vs. Multi-Agent Techniques for Concurrent Reinforcement Learning of Negotiation Dialogue Policies

ACL ID P14-1047
Title Single-Agent vs. Multi-Agent Techniques for Concurrent Reinforcement Learning of Negotiation Dialogue Policies
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2014
Authors Kallirroi Georgila, Claire Nelson, David Traum

We use single-agent and multi-agent Reinforcement Learning (RL) for learning dialogue policies in a resource allocation negotiation scenario. Two agents learn concurrently by interacting with each other without any need for simulated users (SUs) to train against or corpora to learn from. In particular, we compare the Q-learning, Policy Hill-Climbing (PHC) and Win or Learn Fast Policy Hill-Climbing (PHC-WoLF) algorithms, varying the scenario complexity (state space size), the number of training episodes, the learning rate, and the exploration rate. Our results show that generally Q-learning fails to converge whereas PHC and PHC-WoLF always converge and perform similarly. We also show that very high gradually decreasing exploration rates are required for convergence. We conclude...
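
The paper itself does not include code; as a minimal sketch of the tabular update rules the abstract compares, PHC and WoLF-PHC (Bowling and Veloso, 2002) can be written as below. Class and parameter names here are illustrative, not the authors' implementation: plain Q-learning corresponds to acting greedily on the q table alone, PHC additionally hill-climbs a mixed policy toward the greedy action, and the WoLF variant switches between a small step (delta_win) when the agent is "winning" and a large step (delta_lose) when it is losing.

    import random
    from collections import defaultdict

    class PHCAgent:
        """Policy Hill-Climbing: Q-learning plus a mixed policy nudged
        toward the greedy action by step delta. With wolf=True this becomes
        WoLF-PHC: use delta_win when the current policy outperforms the
        average policy, delta_lose otherwise ("Win or Learn Fast")."""

        def __init__(self, actions, alpha=0.1, gamma=0.9,
                     delta_win=0.01, delta_lose=0.04, wolf=False):
            self.actions = actions  # assumes at least two actions
            self.alpha, self.gamma = alpha, gamma
            self.delta_win, self.delta_lose, self.wolf = delta_win, delta_lose, wolf
            self.q = defaultdict(lambda: {a: 0.0 for a in actions})
            self.pi = defaultdict(lambda: {a: 1.0 / len(actions) for a in actions})
            self.avg_pi = defaultdict(lambda: {a: 1.0 / len(actions) for a in actions})
            self.counts = defaultdict(int)

        def act(self, state, epsilon):
            # epsilon-greedy exploration over the learned mixed policy
            if random.random() < epsilon:
                return random.choice(self.actions)
            r, acc = random.random(), 0.0
            for a in self.actions:
                acc += self.pi[state][a]
                if r <= acc:
                    return a
            return self.actions[-1]

        def update(self, s, a, reward, s_next):
            # 1) Standard Q-learning backup
            self.q[s][a] += self.alpha * (
                reward + self.gamma * max(self.q[s_next].values()) - self.q[s][a])

            # 2) Choose the hill-climbing step size
            delta = self.delta_win
            if self.wolf:
                # Track the average policy, then compare expected values:
                # "winning" means the current policy beats the average one
                self.counts[s] += 1
                for b in self.actions:
                    self.avg_pi[s][b] += (self.pi[s][b] - self.avg_pi[s][b]) / self.counts[s]
                v_pi = sum(self.pi[s][b] * self.q[s][b] for b in self.actions)
                v_avg = sum(self.avg_pi[s][b] * self.q[s][b] for b in self.actions)
                delta = self.delta_win if v_pi > v_avg else self.delta_lose

            # 3) Move probability mass toward the greedy action
            best = max(self.q[s], key=self.q[s].get)
            for b in self.actions:
                if b == best:
                    self.pi[s][b] = min(1.0, self.pi[s][b] + delta)
                else:
                    self.pi[s][b] = max(0.0, self.pi[s][b] - delta / (len(self.actions) - 1))
            total = sum(self.pi[s].values())
            for b in self.actions:
                self.pi[s][b] /= total  # renormalise to a proper distribution

In the concurrent-learning setup the abstract describes, two such agents would repeatedly call act and update against each other's moves, with no simulated user in the loop; the variable step size is what lets WoLF-PHC adapt quickly when losing while remaining cautious when winning.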