Paper: Estimating the Reliability of MDP Policies: a Confidence Interval Approach

ACL ID N07-1035
Title Estimating the Reliability of MDP Policies: a Confidence Interval Approach
Venue Human Language Technologies
Session Main Conference
Year 2007
Authors

Past approaches for using reinforcement learning to derive dialog control policies have assumed that there was enough collected data to derive a reliable policy. In this paper we present a methodology for numerically constructing confidence intervals for the expected cumulative reward of a learned policy. These intervals are used to (1) better assess the reliability of the expected cumulative reward, and (2) perform a refined comparison between policies derived from different Markov Decision Process (MDP) models. We applied this methodology to a prior experiment where the goal was to select the best features to include in the MDP state-space. Our results show that while some of the policies developed in the prior work exhibited very large confidence intervals, the policy developed f...
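
Below is a minimal sketch of one way such confidence intervals can be constructed numerically: resample plausible transition models from Dirichlet posteriors over the observed transition counts, evaluate the fixed policy's expected cumulative reward under each sampled model, and take empirical percentiles. The function names, the add-one Dirichlet prior, the discount factor, and the toy numbers are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def policy_value(P, R, start_dist, gamma=0.95):
    """Expected cumulative reward of a fixed policy whose induced
    transition matrix is P (S x S) and per-state reward is R (S,).
    Solves the linear system V = R + gamma * P V."""
    S = P.shape[0]
    V = np.linalg.solve(np.eye(S) - gamma * P, R)
    return start_dist @ V

def reward_confidence_interval(counts, R, start_dist, gamma=0.95,
                               n_samples=1000, alpha=0.05, seed=0):
    """Numerically construct a (1 - alpha) confidence interval for the
    expected cumulative reward by sampling transition matrices from
    Dirichlet posteriors over the observed transition counts."""
    rng = np.random.default_rng(seed)
    S = counts.shape[0]
    values = np.empty(n_samples)
    for i in range(n_samples):
        # Each row of a sampled transition matrix is drawn from
        # Dirichlet(counts[s] + 1); the +1 prior avoids zero rows.
        P = np.vstack([rng.dirichlet(counts[s] + 1.0) for s in range(S)])
        values[i] = policy_value(P, R, start_dist, gamma)
    lo, hi = np.percentile(values, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi

# Toy example: a 3-state MDP under a fixed policy. Sparse counts
# yield a wide interval, reflecting an unreliable reward estimate.
counts = np.array([[8., 2., 0.],
                   [1., 5., 4.],
                   [0., 3., 7.]])
R = np.array([0.0, 0.0, 1.0])
start = np.array([1.0, 0.0, 0.0])
print(reward_confidence_interval(counts, R, start))
```

Under this scheme, two policies can be compared by checking whether their intervals overlap rather than by comparing point estimates alone, which is the kind of refined comparison the abstract describes.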