Explainable reinforcement learning via reward decomposition

Abstract

We study reward decomposition for explaining the decisions of reinforcement learning (RL) agents. The approach decomposes rewards into sums of semantically meaningful reward types, so that actions can be compared in terms of trade-offs among the types. In particular, we introduce the concept of minimum sufficient explanations for compactly explaining why one action is preferred over another in terms of the types. Many prior RL algorithms for decomposed rewards produced inconsistent decomposed values, which can be ill-suited to explanation. We exploit an off-policy variant of Q-learning that provably converges to an optimal policy and the correct decomposed action values. We illustrate the approach in a number of domains, showing its utility for explanations.
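The sketch below illustrates the two ideas the abstract describes: maintaining one Q-table per semantic reward type while acting greedily on their sum, and comparing two actions by their per-type value differences to extract a compact explanation. All names (`n_components`, `alpha`, `gamma`, table sizes) and the exact tie-breaking details are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

n_states, n_actions, n_components = 10, 4, 3  # hypothetical problem sizes
alpha, gamma = 0.1, 0.95                      # assumed learning rate / discount

# One Q-table per semantic reward type; the agent acts on their sum.
Q = np.zeros((n_components, n_states, n_actions))

def greedy_action(s):
    # Actions are ranked by the value summed across all reward types.
    return int(np.argmax(Q[:, s, :].sum(axis=0)))

def update(s, a, r_vec, s_next):
    """Decomposed off-policy update: every component Q_c bootstraps on the
    single action that is greedy for the *summed* value, which keeps the
    component values consistent with one another."""
    a_star = greedy_action(s_next)
    for c in range(n_components):
        target = r_vec[c] + gamma * Q[c, s_next, a_star]
        Q[c, s, a] += alpha * (target - Q[c, s, a])

def rdx(s, a1, a2):
    # Per-type advantage of a1 over a2 (a reward-difference explanation).
    return Q[:, s, a1] - Q[:, s, a2]

def msx_plus(s, a1, a2):
    """One plausible reading of a minimum sufficient explanation: the smallest
    set of positive per-type advantages whose total outweighs the combined
    disadvantage from all negative types."""
    d = rdx(s, a1, a2)
    need = -d[d < 0].sum()          # total disadvantage to overcome
    chosen, total = [], 0.0
    for c in np.argsort(-d):        # largest advantages first
        if total > need:
            break
        chosen.append(int(c))
        total += d[c]
    return chosen
```

Here `msx_plus` greedily adds the largest per-type advantages until they dominate the summed disadvantages, so the returned set is small by construction; this is a sketch under the stated assumptions, not the paper's exact algorithm.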

Publication
International Joint Conference on Artificial Intelligence
Anurag Koul