Explainable reinforcement learning via reward decomposition

Abstract

We study reward decomposition for explaining the decisions of reinforcement learning (RL) agents. The approach decomposes rewards into sums of semantically meaningful reward types, so that actions can be compared in terms of trade-offs among the types. In particular, we introduce the concept of minimum sufficient explanations for compactly explaining why one action is preferred over another in terms of the types. Many prior RL algorithms for decomposed rewards produced inconsistent decomposed values, which can be ill-suited to explanation. We exploit an off-policy variant of Q-learning that provably converges to an optimal policy and the correct decomposed action values. We illustrate the approach in a number of domains, showing its utility for explanations.
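The sketch below illustrates the two ideas the abstract describes: maintaining one Q-table per semantic reward type while acting greedily on their sum, and comparing two actions by their per-type value differences to extract a compact explanation. All names (`n_components`, `alpha`, `gamma`, table sizes) and the exact tie-breaking details are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

n_states, n_actions, n_components = 10, 4, 3  # hypothetical problem sizes
alpha, gamma = 0.1, 0.95                      # assumed learning rate / discount

# One Q-table per semantic reward type; the agent acts on their sum.
Q = np.zeros((n_components, n_states, n_actions))

def greedy_action(s):
    # Actions are ranked by the value summed across all reward types.
    return int(np.argmax(Q[:, s, :].sum(axis=0)))

def update(s, a, r_vec, s_next):
    """Decomposed off-policy update: every component Q_c bootstraps on the
    single action that is greedy for the *summed* value, which keeps the
    component values consistent with one another."""
    a_star = greedy_action(s_next)
    for c in range(n_components):
        target = r_vec[c] + gamma * Q[c, s_next, a_star]
        Q[c, s, a] += alpha * (target - Q[c, s, a])

def rdx(s, a1, a2):
    # Per-type advantage of a1 over a2 (a reward-difference explanation).
    return Q[:, s, a1] - Q[:, s, a2]

def msx_plus(s, a1, a2):
    """One plausible reading of a minimum sufficient explanation: the smallest
    set of positive per-type advantages whose total outweighs the combined
    disadvantage from all negative types."""
    d = rdx(s, a1, a2)
    need = -d[d < 0].sum()          # total disadvantage to overcome
    chosen, total = [], 0.0
    for c in np.argsort(-d):        # largest advantages first
        if total > need:
            break
        chosen.append(int(c))
        total += d[c]
    return chosen
```

Here `msx_plus` greedily adds the largest per-type advantages until they dominate the summed disadvantages, so the returned set is small by construction; this is a sketch under the stated assumptions, not the paper's exact algorithm.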

Publication
International Joint Conference on Artificial Intelligence
Anurag Koul