workshop

Offline Policy Comparison with Confidence: Benchmarks and Baselines

Decision makers often wish to use offline historical data to compare sequential-action policies at various world states. Importantly, …

Anurag Koul, Mariano Phielipp, Alan Fern

Dream and Search to Control: Latent Space Planning for Continuous Control

Learning and planning with latent space dynamics has been shown to be useful for sample efficiency in model-based reinforcement …

Anurag Koul, Varun V Kumar, Alan Fern, Somdeb Majumdar

Explaining Deep Adaptive Programs via Reward Decomposition

Adaptation Based Programming (ABP) allows programmers to employ “choice points” at program locations where they are …

Martin Erwig, Alan Fern, Magesh Murali, Anurag Koul