The original publication is available at
We present a data-driven approach for producing policies that
are provably robust across unknown stochastic environments. Existing
approaches can learn a model of a single environment as an interval
Markov decision process (IMDP) and produce a robust policy with
a probably approximately correct (PAC) guarantee on its performance.
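The abstract does not say how the IMDP intervals are learned. One common construction, assumed here rather than taken from this work, bounds each transition probability with a two-sided Hoeffding inequality. The helper `pac_intervals` is hypothetical: it turns the successor states observed for one state-action pair into probability intervals.

```python
import math
from collections import Counter


def pac_intervals(next_states, successors, delta):
    """Hypothetical helper (not from the paper): Hoeffding intervals for one
    state-action pair.

    next_states: observed successor states (one entry per sampled transition).
    successors:  all possible successor states, including unobserved ones.
    delta:       per-successor failure probability.

    Each interval [p_hat - gamma, p_hat + gamma], clipped to [0, 1], contains
    the true transition probability with confidence at least 1 - delta.
    Covering every successor and state-action pair at once requires splitting
    delta across them (union bound).
    """
    n = len(next_states)
    if n == 0:
        raise ValueError("need at least one sampled transition")
    gamma = math.sqrt(math.log(2.0 / delta) / (2.0 * n))
    counts = Counter(next_states)  # missing keys count as 0
    intervals = {}
    for s in successors:
        p_hat = counts[s] / n
        intervals[s] = (max(0.0, p_hat - gamma), min(1.0, p_hat + gamma))
    return intervals


# Example: 100 samples, 70 to "a", 30 to "b", none to "c".
iv = pac_intervals(["a"] * 70 + ["b"] * 30, ["a", "b", "c"], delta=0.05)
```

A successor that was never observed still receives the interval `[0, gamma]`. The IMDP therefore keeps some probability mass on transitions that the samples did not reveal.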
However, these approaches cannot reason about the impact of the environmental
parameters underlying the uncertainty. We propose a framework based
on parametric Markov decision processes with unknown distributions
over parameters. We learn and analyse IMDPs for a set of unknown
sample environments induced by parameters. The key challenge is then
to produce meaningful performance guarantees that combine the two
layers of uncertainty: (1) multiple environments induced by parameters
with an unknown distribution; (2) unknown induced environments which
are approximated by IMDPs. We present a novel approach based on
scenario optimisation that yields a single PAC guarantee quantifying
the risk level for which a specified performance level can be assured in
unseen environments, as well as a means to trade off risk and performance.
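The risk/performance trade-off can be illustrated with the standard sampling-and-discarding bound from scenario optimisation for a scalar threshold. This is a generic sketch, not necessarily the exact theorem used in this work. It assumes N i.i.d. sampled environments whose performances are known exactly and come from a continuous distribution, so it ignores the second (IMDP) layer of uncertainty. If the threshold is the (k+1)-th smallest sampled performance, then with confidence at least 1 - beta, the probability that an unseen environment performs below it is at most eps, where P[Binomial(N, eps) <= k] = beta. The function names are hypothetical.

```python
import math


def binom_cdf(k, n, p):
    """P[Binomial(n, p) <= k]; exact sum, suitable for moderate n."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def scenario_risk(n, k, beta, tol=1e-12):
    """Hypothetical helper: risk level eps with binom_cdf(k, n, eps) = beta.

    binom_cdf(k, n, p) decreases in p, from 1 at p = 0 to 0 at p = 1
    (for k < n), so bisection finds the crossing point.
    """
    if not 0 <= k < n:
        raise ValueError("need 0 <= k < n")
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binom_cdf(k, n, mid) > beta:
            lo = mid
        else:
            hi = mid
    return hi


def performance_level(samples, k):
    """Hypothetical helper: (k+1)-th smallest sampled performance,
    i.e. the threshold left after discarding the k worst samples."""
    return sorted(samples)[k]
```

Discarding more samples (larger `k`) raises the certified performance level, because the threshold moves up the sorted samples. The price is a larger risk `eps`. For example, `scenario_risk(100, 0, 1e-3)` equals `1 - 1e-3 ** (1/100)`, roughly 0.067, and `scenario_risk(100, 5, 1e-3)` is strictly larger.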
We implement and evaluate our framework using multiple robust policy
generation methods on a range of benchmarks. We show that our approach
produces tight bounds on a policy’s performance with high confidence.