AI Safe Exploration: Reinforced learning with a blocker in unsafe environments
Abstract
Artificial intelligence can be trained with a trial-and-error
approach. In an environment where a catastrophe cannot be
accepted, a human overseer can be used, but this may lower the
efficiency of learning. This study implements an artifact, a
blocker, meant to replace the human overseer when training an AI
in simulated unsafe environments.
The results of testing the implemented blocker show that it can
be used to avoid catastrophes and find a path to the goal in
17 out of 18 runs. The single failed run shows that the
implemented blocker needs improvement in terms of data
efficiency. Shaping rewards solely to reduce the number of steps
and catastrophes for a reinforcement learning agent was
successful to some degree, but further steps can be taken to
lower both.
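
The blocker idea summarized in the abstract can be illustrated with a minimal sketch. It is not the implementation from the essay: the one-dimensional corridor, the rule-based `blocker` function, the random policy standing in for the learning agent, and all reward values are assumptions chosen only to show how a blocker can veto a catastrophic action and how shaped rewards can penalize both extra steps and blocked actions.

```python
import random

# Hypothetical corridor: cell 0 is the catastrophe, cell 5 is the goal.
CATASTROPHE, START, GOAL = 0, 2, 5
ACTIONS = (-1, 1)  # move left / move right


def blocker(state, action):
    """Veto an action that would enter the catastrophe cell.

    Returns (action_to_execute, was_blocked). The safe replacement here
    is simply the opposite move; this is an illustrative assumption.
    """
    if state + action == CATASTROPHE:
        return -action, True
    return action, False


def run_episode(rng, max_steps=1000):
    """Run one episode with a random policy supervised by the blocker."""
    state, total_reward, blocked, min_state = START, 0.0, 0, START
    steps = 0
    for steps in range(1, max_steps + 1):
        proposed = rng.choice(ACTIONS)  # stand-in for the agent's policy
        action, was_blocked = blocker(state, proposed)
        state += action
        min_state = min(min_state, state)

        reward = -0.01  # assumed per-step penalty, favors short paths
        if was_blocked:
            reward -= 1.0  # assumed penalty for attempting a catastrophe
            blocked += 1
        if state == GOAL:
            reward += 1.0  # assumed goal reward
        total_reward += reward
        if state == GOAL:
            break
    return {
        "reached_goal": state == GOAL,
        "steps": steps,
        "blocked": blocked,
        "min_state": min_state,
        "total_reward": total_reward,
    }


if __name__ == "__main__":
    print(run_episode(random.Random(0)))
```

In this sketch the agent never enters the catastrophe cell because every proposed action passes through `blocker` before it is executed, while the penalty on blocked actions gives a learning agent a signal to stop proposing them.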
Degree
Student essay
Collections
Date
2019-11-12
Author
Koivisto, Marco
Crockett, Philip
Spångberg, Axel
Keywords
Artificial Intelligence
Reinforcement learning
Safe exploration
Blocker
Machine Learning
Baby AI Game
Gym Mini Grid
Language
eng