AI Safe Exploration: Reinforced learning with a blocker in unsafe environments

Koivisto, Marco; Crockett, Philip; Spångberg, Axel

Abstract

Artificial intelligence can be trained with a trial-and-error approach. In an environment where a catastrophe cannot be accepted, a human overseer can be used, but this may lower the efficiency of the learning. This study includes the implementation of an artifact, a blocker, meant to replace the human overseer when training an AI in simulated unsafe environments. The results of testing the implemented blocker show that it can be used to avoid catastrophes and find a path to the goal in 17 out of 18 runs. The single failed run shows that the implemented blocker needs improvement in terms of data efficiency. Shaping rewards solely to reduce the number of steps and catastrophes for a reinforcement learning agent was successful to some degree, but more can be done to lower both numbers.

Examination level

Student essay

Date

2019-11-12

Authors

Koivisto, Marco

Crockett, Philip

Spångberg, Axel

Keywords

Artificial Intelligence

Reinforcement learning

Safe exploration

Blocker

Machine Learning

Baby AI Game

Gym Mini Grid

Language

eng
