dc.contributor.author | Koivisto, Marco | |
dc.contributor.author | Crockett, Philip | |
dc.contributor.author | Spångberg, Axel | |
dc.date.accessioned | 2019-11-12T10:40:01Z | |
dc.date.available | 2019-11-12T10:40:01Z | |
dc.date.issued | 2019-11-12 | |
dc.identifier.uri | http://hdl.handle.net/2077/62440 | |
dc.description.abstract | Artificial intelligence can be trained with a trial-and-error
approach. In environments where a catastrophe cannot be accepted,
a human overseer can be used, but this may lower the efficiency
of the learning. The study includes the implementation of an
artifact, a blocker, intended to replace the human overseer when
training an AI in simulated unsafe environments. Testing of the
implemented blocker shows that it can be used to avoid catastrophes
and find a path to the goal in 17 out of 18 runs. The single failed
run shows that the implemented blocker needs improvement in terms
of data efficiency. Shaping rewards solely to reduce the number of
steps and catastrophes for a reinforcement learning agent has been
done successfully to some degree, but further steps can be taken
to lower the number of catastrophes and steps. | sv |
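To illustrate the blocker idea described in the abstract, the following is a minimal, self-contained Python sketch and not the authors' implementation: an overseer-style check sits between the agent and the environment, vetoes any action that would cause a catastrophe, and the reward is shaped to discourage both extra steps and blocked actions. The toy corridor, the is_catastrophic() check, and the penalty values are hypothetical assumptions for illustration only.

import random

GOAL, LAVA, EMPTY = "goal", "lava", "empty"

# 1-D corridor: index 0 is lava (a catastrophe), index 4 is the goal.
GRID = [LAVA, EMPTY, EMPTY, EMPTY, GOAL]
ACTIONS = (-1, +1)  # move left or right


def is_catastrophic(pos, action):
    """Blocker check: would this action move the agent onto a lava cell?"""
    nxt = max(0, min(len(GRID) - 1, pos + action))
    return GRID[nxt] == LAVA


def blocked_step(pos, action, step_penalty=-0.01, block_penalty=-0.1):
    """Apply the action unless the blocker vetoes it; return (pos, reward, done)."""
    if is_catastrophic(pos, action):
        # The blocker replaces the unsafe action with a no-op and the shaped
        # reward adds an extra penalty for having proposed it.
        return pos, step_penalty + block_penalty, False
    pos = max(0, min(len(GRID) - 1, pos + action))
    if GRID[pos] == GOAL:
        return pos, 1.0, True
    return pos, step_penalty, False


if __name__ == "__main__":
    random.seed(0)
    pos, done, total = 2, False, 0.0
    while not done:
        action = random.choice(ACTIONS)  # stand-in for the learning agent
        pos, reward, done = blocked_step(pos, action)
        total += reward
    print("episode finished with return", round(total, 2))

In the thesis setting the environment would be a Gym Mini Grid / Baby AI Game level and the agent a reinforcement learning policy rather than a random choice; the sketch only shows where a blocker and reward shaping would sit in the step loop.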
dc.language.iso | eng | sv |
dc.subject | Artificial Intelligence | sv |
dc.subject | Reinforcement learning | sv |
dc.subject | Safe exploration | sv |
dc.subject | Blocker | sv |
dc.subject | Machine Learning | sv |
dc.subject | Baby AI Game | sv |
dc.subject | Gym Mini Grid | sv |
dc.title | AI Safe Exploration: Reinforced learning with a blocker in unsafe environments | sv |
dc.type | text | |
dc.setspec.uppsok | Technology | |
dc.type.uppsok | M2 | |
dc.contributor.department | Göteborgs universitet/Institutionen för data- och informationsteknik | swe |
dc.contributor.department | University of Gothenburg/Department of Computer Science and Engineering | eng |
dc.type.degree | Student essay | |