
dc.contributor.author: Emampoor, Yasmeen
dc.date.accessioned: 2022-04-20T07:36:51Z
dc.date.available: 2022-04-20T07:36:51Z
dc.date.issued: 2022-04-20
dc.identifier.uri: https://hdl.handle.net/2077/71400
dc.description.abstract: Embodied Question Answering (EQA) is a task in which an agent situated in a virtual environment navigates from its current position to an object (Navigation) and then answers a question about it (Visual Question Answering, VQA), for example “What color is the table in the kitchen?” This project examines how an agent modelled as a deep neural network uses semantic information from its language model and visual information to answer questions in the second task. This matters because the task and the dataset are highly regular, so the model could be answering questions purely from general semantic information in its language model (tables are frequently brown) without relying on the visual scene, a phenomenon commonly known as hallucinating. The project first examines the quality of the current task dataset, EQA-MP3D, and presents a series of experiments in which the visual information given to the model is manipulated or corrupted. Next, the model is extended with new sources of information, with the expectation that it would use them to improve the grounding of questions and answers in perception. Structured information, in the form of identified object regions, is found to be particularly helpful. Additionally, we examine the impact of question type on performance. The dataset includes three distinct question types: color, color room, and location. Baseline performance differs across these types, and changes to the input affect performance differently depending on the question type. [en_US]
dc.language.iso: eng [en_US]
dc.subject: embodied question answering [en_US]
dc.subject: visual question answering [en_US]
dc.subject: multi-modality [en_US]
dc.subject: information fusion [en_US]
dc.title: There’s a Microwave in the Hallway [en_US]
dc.type: text
dc.setspec.uppsok: Technology
dc.type.uppsok: H2
dc.contributor.department: Göteborgs universitet/Institutionen för data- och informationsteknik [swe]
dc.contributor.department: University of Gothenburg/Department of Computer Science and Engineering [eng]
dc.type.degree: Student essay

