There’s a Microwave in the Hallway

dc.contributor.author: Emampoor, Yasmeen
dc.contributor.department: Göteborgs universitet/Institutionen för data- och informationsteknik [swe]
dc.contributor.department: University of Gothenburg/Department of Computer Science and Engineering [eng]
dc.date.accessioned: 2022-04-20T07:36:51Z
dc.date.available: 2022-04-20T07:36:51Z
dc.date.issued: 2022-04-20
dc.description.abstract: Embodied Question Answering (EQA) is a task in which an agent situated in a virtual environment navigates from its current position to an object (Navigation) and then answers a question about it (Visual Question Answering, VQA), for example “What color is the table in the kitchen?” This project examines how an agent modelled as a deep neural network uses semantic information from its language model, together with visual information, to answer questions in the second task. This matters because the task and the dataset are highly regular, so the model could be answering questions purely from general semantic information in its language model (tables are frequently brown) without relying on the visual scene, a phenomenon commonly known as hallucinating. The project first examines the quality of the current task dataset, EQA-MP3D, and presents a series of experiments in which the visual information given to the model is manipulated or corrupted. Next, the model is extended with new sources of information, with the expectation that it would use them to improve the grounding of questions and answers in perception. Structured information, in the form of identified object regions, is found to be particularly helpful. Additionally, we examine the impact of question type on performance. The dataset includes three distinct question types: color, color room, and location. Baseline performance differs across these types, and changes to the input also affect performance differently depending on question type. [en]
dc.identifier.uri: https://hdl.handle.net/2077/71400
dc.language.iso: eng [en]
dc.setspec.uppsok: Technology
dc.subject: embodied question answering [en]
dc.subject: visual question answering [en]
dc.subject: multi-modality [en]
dc.subject: information fusion [en]
dc.title: There’s a Microwave in the Hallway [en]
dc.type: text
dc.type.degree: Student essay
dc.type.uppsok: H2

Files

Original bundle

Name:
CSE 22-04 Emampoor.pdf
Size:
3.95 MB
Format:
Adobe Portable Document Format
