Interactionwise Semantic Awareness in Visual Relationship Detection

Giovanni, Pagliarini; Azfar, Imtiaz

dc.contributor.author	Giovanni, Pagliarini
dc.contributor.author	Azfar, Imtiaz
dc.date.accessioned	2020-11-06T08:58:07Z
dc.date.available	2020-11-06T08:58:07Z
dc.date.issued	2020-11-06
dc.identifier.uri	http://hdl.handle.net/2077/66921
dc.description.abstract	Visual Relationship Detection (VRD) is a relatively young research area, where the goal is to develop prediction models for detecting the relationships between objects depicted in an image. A relationship is modeled as a subject-predicate-object triplet, where the predicate (e.g., an action or a spatial relation, such as “eat”, “chase” or “next to”) describes how the subject and the object are interacting in the given image. VRD can be formulated as a classification problem, but it suffers from the effects of a combinatorial output space; some of the major issues to overcome are long-tail class distribution, class overlap and intra-class variance. Machine learning models have been found effective for the task and, more specifically, many works have shown that combining visual, spatial and semantic features from the detected objects is key to achieving good predictions. This work investigates the use of distributional embeddings, often used to discover and encode semantic information, in order to improve the results of an existing neural network-based architecture for VRD. Experiments are performed to make the model semantically aware of the classification output domain, namely the predicate classes. Additionally, different word embedding models are trained from scratch to better account for multi-word objects and predicates, and are then fine-tuned on VRD-related text corpora. We evaluate our methods on two datasets. Ultimately, we show that, for some sets of predicate classes, semantic knowledge of the predicates exported from trained-from-scratch distributional embeddings can be leveraged to greatly improve prediction, and that this is especially effective for zero-shot learning.	sv
dc.language.iso	eng	sv
dc.subject	Deep Learning	sv
dc.subject	Natural Language Processing	sv
dc.subject	Computer Vision	sv
dc.subject	Visual Relationship Detection	sv
dc.subject	Object Detection	sv
dc.title	Interactionwise Semantic Awareness in Visual Relationship Detection	sv
dc.type	text
dc.setspec.uppsok	Technology
dc.type.uppsok	H2
dc.contributor.department	Göteborgs universitet/Institutionen för data- och informationsteknik	swe
dc.contributor.department	University of Gothenburg/Department of Computer Science and Engineering	eng
dc.type.degree	Student essay

Files in this item

Name: gupea_2077_66921_1.pdf
Size: 2.781 MB
Format: PDF

This item appears in the following collection(s)

Master's theses
