
Interactionwise Semantic Awareness in Visual Relationship Detection

Abstract
Visual Relationship Detection (VRD) is a relatively young research area whose goal is to develop prediction models for detecting the relationships between objects depicted in an image. A relationship is modeled as a subject-predicate-object triplet, where the predicate (e.g., an action or a spatial relation such as “eat”, “chase”, or “next to”) describes how the subject and the object interact in the given image. VRD can be formulated as a classification problem, but it suffers from the effects of a combinatorial output space; among the major issues to overcome are long-tail class distribution, class overlap, and intra-class variance. Machine learning models have been found effective for the task and, more specifically, many works have shown that combining visual, spatial, and semantic features from the detected objects is key to achieving good predictions. This work investigates the use of distributional embeddings, often used to discover and encode semantic information, to improve the results of an existing neural network-based architecture for VRD. Experiments are performed to make the model semantically aware of the classification output domain, namely, the predicate classes. Additionally, different word embedding models are trained from scratch to better account for multi-word objects and predicates, and are then fine-tuned on VRD-related text corpora. We evaluate our methods on two datasets. Ultimately, we show that, for some sets of predicate classes, semantic knowledge of the predicates extracted from trained-from-scratch distributional embeddings can be leveraged to greatly improve prediction, and that it is especially effective for zero-shot learning.
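To make the triplet formulation and the role of predicate embeddings concrete, the following is a minimal illustrative sketch (not code from the thesis): relationships are represented as subject-predicate-object tuples, and semantic closeness between predicates is measured with cosine similarity over distributional vectors. The toy 3-dimensional embeddings below are made up for demonstration only.

```python
import math

# A VRD relationship as a subject-predicate-object triplet.
triplet = ("person", "ride", "horse")

# Hypothetical toy embeddings for a few predicates. Note the multi-word
# predicate "sit on", the kind of token the thesis trains embeddings
# from scratch to handle.
embeddings = {
    "ride":   [0.9, 0.1, 0.3],
    "sit on": [0.8, 0.2, 0.4],
    "eat":    [0.1, 0.9, 0.2],
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Semantically related predicates should score higher than unrelated ones,
# which is the kind of signal a semantics-aware classifier can exploit.
sim_close = cosine(embeddings["ride"], embeddings["sit on"])
sim_far = cosine(embeddings["ride"], embeddings["eat"])
```

In a real VRD pipeline these vectors would come from trained word embedding models rather than hand-written lists, and the similarities could inform the classifier about which predicate classes are semantically close — useful for long-tail and zero-shot predicates.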
Degree level
Student essay
URL:
http://hdl.handle.net/2077/66921
Collections
  • Master theses (Masteruppsatser)
File(s)
gupea_2077_66921_1.pdf (2.781 MB)
Date
2020-11-06
Authors
Giovanni, Pagliarini
Azfar, Imtiaz
Keywords
Deep Learning
Natural Language Processing
Computer Vision
Visual Relationship Detection
Object Detection
Language
eng

DSpace software copyright © 2002-2016  DuraSpace
gup@ub.gu.se | Technical support
Theme by 
Atmire NV