Logical properties of Natural Language Inference - Experiments with Synthetic Data to Study Consequence Relations in LSTMs

Date

2024-06-17

Abstract

Natural language inference (NLI) datasets are valuable resources for training and benchmarking models that infer entailment relations. However, these datasets are known to suffer from issues such as lexical biases that affect the behaviour of models trained on them. In this thesis, we approach this problem experimentally, studying consequence relations and how data augmentation affects the performance of NLI models. We first define a simple LSTM model with an embedding layer, and then define three scenarios under which we synthesize entailment examples in a controlled manner from the SNLI corpus. We train several models and compare their performance using the F1-score for the entailment class and overall accuracy, showing that adding synthetic data provides a middle ground with balanced performance, particularly across different consequence relations. We find that, under the scenarios we define, self-entailment decreases the F1-score only marginally relative to the original data when tested on the baseline model. This is followed by the conjunction scenario, in which the premise is augmented with its hypothesis, and finally by the scenario in which the hypothesis is augmented with the premise. We conclude by recommending proportions of synthetic data that should be added to make these models better at inferring different logical consequence relations.
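The three augmentation scenarios named in the abstract can be illustrated with a minimal sketch. This is an assumption about their form based only on the abstract's descriptions (the function names, the "and" conjunction wording, and the example pair are all hypothetical, not the thesis's actual implementation):

```python
# Hypothetical sketch of the three synthetic-entailment scenarios
# described in the abstract. Each takes an SNLI-style premise/hypothesis
# pair and returns a new (premise, hypothesis, label) triple.

def self_entailment(premise: str) -> tuple:
    # A sentence trivially entails itself.
    return (premise, premise, "entailment")

def conjoin_premise(premise: str, hypothesis: str) -> tuple:
    # The premise is augmented with its hypothesis; the conjunction
    # still entails the original hypothesis.
    return (f"{premise} and {hypothesis}", hypothesis, "entailment")

def conjoin_hypothesis(premise: str, hypothesis: str) -> tuple:
    # The hypothesis is augmented with the premise; if the premise
    # entails the hypothesis, it also entails their conjunction.
    return (premise, f"{hypothesis} and {premise}", "entailment")

# Example pair (invented for illustration):
p, h = "A man plays a guitar", "A man plays an instrument"
print(self_entailment(p))
print(conjoin_premise(p, h))
print(conjoin_hypothesis(p, h))
```

A controlled proportion of triples generated this way could then be mixed into the original SNLI training split before training the LSTM.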

Keywords

Language Technology
