Breaking Barriers: Enhancing Universal Dependency Parsing for Amharic, Advancing NLP for a Low-Resource Language

Date

2025-06-19

Abstract

This study advances Amharic dependency parsing by expanding and refining the existing Amharic Universal Dependencies (UD) Treebank (Seyoum, Miyao, and Mekonnen, 2018). As a morphologically rich and under-resourced language, Amharic poses particular challenges for natural language processing (NLP), especially in syntactic and morphological parsing. Using the UD framework and the transformer-based toolkit Trankit, this work achieves higher parsing accuracy than the UDPipe and Turku models reported by Seyoum, Miyao, and Mekonnen (2020) across multiple evaluation metrics. These results demonstrate that dataset augmentation, combined with rigorous syntactic validation, can substantially improve parsing performance and offer a scalable pathway for NLP development in low-resource languages.
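
The abstract mentions syntactic validation and evaluation across several metrics without detailing either. As a minimal sketch, assuming the treebank is stored in the standard CoNLL-U format and that the metrics include the usual unlabeled and labeled attachment scores (UAS/LAS), the Python below checks that each sentence's HEAD column forms a single-rooted tree and scores predicted heads and labels against gold ones. The function names (`read_conllu`, `is_well_formed_tree`, `attachment_scores`) and the toy three-word sentence are hypothetical illustrations, not the thesis's code or data. The official UD tooling (the `validate.py` script and the CoNLL 2018 shared-task evaluation script) runs many more checks and aligns differing tokenizations, whereas this sketch assumes gold and predicted files share identical tokenization.

```python
from typing import List, Tuple

# (ID, HEAD, DEPREL) for one syntactic word
Token = Tuple[int, int, str]


def read_conllu(text: str) -> List[List[Token]]:
    """Parse CoNLL-U text into sentences of (id, head, deprel) tuples.

    Comment lines, multiword-token ranges (e.g. "1-2") and empty nodes
    (e.g. "3.1") are skipped, so only basic syntactic words remain.
    """
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for line in text.splitlines():
        if not line.strip():  # a blank line ends a sentence
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue
        cols = line.split("\t")  # 10 tab-separated CoNLL-U columns
        if "-" in cols[0] or "." in cols[0]:
            continue
        current.append((int(cols[0]), int(cols[6]), cols[7]))
    if current:
        sentences.append(current)
    return sentences


def is_well_formed_tree(sent: List[Token]) -> bool:
    """Check one sentence: exactly one word has HEAD 0 with DEPREL 'root',
    every HEAD points to an existing word, and there are no cycles."""
    heads = {tid: head for tid, head, _ in sent}
    roots = [rel for _, head, rel in sent if head == 0]
    if len(roots) != 1 or roots[0] != "root":
        return False
    for tid in heads:
        seen = set()
        node = tid
        while node != 0:  # walk up the head chain until ROOT
            if node in seen or node not in heads:
                return False  # cycle, or head pointing to a missing word
            seen.add(node)
            node = heads[node]
    return True


def attachment_scores(gold: List[List[Token]],
                      pred: List[List[Token]]) -> Tuple[float, float]:
    """Return (UAS, LAS), assuming gold and predicted share tokenization."""
    if len(gold) != len(pred):
        raise ValueError("different number of sentences")
    total = head_ok = label_ok = 0
    for g_sent, p_sent in zip(gold, pred):
        if [t[0] for t in g_sent] != [t[0] for t in p_sent]:
            raise ValueError("tokenization mismatch")
        for (_, g_head, g_rel), (_, p_head, p_rel) in zip(g_sent, p_sent):
            total += 1
            if g_head == p_head:
                head_ok += 1
                # compare only the universal label, ignoring subtypes like ":pass"
                if g_rel.split(":")[0] == p_rel.split(":")[0]:
                    label_ok += 1
    if total == 0:
        raise ValueError("no words to score")
    return head_ok / total, label_ok / total


def row(i: int, form: str, head: int, rel: str) -> str:
    """Build a hypothetical toy CoNLL-U row; unused columns are '_'."""
    return "\t".join([str(i), form, "_", "_", "_", "_", str(head), rel, "_", "_"])


# Hypothetical toy data: one gold sentence and one imperfect prediction.
gold = read_conllu("\n".join(["# text = the dog barks",
                              row(1, "the", 2, "det"),
                              row(2, "dog", 3, "nsubj"),
                              row(3, "barks", 0, "root")]))
pred = read_conllu("\n".join([row(1, "the", 3, "det"),    # wrong head
                              row(2, "dog", 3, "obj"),    # right head, wrong label
                              row(3, "barks", 0, "root")]))
print(all(is_well_formed_tree(s) for s in gold))  # True
print(attachment_scores(gold, pred))  # (0.6666666666666666, 0.3333333333333333)
```

Comparing only the part of DEPREL before any `:` subtype is a design choice in this sketch; a stricter evaluation could compare full labels instead.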

Keywords

Language Technology, Amharic, Universal Dependencies, Low-Resource Language, Tokenization, Dependency Parsing, Treebank Expansion, Natural Language Processing