Multilingual text generation from structured formal representations
Abstract
This thesis aims to identify the optimal ways in which natural language generation techniques can be brought to bear upon the problem of processing a structured body of information in order to devise a coherent presentation of text content in multiple languages. We investigate how chains of referential expressions are realized in English, Swedish and Hebrew, and suggest several coreference strategies that can be used to generate coherent descriptions of paintings. The suggested strategies focus on the need to produce paragraph-sized written natural language descriptions from formal structured representations presented in the Semantic Web. We account for principles of coreference by introducing a new modularized approach to automatically generate chains of referential expressions from ontologies. We demonstrate the feasibility of the approach by implementing a system in which a Semantic Web domain ontology serves as the background knowledge representation and in which the language-specific coreference strategies are incorporated. The system uses both the principles of discourse structures and coreference strategies to guide the generation process. We show how the system successfully generates coherent, well-formed descriptions in multiple languages.
List of papers
[1] Dannélls, Dana 2008a. A system architecture for conveying historical knowledge to museum visitors. Workshop on Information Access to Cultural Heritage (IACH), Lecture Notes in Computer Science. Berlin: Springer.
[2] Dannélls, Dana 2008b. Generating tailored texts for museum exhibits. The 2nd workshop on language technology for cultural heritage (LaTeCH 2008), 17--20. Marrakech: ELRA.
[3] Dannélls, Dana 2009. The value of weights in automatically generated text structures. Proceedings of the 10th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing), Lecture Notes in Computer Science, LNCS 5449, 233--244. Berlin: Springer.
[4] Dannélls, Dana 2010a. Discourse generation from formal specifications using the Grammatical Framework, GF. Special issue of the journal Research in Computing Science 46: 167--178.
[5] Dannélls, Dana 2008c. The production of documents from ontologies. Proceedings of the 18th European Conference on Artificial Intelligence (ECAI), 36--38. Patras: IOS Press.
[6] Dannélls, Dana, Mariana Damova, Ramona Enache and Milen Chechev 2011. A framework for improved access to museum databases in the Semantic Web. Proceedings of Language Technologies for Digital Humanities and Cultural Heritage Workshop, 3--10.
[7] Dannélls, Dana 2010b. Applying semantic frame theory to automate natural language templates generation from ontology statements. Proceedings of the 6th International Natural Language Generation Conference (INLG 2010), 179--184. Dublin: ACL.
[8] Dannélls, Dana and Lars Borin 2012. Toward language independent methodology for generating artwork descriptions -- exploring FrameNet information. EACL workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH), 18--23. Avignon: ACL.
[9] Dannélls, Dana, Ramona Enache, Mariana Damova and Milen Chechev 2012. Multilingual online generation from semantic web ontologies. Proceedings of the World Wide Web conference (WWW2012) European project track, 239--242.
[10] Dannélls, Dana 2012b. On generating coherent multilingual descriptions of museum objects from semantic web ontologies. Proceedings of the Seventh International Natural Language Generation Conference (INLG 2012), 76--84. Utica, IL: ACL.
Degree
Doctor of Philosophy
University
Göteborgs universitet. Humanistiska fakulteten
University of Gothenburg. Faculty of Arts
Department
Department of Swedish; Institutionen för svenska språket
Public defence
Tuesday, 5 February 2013, 10:15, Lilla hörsalen, Humanisten, Renströmsgatan 6
Date of defence
2013-02-05
E-mail
dana.dannells@svenska.gu.se
Other description
Supervisor: Lars Borin, University of Gothenburg
Opponent: Michael Elhadad, Ben-Gurion University of the Negev
Date
2013-01-11
Author
Dannélls, Dana
Keywords
computational linguistics
language technology
natural language processing
multilingual natural language generation
coherence
coreference
ontology
semantic web
Publication type
Doctoral thesis
ISBN
978-91-87850-48-6
ISSN
0347-948X
Series/Report no.
Data linguistica
23
Language
eng