
dc.contributor.author: Virk, Shafqat
dc.date.accessioned: 2014-08-19T12:08:55Z
dc.date.available: 2014-08-19T12:08:55Z
dc.date.issued: 2014-08-19
dc.identifier.isbn: 9789162887063
dc.identifier.uri: http://hdl.handle.net/2077/36665
dc.description.abstract: Can computers process human languages? During the last fifty years, two main approaches have been used to find an answer to this question: data-driven (i.e. statistics based) and knowledge-driven (i.e. grammar based). The former relies on the availability of a vast amount of electronic linguistic data and the processing capabilities of modern computers, while the latter builds on grammatical rules and classical linguistic theories of language. In this thesis, we mainly use the second approach and describe the development of computational ("resource") grammars for six Indo-Iranian languages: Urdu, Hindi, Punjabi, Persian, Sindhi, and Nepali. We explore different lexical and syntactical aspects of these languages and build their resource grammars using the Grammatical Framework (GF), a type-theoretical grammar formalism tool. We also provide computational evidence of the similarities and differences between Hindi and Urdu, and report a mechanical development of a Hindi resource grammar starting from an Urdu resource grammar. We use a functor-style implementation that makes it possible to share the commonalities between the two languages. Our analysis shows that this sharing is possible up to 94% at the syntax level, whereas at the lexical level Hindi and Urdu differ in 18% of the basic words, in 31% of tourist phrases, and in 92% of school mathematics terms. Next, we describe the development of wide-coverage morphological lexicons for some of the Indo-Iranian languages. We use existing linguistic data from different resources (i.e. dictionaries and WordNets) to build uni-sense and multi-sense lexicons. Finally, we demonstrate how we used the reported grammatical and lexical resources to add support for Indo-Iranian languages in a few existing GF application grammars. These include the Phrasebook, the mathematics grammar library, and the Attempto controlled English grammar.
Further, we give the experimental results of developing a wide-coverage, grammar-based translator for arbitrary text using these resources. These applications show the importance of such linguistic resources and open new doors for future research on these languages. [sv]
dc.language.iso: eng [sv]
dc.relation.ispartofseries: Technical report. D (Department of Computer Science and Engineering, Chalmers University of Technology & University of Gothenburg) [sv]
dc.relation.ispartofseries: 96 [sv]
dc.subject: Grammatical Framework [sv]
dc.subject: Indo-Iranian Languages [sv]
dc.subject: Resource Grammars [sv]
dc.title: Computational linguistics resources for Indo-Iranian languages [sv]
dc.type: Text [sv]
dc.type.svep: doctoral thesis [sv]
dc.type.degree: Doctor of Philosophy [sv]
dc.gup.origin: University of Gothenburg. IT Faculty [sv]
dc.gup.department: Department of Computer Science and Engineering [sv]
dc.gup.defenceplace: Monday, 3 June 2013, 10:00, HC2, Chalmers University of Technology [sv]
dc.gup.defencedate: 2013-06-03
dc.gup.dissdb-fakultet: ITF [sv]

