Data-Driven Morphological Analysis and Disambiguation for Morphologically Rich Languages and Universal Dependencies


Abstract

Parsing texts into Universal Dependencies (UD) in realistic scenarios requires, as a first tier, infrastructure for morphological analysis and disambiguation (MA&D) of typologically different languages. MA&D is particularly challenging in morphologically rich languages (MRLs), where ambiguous space-delimited tokens must be disambiguated with respect to their constituent morphemes. Here we present a novel, language-agnostic framework for MA&D, based on a transition system with two variants, word-based and morpheme-based, and a dedicated transition to mitigate the biases of variable-length morpheme sequences. In a Modern Hebrew case study our framework outperforms the state of the art, and we show that the morpheme-based morphological disambiguation (MD) variant consistently outperforms the word-based variant. We further illustrate the utility and multilingual coverage of our framework by morphologically analyzing and disambiguating the large set of languages in the UD treebanks.
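
The two transition granularities mentioned in the abstract can be contrasted with a toy sketch. This is not the authors' implementation: the input is simplified to a list of candidate morpheme sequences per token (standing in for the analyzer's lattice), and the names `word_based`, `morpheme_based`, `Config`, the `END_TOKEN` marker and the hand-set scores are all hypothetical stand-ins for a learned, beam-search model. The paper's exact transitions may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical simplification: each token maps to its candidate analyses
# (morpheme sequences) instead of a full morphological lattice.
Analysis = Tuple[str, ...]
Lattice = List[List[Analysis]]

END_TOKEN = "<END_TOKEN>"  # hypothetical marker for a dedicated end-of-token transition


@dataclass
class Config:
    token_idx: int = 0                               # token currently being disambiguated
    prefix: Analysis = ()                            # morphemes chosen so far for this token
    output: List[str] = field(default_factory=list)  # disambiguated morpheme sequence


def word_based(lattice: Lattice, score: Dict[str, float]) -> List[str]:
    """One transition per token: choose a whole analysis (path) at once."""
    cfg = Config()
    while cfg.token_idx < len(lattice):
        best = max(lattice[cfg.token_idx],
                   key=lambda a: sum(score.get(m, 0.0) for m in a))
        cfg.output.extend(best)
        cfg.token_idx += 1
    return cfg.output


def morpheme_based(lattice: Lattice, score: Dict[str, float]) -> List[str]:
    """One transition per morpheme, plus an explicit END_TOKEN transition."""
    cfg = Config()
    while cfg.token_idx < len(lattice):
        n = len(cfg.prefix)
        # Analyses of the current token that agree with the morphemes chosen so far.
        live = [a for a in lattice[cfg.token_idx] if a[:n] == cfg.prefix]
        # Either extend the prefix by one morpheme or close the token.
        actions = {a[n] if len(a) > n else END_TOKEN for a in live}
        # sorted() makes tie-breaking deterministic.
        action = max(sorted(actions), key=lambda m: score.get(m, 0.0))
        if action == END_TOKEN:
            cfg.output.extend(cfg.prefix)
            cfg.token_idx += 1
            cfg.prefix = ()
        else:
            cfg.prefix = cfg.prefix + (action,)
    return cfg.output


# Toy Hebrew input in treebank-style transliteration.
lattice = [[("bcl",), ("b", "cl"), ("b", "h", "cl")],  # "onion" / "in shadow" / "in the shadow"
           [("gdwl",)]]                                # "big"
score = {"bcl": 1.0, "b": 0.6, "cl": 0.6, "h": 0.5}    # hypothetical per-morpheme scores
print(word_based(lattice, score))      # ['b', 'h', 'cl', 'gdwl']
print(morpheme_based(lattice, score))  # ['bcl', 'gdwl']
```

In this toy, summing positive per-morpheme scores over whole paths makes the word-based decoder prefer the longest (three-morpheme) analysis, which illustrates how variable-length morpheme sequences can skew path comparisons. A real system would rely on learned features and a beam rather than these hand-set weights.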

Bibliographic record

Source: International Conference on Computational Linguistics (COLING), 2016, pp. 337-348 (12 pages)
Authors: Amir More; Reut Tsarfaty
Original format: PDF

Similar literature

1. Implementing universal dependency, morphology, and multiword expression annotation standards for Turkish language processing [J]. Umut Sulubacak, Gülşen Eryiğit. Turkish Journal of Electrical Engineering and Computer Sciences, 2018, No. 3.

2. Statistical Morphological Disambiguation for Agglutinative Languages [J]. Dilek Z. Hakkani-Tür, Kemal Oflazer, Gökhan Tür. Computers and the Humanities, 2002, No. 4.

3. Joint Morphological-Lexical Language Modeling for Processing Morphologically Rich Languages With Application to Dialectal Arabic [J]. R. Sarikaya, M. Afify, Y. Deng. IEEE Transactions on Audio, Speech, and Language Processing, 2008, No. 7.

4. Data-Driven Morphological Analysis and Disambiguation for Morphologically Rich Languages and Universal Dependencies [C]. Amir More, Reut Tsarfaty. International Conference on Computational Linguistics, 2016.

5. Functional universals of tense-aspect-modality morphology in SOV languages [D]. Richard Lee Boutwell. 2000.

6. Toward Understanding the Lexical–Morphological Networks of Children With Specific Language Impairment: Analysis of Responses on a Morphological Production Task [O]. Hannah Krimm, Krystal L. Werfel, C. Melanie Schuele.

7. Character Composition Model with Convolutional Neural Networks for Dependency Parsing on Morphologically Rich Languages [O]. Xiang Yu, Ngoc Thang Vu. 2017.
