Conference of the European Chapter of the Association for Computational Linguistics

Identifying Broken Plurals, Irregular Gender, and Rationality in Arabic Text

Abstract

Arabic morphology is complex, partly because of its richness, and partly because of common irregular word forms, such as broken plurals (which resemble singular nouns), and nouns with irregular gender (feminine nouns that look masculine and vice versa). In addition, Arabic morpho-syntactic agreement interacts with the lexical semantic feature of rationality, which has no morphological realization. In this paper, we present a series of experiments on the automatic prediction of the latent linguistic features of functional gender and number, and rationality in Arabic. We compare two techniques: simple maximum likelihood estimation (MLE) with back-off, and a support-vector-machine-based sequence tagger (Yamcha). We study a number of orthographic, morphological and syntactic learning features. Our results show that the MLE technique is preferred for words seen in the training data, while the Yamcha technique is optimal for unseen words, which are our real target. Furthermore, we show that for unseen words, morphological features help beyond orthographic features and that syntactic features help even more. A combination of the two techniques improves overall performance even further.
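
The MLE-with-back-off idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the back-off order (surface form plus part of speech, then lemma, then part of speech alone), the simplified transliterations, the toy training set, and the label names `rational`/`irrational` are all assumptions made for illustration. The class name `BackoffMLE` is hypothetical.

```python
from collections import Counter, defaultdict


class BackoffMLE:
    """Most-frequent-label predictor that backs off from specific to general keys."""

    def __init__(self, key_funcs):
        # key_funcs is ordered from most specific to most general (assumed order).
        self.key_funcs = key_funcs
        self.tables = [defaultdict(Counter) for _ in key_funcs]
        self.label_counts = Counter()

    def fit(self, examples):
        # Count how often each label occurs under every key level.
        for token, label in examples:
            for table, key_fn in zip(self.tables, self.key_funcs):
                table[key_fn(token)][label] += 1
            self.label_counts[label] += 1
        return self

    def predict(self, token):
        # Use the first (most specific) level that has seen this key.
        for table, key_fn in zip(self.tables, self.key_funcs):
            counts = table.get(key_fn(token))
            if counts:
                return counts.most_common(1)[0][0]
        # No level matched: fall back to the overall majority label.
        return self.label_counts.most_common(1)[0][0]


# Hypothetical toy data: simplified transliterations, illustrative labels only.
train = [
    ({"form": "kitab", "lemma": "kitab", "pos": "noun"}, "irrational"),
    ({"form": "kutub", "lemma": "kitab", "pos": "noun"}, "irrational"),
    ({"form": "bayt", "lemma": "bayt", "pos": "noun"}, "irrational"),
    ({"form": "muallim", "lemma": "muallim", "pos": "noun"}, "rational"),
]

model = BackoffMLE([
    lambda t: (t["form"], t["pos"]),  # most specific: surface form + POS
    lambda t: t["lemma"],             # back off to the lemma
    lambda t: t["pos"],               # back off to POS alone
]).fit(train)

seen = model.predict({"form": "kutub", "lemma": "kitab", "pos": "noun"})
unseen_form = model.predict({"form": "muallimun", "lemma": "muallim", "pos": "noun"})
unseen_word = model.predict({"form": "qalam", "lemma": "qalam", "pos": "noun"})
```

In this sketch a seen form is answered from the most specific table, an unseen form of a known lemma falls through to the lemma table, and a fully unseen word gets only the majority label for its part of speech. That last case shows why a lookup of this kind has little to offer for unseen words, where the abstract reports the sequence tagger doing better.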
