Journal: ACM Transactions on Asian Language Information Processing

Morphological Segmentation and Part-of-Speech Tagging for the Arabic Heritage



Abstract

We annotate 60,000 words of Classical Arabic (CA), covering topics in philosophy, religion, literature, and law, with fine-grained, segment-based morphological descriptions. We use these annotations to build a morphological segmenter and part-of-speech (POS) tagger for CA. Using character-level classification with features from the word and its lexical context, the segmenter achieves a word accuracy of 96.8%; the main issue is a high rate of out-of-vocabulary words. A token-based POS tagger achieves an accuracy of 96.22% overall and 97.72% on known tokens, despite the small size of the corpus. An error analysis shows that most tagging errors result from segmentation and that quality improves as more data is added. The morphological segmenter and tagger have a wide range of potential applications in processing CA, a low-resource variety of the language.
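
The character-level formulation mentioned in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the actual label scheme, feature set, or classifier, so everything here is an assumption made for illustration: the `B`/`I` boundary labels, the feature names, the window size, and the Buckwalter-style example word `wbktAbh` (segmented as `w+b+ktAb+h`) are all hypothetical. The sketch shows how a segmented word maps to per-character labels, what word-plus-context features for one character might look like, and how word accuracy is counted.

```python
from typing import Dict, List


def segments_to_labels(segments: List[str]) -> List[str]:
    """Per-character labels: 'B' opens a segment, 'I' continues it (assumed scheme)."""
    labels: List[str] = []
    for seg in segments:
        if not seg:
            continue  # ignore empty segments
        labels.extend(["B"] + ["I"] * (len(seg) - 1))
    return labels


def labels_to_segments(word: str, labels: List[str]) -> List[str]:
    """Rebuild the segments of a word from its per-character labels."""
    segments: List[str] = []
    for ch, lab in zip(word, labels):
        if lab == "B" or not segments:
            segments.append(ch)
        else:
            segments[-1] += ch
    return segments


def char_features(word: str, i: int, prev_word: str, next_word: str,
                  window: int = 2) -> Dict[str, object]:
    """Hypothetical features for character i: the character, its position,
    neighbouring characters, and the adjacent words as lexical context."""
    feats: Dict[str, object] = {
        "char": word[i],
        "pos": i,
        "from_end": len(word) - 1 - i,
        "prev_word": prev_word,
        "next_word": next_word,
    }
    for off in range(-window, window + 1):
        if off == 0:
            continue
        j = i + off
        feats[f"char[{off:+d}]"] = word[j] if 0 <= j < len(word) else "<pad>"
    return feats


def word_accuracy(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Fraction of words whose predicted segmentation matches the gold one exactly."""
    correct = sum(g == p for g, p in zip(gold, pred))
    return correct / len(gold)


if __name__ == "__main__":
    gold_segments = ["w", "b", "ktAb", "h"]
    labels = segments_to_labels(gold_segments)
    print(labels)
    print(labels_to_segments("wbktAbh", labels))
    print(char_features("wbktAbh", 2, "<s>", "</s>"))
```

Under this formulation a word counts as correct only if every character label is right, which is why word accuracy is a stricter measure than per-character accuracy.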
