Morphological Segmentation and Part-of-Speech Tagging for the Arabic Heritage

Mohamed Emad

Abstract

We annotate 60,000 words of Classical Arabic (CA) texts on philosophy, religion, literature, and law with fine-grained, segment-based morphological descriptions. We use these annotations to build a morphological segmenter and a part-of-speech (POS) tagger for CA. The segmenter uses character-level classification with features drawn from the word and its lexical context. It achieves a word accuracy of 96.8%, and its main difficulty is a high rate of out-of-vocabulary (OOV) words. A token-based POS tagger achieves an accuracy of 96.22% overall and 97.72% on known tokens, despite the small size of the corpus. An error analysis shows that most tagging errors stem from segmentation errors, and that quality improves as more data are added. The morphological segmenter and tagger have a wide range of potential applications in processing CA, a low-resource variety of Arabic.
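The abstract treats segmentation as character-level classification. The Python sketch below shows one way that framing can work. It is a minimal illustration built on assumptions of our own and does not reproduce the paper's setup.

- **Labels:** a two-label scheme, where "B" means a segment boundary follows the character and "O" means it does not.
- **Input:** Buckwalter-style transliterated words.
- **Features:** a hypothetical set made up of neighbouring characters, the whole word, and the neighbouring words.

The abstract does not specify the label set, feature templates, or classifier, so the helper names (`segments_to_labels`, `labels_to_segments`, `char_features`, `accuracy_split`) are illustrative only. The last function shows how to compute overall accuracy and known-token accuracy separately, which is the split the POS results above report.

```python
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical label scheme (an assumption, not taken from the paper):
# each character is labelled "B" if a segment boundary follows it inside
# the word, and "O" otherwise.


def segments_to_labels(segments: Sequence[str]) -> List[str]:
    """Encode a segmented word, e.g. ["w", "ktb", "hm"], as one label per character."""
    labels: List[str] = []
    for i, seg in enumerate(segments):
        labels.extend(["O"] * (len(seg) - 1))
        # The last character of every non-final segment closes a boundary.
        labels.append("B" if i < len(segments) - 1 else "O")
    return labels


def labels_to_segments(word: str, labels: Sequence[str]) -> List[str]:
    """Decode per-character labels (e.g. classifier output) back into segments."""
    segments: List[str] = []
    current = ""
    for ch, label in zip(word, labels):
        current += ch
        if label == "B":
            segments.append(current)
            current = ""
    if current:
        segments.append(current)
    return segments


def char_features(words: Sequence[str], w: int, i: int, window: int = 2) -> Dict[str, str]:
    """Hypothetical features for character i of words[w]: the character and its
    neighbours, the whole word, and the neighbouring words (lexical context)."""
    word = words[w]
    feats = {
        "char": word[i],
        "word": word,
        "prefix": word[: i + 1],
        "suffix": word[i + 1:],
        "prev_word": words[w - 1] if w > 0 else "<s>",
        "next_word": words[w + 1] if w + 1 < len(words) else "</s>",
    }
    for k in range(1, window + 1):
        feats[f"char-{k}"] = word[i - k] if i - k >= 0 else "<w>"
        feats[f"char+{k}"] = word[i + k] if i + k < len(word) else "</w>"
    return feats


def accuracy_split(
    gold: Sequence[Tuple[str, str]],
    predicted: Sequence[str],
    train_vocab: Set[str],
) -> Tuple[float, float, float]:
    """Return (overall, known-token, unknown-token) tagging accuracy.

    gold holds (token, gold_tag) pairs; a token counts as known if it
    occurs in the training vocabulary. Empty groups score 0.0.
    """
    def acc(pairs: List[Tuple[str, str]]) -> float:
        return sum(g == p for g, p in pairs) / len(pairs) if pairs else 0.0

    everything: List[Tuple[str, str]] = []
    known: List[Tuple[str, str]] = []
    unknown: List[Tuple[str, str]] = []
    for (token, gold_tag), pred_tag in zip(gold, predicted):
        pair = (gold_tag, pred_tag)
        everything.append(pair)
        if token in train_vocab:
            known.append(pair)
        else:
            unknown.append(pair)
    return acc(everything), acc(known), acc(unknown)


if __name__ == "__main__":
    labels = segments_to_labels(["w", "ktb", "hm"])
    print(labels)  # ['B', 'O', 'O', 'B', 'O', 'O']
    print(labels_to_segments("wktbhm", labels))  # ['w', 'ktb', 'hm']
```

Any per-character classifier could be trained on the feature dictionaries paired with the output of `segments_to_labels`. Its predicted labels are then mapped back to segments with `labels_to_segments`. Which classifier the paper uses is not stated in the abstract.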

Bibliographic information

Source: ACM Transactions on Asian Language Information Processing, 2018, no. 3, pp. 22.1-22.13 (13 pages)
Author: Mohamed Emad
Affiliation: Indiana University, Department of Linguistics, 1020 E Kirkwood Ave, Bloomington, IN 47405, USA
Original format: PDF
Language: English
Keywords: Arabic; part-of-speech tagging; morphological analysis; segmentation; heritage

Similar articles

1. Part-of-speech tagging for Arabic tweets using CRF and Bi-LSTM [J]. Wasan AlKhwiter, Nora Al-Twairesh. Computer Speech and Language, 2021.
2. Building Arabic Corpus Applied to Part-of-Speech Tagging [J]. Rabab Ali Abumalloh, Hassan Maudi Al-Sarhan, Waheeb Abu-Ulbeh. Indian Journal of Science and Technology, 2016, no. 46.
3. Levenberg-Marquardt Learning Neural Network For Part-of-Speech Tagging of Arabic Sentences [J]. Hasan Muaidi. WSEAS Transactions on Computers, 2014.
4. Improving Arabic Part-of-Speech Tagging through Morphological Analysis [C]. Mohammed Albared, Nazlia Omar, Mohd. Juzaiddin Ab Aziz. International Conference on Intelligent Information and Database Systems (ACIIDS 2011), 2011.
5. Toward language-independent morphological segmentation and part-of-speech induction [D]. Sajib Dasgupta, 2007.
6. A fine-grained Chinese word segmentation and part-of-speech tagging corpus for clinical text [O]. Ying Xiong, Zhongmin Wang, Dehuan Jiang, 2019.
7. Challenging Language-Dependent Segmentation for Arabic: An Application to Machine Translation and Part-of-Speech Tagging [O]. Hassan Sajjad, Fahim Dalvi, Nadir Durrani, 2017.
