Analyzing Tagging Accuracy of Part-of-Speech Taggers

Abstract

Automated part-of-speech (POS) tagging has been an active research area for many years and is a foundation of natural language processing systems. The Natural Language Toolkit (NLTK) library for Python provides the necessary tools for tagging but does not indicate which methods work best. This work therefore analyzes the performance of several NLTK part-of-speech taggers, namely the Default tagger, the Regex tagger, and the N-gram taggers (Unigram, Bigram, and Trigram), on three corpora: Brown, Penn Treebank, and CoNLL2000. We applied all taggers to these three corpora and show that, while the Unigram tagger is the best single tagger on every corpus, a combination of taggers performs better still when the taggers are ordered correctly.
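
The ordering effect described in the abstract can be illustrated with a minimal sketch. The code below does not use NLTK or the paper's corpora. It is a plain-Python stand-in, assuming that a combined tagger consults its members in order and accepts the first tag offered. The toy sentences, the helper names (`make_default`, `make_unigram`, `make_bigram`, `tag_sentence`, `accuracy`), and the choice of `NN` as the default tag are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict

# Toy data invented for this sketch (NOT from Brown, Penn Treebank, or CoNLL2000).
TRAIN = [
    [("the", "DT"), ("dog", "NN"), ("runs", "VBZ")],
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
    [("a", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("dogs", "NNS"), ("run", "VBP")],
]
TEST = [
    [("the", "DT"), ("cat", "NN"), ("runs", "VBZ")],
    [("a", "DT"), ("bird", "NN"), ("sings", "VBZ")],
]


def make_default(tag):
    """Tagger that proposes the same tag for every token."""
    return lambda prev_tag, word: tag


def make_unigram(train):
    """Tagger that proposes the most frequent tag seen for a word, else None."""
    counts = defaultdict(Counter)
    for sent in train:
        for word, tag in sent:
            counts[word][tag] += 1
    table = {w: c.most_common(1)[0][0] for w, c in counts.items()}
    return lambda prev_tag, word: table.get(word)


def make_bigram(train):
    """Tagger keyed on (previous tag, word); returns None for unseen contexts."""
    counts = defaultdict(Counter)
    for sent in train:
        prev = None
        for word, tag in sent:
            counts[(prev, word)][tag] += 1
            prev = tag
    table = {k: c.most_common(1)[0][0] for k, c in counts.items()}
    return lambda prev_tag, word: table.get((prev_tag, word))


def tag_sentence(words, chain):
    """Tag left to right; the first tagger in the chain that answers wins."""
    tags, prev = [], None
    for word in words:
        tag = None
        for tagger in chain:
            tag = tagger(prev, word)
            if tag is not None:
                break
        tags.append(tag)
        prev = tag
    return tags


def accuracy(chain, gold):
    """Fraction of tokens whose predicted tag matches the gold tag."""
    total = correct = 0
    for sent in gold:
        predicted = tag_sentence([w for w, _ in sent], chain)
        for p, (_, g) in zip(predicted, sent):
            total += 1
            correct += (p == g)
    return correct / total


default, unigram, bigram = make_default("NN"), make_unigram(TRAIN), make_bigram(TRAIN)

# Most specific tagger first, most general last.
good_order = [bigram, unigram, default]
# Default first: it always answers, so the other taggers are never consulted.
bad_order = [default, unigram, bigram]

print("specific-to-general:", accuracy(good_order, TEST))
print("general-to-specific:", accuracy(bad_order, TEST))
```

On this toy data the specific-to-general chain scores higher than the reversed one, because a tagger that always returns a tag shadows everything placed after it. This says nothing about the accuracies the paper reports on the real corpora.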


Bibliographic details

Source: International Conference on Genetic and Evolutionary Computing, 2016, xvii, 470 p. (8 pages)
Authors: Nyein Pyae; Pyae Khin; Than Nwe Aung
Original format: PDF
Chinese Library Classification: TP311-532
Keywords: POS taggers; Brown corpus; Penn Treebank corpus; CoNLL2000 corpus

Similar literature

1. Tagging Accuracy Analysis on Part-of-Speech Taggers [J]. Semih Yumusak, Erdogan Dogdu, Halife Kodaz. Journal of Computer and Communications, 2014, No. 4.
2. Improving accuracy of Part-of-Speech (POS) tagging using hidden Markov model and morphological analysis for Myanmar Language [J]. Dim Lam Cing, Khin Mar Soe. International Journal of Electrical and Computer Engineering, 2020, No. 2.
3. Improving Part-of-Speech Tagging Accuracy for Croatian by Morphological Analysis [J]. Ž. Agić, Z. Dovedan, M. Tadić. Informatica: An International Journal of Computing and Informatics, 2009, No. 2.
4. Analyzing Tagging Accuracy of Part-of-Speech Taggers [C]. Nyein Pyae, Pyae Khin, Than Nwe Aung. International Conference on Genetic and Evolutionary Computing, 2016.
5. IITagger: Tagging Wall Street Journal text with part-of-speech information [D]. Kim, Yeongkwun. 1996.
6. A fine-grained Chinese word segmentation and part-of-speech tagging corpus for clinical text [O]. Ying Xiong, Zhongmin Wang, Dehuan Jiang. 2019.
7. The Theoretical Argument for Disproving Asymptotic Upper-Bounds on the Accuracy of Part-of-Speech Tagging Algorithms: Adopting a Linguistics, Rule-Based Approach [O]. Foley William. 2016.
