Beyond N in N-gram Tagging


Abstract

The Hidden Markov Model (HMM) for part-of-speech (POS) tagging is typically based on tag trigrams. As such it models local context but not global context, leaving long-distance syntactic relations unrepresented. Using n-gram models for n > 3 in order to incorporate global context is problematic as the tag sequences corresponding to higher order models will become increasingly rare in training data, leading to incorrect estimations of their probabilities. The trigram HMM can be extended with global contextual information, without making the model infeasible, by incorporating the context separately from the POS tags. The new information incorporated in the model is acquired through the use of a wide-coverage parser. The model is trained and tested on Dutch text from two different sources, showing an increase in tagging accuracy compared to tagging using the standard model.
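The sparsity argument in the abstract can be illustrated with a small sketch. It is not the paper's model or data: the toy tag sequences and the helper names `tag_ngrams` and `singleton_fraction` are assumptions made up for illustration. The sketch counts tag n-grams for increasing n and reports what fraction of the distinct n-grams was seen only once, which is a rough proxy for how unreliable relative-frequency estimates become.

```python
from collections import Counter


def tag_ngrams(tags, n):
    """Return all contiguous n-grams (as tuples) from a list of tags."""
    return [tuple(tags[i:i + n]) for i in range(len(tags) - n + 1)]


def singleton_fraction(sentences, n):
    """Fraction of distinct tag n-grams that occur exactly once."""
    counts = Counter()
    for tags in sentences:
        counts.update(tag_ngrams(tags, n))
    if not counts:
        return 0.0
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(counts)


# Hypothetical toy corpus of POS tag sequences (not from the paper).
toy_corpus = [
    ["DET", "NOUN", "VERB", "DET", "NOUN"],
    ["DET", "ADJ", "NOUN", "VERB", "DET", "NOUN"],
    ["PRON", "VERB", "DET", "ADJ", "NOUN"],
    ["DET", "NOUN", "VERB", "PRON"],
]

# Higher-order n-grams are seen less often, so more of them are singletons.
for n in range(2, 6):
    print(n, round(singleton_fraction(toy_corpus, n), 2))
```

On this toy data half of the distinct trigrams are singletons, and every distinct 5-gram is. Real corpora are far larger, but the same trend is what makes models with n > 3 hard to estimate.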


Bibliographic details

Source
Association for Computational Linguistics Annual Meeting | 2004 | 6 pages
Authors
Robbert Prins; Association for Computational Linguistics (ACL) (US)
Original format: PDF
Chinese Library Classification: Computer software

