Beyond N in N-gram Tagging

Abstract

The Hidden Markov Model (HMM) for part-of-speech (POS) tagging is typically based on tag trigrams. As such it models local context but not global context, leaving long-distance syntactic relations unrepresented. Using n-gram models for n > 3 in order to incorporate global context is problematic as the tag sequences corresponding to higher order models will become increasingly rare in training data, leading to incorrect estimations of their probabilities. The trigram HMM can be extended with global contextual information, without making the model infeasible, by incorporating the context separately from the POS tags. The new information incorporated in the model is acquired through the use of a wide-coverage parser. The model is trained and tested on Dutch text from two different sources, showing an increase in tagging accuracy compared to tagging using the standard model.
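
The sparsity argument in the abstract can be made concrete with a small sketch. The number of possible tag n-grams grows as the tagset size raised to the power n, while a fixed corpus only ever contains a limited number of n-gram tokens, so most higher-order n-grams are seen once or not at all. The code below is an illustration under stated assumptions, not the paper's method: the helper names `tag_ngrams` and `sparsity_report` and the toy tag sequence are hypothetical and do not come from the paper or its data.

```python
from collections import Counter


def tag_ngrams(tags, n):
    """Count every contiguous tag n-gram in one tag sequence."""
    return Counter(tuple(tags[i:i + n]) for i in range(len(tags) - n + 1))


def sparsity_report(tags, max_n=5):
    """For n = 1..max_n, compare possible and observed tag n-grams."""
    tagset_size = len(set(tags))
    report = []
    for n in range(1, max_n + 1):
        counts = tag_ngrams(tags, n)
        report.append({
            "n": n,
            # Upper bound on distinct n-grams over this tagset.
            "possible": tagset_size ** n,
            # Distinct n-grams that actually occur in the sequence.
            "observed": len(counts),
            # n-grams seen exactly once: their estimates rest on one event.
            "seen_once": sum(1 for c in counts.values() if c == 1),
        })
    return report


# Toy tag sequence (hypothetical, not data from the paper).
toy_tags = "DET NOUN VERB DET ADJ NOUN PREP DET NOUN VERB ADV".split()

for row in sparsity_report(toy_tags):
    print(row)
```

Even on this tiny sequence, the gap between possible and observed n-grams widens quickly as n increases, which is the effect the abstract cites as the reason for keeping the trigram model and adding global context through a separate channel.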