A HMM POS Tagger for Micro-blogging Type Texts

Abstract

The high volume of communication via micro-blogging type messages has created an increased demand for text processing tools customised for this unstructured text genre. Text processing tools developed on structured texts have been shown to deteriorate significantly when applied to unstructured, micro-blogging type texts. In this paper, we present the results of testing an HMM (Hidden Markov Model) based POS (part-of-speech) tagging model customised for unstructured texts. We also evaluated the tagger against published, state-of-the-art CRF-based POS tagging models customised for Tweet messages, using three publicly available Tweet corpora. Finally, we ran cross-validation tests with both taggers by training them on one Tweet corpus and testing them on another. The results show that the CRF-based POS tagger from GATE performed approximately 8% better than the HMM model at the token level; at the sentence level, however, the two performed approximately the same. The cross-validation experiments showed that the results of both taggers deteriorated by approximately 25% at the token level and by a massive 80% at the sentence level. A detailed analysis of this deterioration is presented, and the trained HMM model, together with the data, has been made available for research purposes. Since HMM training is orders of magnitude faster than CRF training, we conclude that the HMM model, despite trailing by about 8% in token accuracy, is still a viable alternative for real-time applications that demand rapid as well as progressive learning.
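
The abstract attributes the HMM's practical advantage to its training cost. As a rough illustration only, the Python sketch below shows a generic first-order (bigram) HMM tagger: supervised training is a single counting pass over tagged tokens, with none of the iterative optimisation that CRF training requires, and tagging uses Viterbi decoding. It also computes token-level and sentence-level accuracy, assuming sentence-level accuracy means the share of sentences in which every token is tagged correctly. The names `BigramHMMTagger` and `token_and_sentence_accuracy`, the add-alpha smoothing, the lowercasing, and the toy corpus are hypothetical assumptions; the abstract does not specify the authors' model order, smoothing, or unknown-word handling, and this is not their released model.

```python
import math
from collections import Counter, defaultdict


class BigramHMMTagger:
    """Hypothetical first-order HMM POS tagger: MLE counts + add-alpha smoothing + Viterbi."""

    def __init__(self, alpha=0.1):
        self.alpha = alpha  # smoothing constant (an assumption, not from the paper)

    def train(self, sentences):
        """sentences: list of [(token, tag), ...]. Training is one counting pass."""
        self.trans = defaultdict(Counter)  # trans[prev_tag][tag] = count
        self.emit = defaultdict(Counter)   # emit[tag][word] = count
        self.tags, self.vocab = set(), set()
        for sent in sentences:
            prev = "<s>"  # sentence-start pseudo-tag
            for word, tag in sent:
                self.trans[prev][tag] += 1
                self.emit[tag][word.lower()] += 1
                self.tags.add(tag)
                self.vocab.add(word.lower())
                prev = tag
        return self

    def _log_trans(self, prev, tag):
        counts = self.trans[prev]  # totals recomputed per call for brevity
        denom = sum(counts.values()) + self.alpha * len(self.tags)
        return math.log((counts[tag] + self.alpha) / denom)

    def _log_emit(self, tag, word):
        counts = self.emit[tag]
        # +1 reserves probability mass for unseen (out-of-vocabulary) words
        denom = sum(counts.values()) + self.alpha * (len(self.vocab) + 1)
        return math.log((counts[word.lower()] + self.alpha) / denom)

    def tag(self, words):
        """Return the most probable tag sequence for `words` (Viterbi decoding)."""
        if not words:
            return []
        tags = sorted(self.tags)
        # best[i][t]: log-prob of the best tag path for words[:i+1] ending in tag t
        best = [{t: self._log_trans("<s>", t) + self._log_emit(t, words[0]) for t in tags}]
        back = [{}]
        for i in range(1, len(words)):
            best.append({})
            back.append({})
            for t in tags:
                prev = max(tags, key=lambda p: best[i - 1][p] + self._log_trans(p, t))
                best[i][t] = best[i - 1][prev] + self._log_trans(prev, t) + self._log_emit(t, words[i])
                back[i][t] = prev
        # Follow back-pointers from the best final tag to recover the path
        path = [max(tags, key=lambda t: best[-1][t])]
        for i in range(len(words) - 1, 0, -1):
            path.append(back[i][path[-1]])
        return path[::-1]


def token_and_sentence_accuracy(gold, pred):
    """Token accuracy, and the share of sentences whose tags are all correct."""
    correct = total = whole = 0
    for g, p in zip(gold, pred):
        hits = sum(a == b for a, b in zip(g, p))
        correct += hits
        total += len(g)
        whole += int(hits == len(g))
    return correct / total, whole / len(gold)


# Hypothetical toy corpus of tweet-like tokens
toy_corpus = [
    [("i", "PRP"), ("love", "VBP"), ("it", "PRP")],
    [("u", "PRP"), ("love", "VBP"), ("lol", "UH")],
    [("lol", "UH"), ("i", "PRP"), ("win", "VBP")],
]
tagger = BigramHMMTagger().train(toy_corpus)
print(tagger.tag(["i", "love", "lol"]))  # ['PRP', 'VBP', 'UH']
print(token_and_sentence_accuracy(
    [["PRP", "VBP", "UH"], ["PRP", "VBP", "PRP"]],
    [["PRP", "VBP", "UH"], ["PRP", "VBP", "UH"]],
))  # (0.8333333333333334, 0.5)
```

The accuracy example shows why the two levels can diverge so sharply: a single wrong tag costs one token out of six at the token level, but a whole sentence out of two at the sentence level.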

Bibliographic details

Source: Pacific Rim International Conference on Artificial Intelligence, 2014, pp. 157-169 (13 pages)
Authors: Parma Nand; Rivindu Perera; Ramesh Lal
Original format: PDF

Similar documents

1. An on-line free handwritten Chinese character recognition method based on component cascaded HMMs [J]. High Technology Letters (English edition). 2005, No. 3
2. Handwritten Arabic text recognition using multi-stage sub-core-shape HMMs [J]. Ahmad Irfan, Fink Gernot A. International Journal on Document Analysis and Recognition. 2019, No. 3
3. Morpho-Syntactic Tagging of Text in “Baoule” Language Based on Hidden Markov Models (HMM) [J]. Hyacinthe Konan, Bi Tra Gooré, Raymond Gbégbé. Journal of Software Engineering and Applications. 2016, No. 10
4. Granular transfer learning using type-2 fuzzy HMM for text sequence recognition [J]. Sun Shichang, Yun Jian, Lin Hongfei. Neurocomputing. 2016, No. nova19
5. A HMM POS Tagger for Micro-blogging Type Texts [C]. Parma Nand, Rivindu Perera, Ramesh Lal. Pacific Rim International Conference on Artificial Intelligence. 2014
6. The expression of temporality in the written discourse of L2 learners of English: Distinguishing text-types and text passages [D]. Ewert, Doreen Elizabeth. 2006
7. Synthesis Characterization and Use of Mesoporous Silicas of the Following Types SBA-1 SBA-2 HMM-1 and HMM-2 [O]. Sylwia Jarmolińska, Agnieszka Feliczak-Guzik, Izabela Nowak. 2020
8. A HMM POS Tagger for Micro-blogging Type Texts [O]. Nand P, Perera R, Lal R. 2014