Conference: International Research Conference on Smart Computing and Systems Engineering

Part of speech tagging for Twitter conversations using Conditional Random Fields model

Abstract

Part-of-speech (POS) tagging is the task of assigning the appropriate part of speech to each word in a text. It is very useful in information retrieval, information extraction, and speech processing. This research presents a POS tagger designed specifically for Twitter text data. POS tagging of Twitter conversations is a difficult task. Several approaches have been proposed for building accurate tagging systems, but most of them target news text and web content. This research therefore develops a POS tagger model for Twitter conversations using a CRF toolkit. The system was developed on nearly 1000 Twitter conversations using the Conditional Random Field (CRF) stochastic model. The Twitter data was downloaded from the internet. A POS-tagged text corpus, a template file, and CoNLL files for both the training and testing sets were prepared accordingly. Training was carried out for both a unigram model and a bigram model. The performance of the system under each model was evaluated as the number of correctly tagged words divided by the total number of words, and the evaluation showed a significant efficiency.
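
The evaluation measure described in the abstract, correctly tagged words divided by total words, can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration rather than a detail taken from the paper: the three-column CoNLL-style layout (token, gold tag, predicted tag), the function names `read_conll` and `tagging_accuracy`, and the sample tokens and tags.

```python
from typing import Iterable, List, Tuple

Sentence = List[Tuple[str, ...]]


def read_conll(lines: Iterable[str]) -> List[Sentence]:
    """Group whitespace-separated columns into sentences.

    Assumed layout (hypothetical): one token per line, with a blank
    line separating sentences (here, tweets).
    """
    sentences: List[Sentence] = []
    current: Sentence = []
    for line in lines:
        line = line.strip()
        if not line:
            # A blank line closes the current sentence, if any.
            if current:
                sentences.append(current)
                current = []
            continue
        current.append(tuple(line.split()))
    if current:
        sentences.append(current)
    return sentences


def tagging_accuracy(sentences: List[Sentence]) -> float:
    """Return correctly tagged words / total words.

    Assumes the last two columns of each row are the gold tag and the
    predicted tag, in that order.
    """
    total = 0
    correct = 0
    for sentence in sentences:
        for columns in sentence:
            gold, predicted = columns[-2], columns[-1]
            total += 1
            if gold == predicted:
                correct += 1
    return correct / total if total else 0.0


# Hypothetical sample: token, gold tag, predicted tag.
SAMPLE = """\
@user USR USR
lol UH UH
gonna VBG NN
sleep VB VB

so RB RB
tired JJ JJ
"""

sentences = read_conll(SAMPLE.splitlines())
accuracy = tagging_accuracy(sentences)
print(f"{accuracy:.4f}")
```

In this sample, five of the six tokens carry matching gold and predicted tags, so the ratio is 5/6. The same ratio could be computed separately for the unigram and bigram models to compare them.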