International Conference on Engineering and Applied Technology

Named entity recognition model for Indonesian tweet using CRF classifier

Abstract

Named Entity Recognition (NER) is a Natural Language Processing (NLP) task that identifies the entities mentioned in a document. NER supports applications such as information extraction and text summarization. Tweets are one data source for NLP: they are produced in real time and at high frequency, but each tweet is limited in the number of words it can contain. In Indonesia, Twitter is one of the most popular social media platforms and covers a wide range of topics, so models, training data, and test data for Indonesian tweets are needed. In this study, the models were built using Conditional Random Field (CRF) classification from 8,000 tweets grouped into formal and informal tweets. Testing the models on 2,000 items of training data yielded recall and precision of 62% and 87% respectively for formal tweets, 36% and 90% for informal tweets, and 60% and 86% for mixed tweets. These results indicate that the Indonesian tweet models can be used for automatic NER.
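
The recall and precision figures above can be illustrated with a minimal sketch of how such scores are commonly computed for NER output. The abstract does not say whether the authors scored whole entities or individual tokens, nor which tagging scheme or entity types they used. The BIO tags, the PER/LOC/ORG labels, the exact-match entity-level scoring, and the function names `bio_to_spans` and `precision_recall` are all assumptions made here for illustration, and the toy tag sequences are invented rather than taken from the paper's data.

```python
from typing import List, Optional, Set, Tuple

# (start index, end index exclusive, entity type); hypothetical representation
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence (assumed tagging scheme)."""
    spans: Set[Span] = set()
    start: Optional[int] = None
    etype: Optional[str] = None
    # A trailing "O" sentinel closes an entity that runs to the last token.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def precision_recall(gold: List[str], pred: List[str]) -> Tuple[float, float]:
    """Entity-level precision and recall with exact span and type matching."""
    gold_spans, pred_spans = bio_to_spans(gold), bio_to_spans(pred)
    correct = len(gold_spans & pred_spans)
    precision = correct / len(pred_spans) if pred_spans else 0.0
    recall = correct / len(gold_spans) if gold_spans else 0.0
    return precision, recall


if __name__ == "__main__":
    # Invented toy example: the predicted tags miss one of three gold entities.
    gold = ["B-PER", "I-PER", "O", "B-LOC", "O", "B-ORG"]
    pred = ["B-PER", "I-PER", "O", "O", "O", "B-ORG"]
    p, r = precision_recall(gold, pred)
    print(f"precision={p:.2f} recall={r:.2f}")
```

In this toy case every predicted entity is correct but one gold entity is missed, so precision is 1.0 and recall is 2/3. This is the same high-precision, lower-recall pattern as the reported results, where a tagger labels few entities but is usually right about the ones it does label.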