Journal of Chinese Information Processing (中文信息学报)

An Automatic Annotation Method for Emergency-Event Text Corpora

Abstract

An event corpus is one of the foundations and key technologies for research on the extraction, representation, reasoning and mining of event knowledge in the Semantic Web. This paper takes the event as the knowledge unit of text and proposes an automatic annotation method for building a large-scale corpus of emergency-event news. First, it presents an event structure model that serves as the event-based knowledge unit. Second, on top of text processed by LTP, it applies the sequential pattern mining algorithm PrefixSpan to an existing small-scale corpus to mine rules for event elements, such as their part-of-speech patterns. Third, it expands the trigger word (denoter) list with Tongyici Cilin (Extended) and combines it with a customized dictionary of event elements. Annotation then proceeds in multiple passes, each pass filtering the candidates and refining the result of the previous one. In the experiments, the automatic method is compared with both manual annotation and Stanford CoreNLP NER, and the results show that it effectively improves the efficiency of event-based text annotation.
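
The abstract names PrefixSpan as the algorithm used to mine part-of-speech rules for event elements. As a rough illustration only, the following is a minimal sketch of PrefixSpan restricted to sequences of single items (here, POS tags); it is not the authors' implementation. The function name `prefixspan`, its `min_support` and `max_len` parameters, and the toy tag sequences are hypothetical; the tags are assumed to follow LTP-style labels (`ns` place name, `n` general noun, `p` preposition, `nd` direction noun).

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Pattern = Tuple[str, ...]


def prefixspan(
    db: Sequence[Sequence[str]], min_support: int, max_len: int = 5
) -> Dict[Pattern, int]:
    """Simplified PrefixSpan: each sequence element is a single item (a POS tag).

    Returns every frequent subsequence (gaps allowed) of up to max_len items,
    mapped to its support, i.e. the number of sequences that contain it.
    """
    results: Dict[Pattern, int] = {}

    def mine(prefix: Pattern, projected: List[Tuple[int, int]]) -> None:
        # `projected` is the prefix-projected database, stored as
        # (sequence index, offset where the remaining suffix starts).
        if len(prefix) >= max_len:
            return
        # Count each item at most once per projected suffix.
        support: Dict[str, int] = defaultdict(int)
        for sid, start in projected:
            for item in set(db[sid][start:]):
                support[item] += 1
        for item, count in sorted(support.items()):
            if count < min_support:
                continue
            new_prefix = prefix + (item,)
            results[new_prefix] = count
            # Project again: keep the suffix after the first occurrence of `item`.
            new_projected: List[Tuple[int, int]] = []
            for sid, start in projected:
                seq = db[sid]
                for pos in range(start, len(seq)):
                    if seq[pos] == item:
                        new_projected.append((sid, pos + 1))
                        break
            mine(new_prefix, new_projected)

    mine((), [(sid, 0) for sid in range(len(db))])
    return results


if __name__ == "__main__":
    # Hypothetical POS-tag sequences of location-element spans (toy data).
    toy_db = [
        ["p", "ns", "ns", "n"],
        ["ns", "ns", "nd"],
        ["p", "ns", "n"],
        ["ns", "n"],
    ]
    for pattern, count in sorted(prefixspan(toy_db, min_support=3).items()):
        print(" ".join(pattern), count)
```

With `min_support=3` the toy run prints `n 3`, `ns 4` and `ns n 3`. A pattern such as `ns n` (place name followed by a general noun) is the kind of POS rule that could then be matched against new text to propose candidate location elements. Storing projections as (index, offset) pairs instead of copying suffixes is a common way to keep the projected databases cheap.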
