Research on a Text Segmentation Method Based on Domain Ontology

Abstract

Text segmentation plays an important role in information retrieval, summary generation, question answering, and information extraction. After reviewing existing text segmentation methods from China and abroad, this paper proposes a method for linear text segmentation based on a domain ontology. Starting from an initial concept, the method automatically acquires a structured set of semantic concepts. It then assigns a semantic label to each paragraph according to the frequency, position, and relationships of the acquired concepts, attributes, and attribute words in the text, thereby uncovering the text's subtopic information. Paragraphs that carry the same semantic annotation are grouped into one semantic paragraph, which segments the text along its subtopic boundaries. Experimental results show that, on domain-specific text, the method reaches a precision of 85%, a recall of 90%, and an F-measure of 88%. This segmentation quality meets the needs of practical applications and exceeds that of existing text segmentation methods that require no training corpus.
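The label-then-merge idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Concept` class, the function names, and the toy medical vocabulary are all hypothetical, the score uses term frequency only (the paper also weighs position and relationships), and attaching unlabeled paragraphs to the preceding segment is an assumption made here for simplicity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """Hypothetical stand-in for one ontology concept and its related terms."""
    label: str
    terms: frozenset  # concept name, attributes, attribute words


def label_paragraph(paragraph, concepts):
    """Return the label whose terms occur most often, or None if none occur.

    Assumption: frequency is the only signal; position and relationships
    between concepts, which the paper also uses, are ignored here.
    """
    text = paragraph.lower()
    best_label, best_score = None, 0
    for concept in concepts:
        score = sum(text.count(term.lower()) for term in concept.terms)
        if score > best_score:
            best_label, best_score = concept.label, score
    return best_label


def segment(paragraphs, concepts):
    """Merge adjacent paragraphs that share a label into one segment.

    Returns a list of (label, [paragraph indices]). Assumption: a paragraph
    with no label is attached to the preceding segment.
    """
    segments = []
    for index, paragraph in enumerate(paragraphs):
        label = label_paragraph(paragraph, concepts)
        if segments and (label is None or label == segments[-1][0]):
            segments[-1][1].append(index)
        else:
            segments.append((label, [index]))
    return segments


# Toy example with an invented two-concept vocabulary.
concepts = [
    Concept("symptom", frozenset({"fever", "cough"})),
    Concept("treatment", frozenset({"drug", "therapy"})),
]
paragraphs = [
    "Patients report fever and a dry cough.",
    "The fever usually lasts three days.",
    "Drug therapy is the first choice.",
    "It is well tolerated.",
]
result = segment(paragraphs, concepts)
print(result)
```

On this toy input the first two paragraphs form a "symptom" segment and the last two a "treatment" segment, so the single boundary falls between paragraphs 1 and 2 (zero-based).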
