Journal: 《计算机工程与应用》 (Computer Engineering and Applications)

Multi-Document Automatic Summarization Based on Topic and Sub-Event Discovery

Abstract

A multi-document summarization method based on topic and sub-event extraction is proposed. The method goes beyond traditional word-frequency statistics: in addition to word frequency and position information, it considers whether a word describes the topic or a sub-event of the document set, and extracts eight basic word features from these factors. A logistic regression model estimates the influence of the basic features on each word and computes word weights. Sentences are scored from the word weights by building a sentence vector space model, and the summary is produced by combining each sentence's score with its redundancy. Comparison experiments evaluate the quality of the summaries produced by three summarization systems (Coverage Baseline, Centroid-Based Summary, and Word Mining based Summary (WMS)) on three measures: N-gram co-occurrence statistics, topic-word coverage, and high-frequency-word coverage. The experimental results show that the WMS system performs better in several respects.
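The pipeline described in the abstract (word scoring with a logistic model, sentence scoring from word weights, then redundancy-aware selection) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not list all eight features, so the four feature names, the weights in `WEIGHTS`, and `BIAS` are hypothetical; the sentence score is simplified to a mean of word scores rather than a full vector space model; and Jaccard overlap stands in for whatever redundancy measure the paper uses.

```python
import math

# Hypothetical weights. In the paper these would be fitted by logistic
# regression over eight features; only four illustrative ones appear here.
WEIGHTS = {"freq": 2.0, "position": 1.0, "is_topic": 1.5, "is_event": 1.5}
BIAS = -2.0


def word_score(features):
    """Logistic function of the weighted feature sum; result lies in (0, 1)."""
    z = BIAS + sum(w * features.get(name, 0.0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def sentence_score(tokens, scores):
    """Mean word score of a sentence; words without a score count as 0."""
    if not tokens:
        return 0.0
    return sum(scores.get(t, 0.0) for t in tokens) / len(tokens)


def overlap(a, b):
    """Jaccard overlap of two token lists (assumed redundancy proxy)."""
    sa, sb = set(a), set(b)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def summarize(sentences, scores, k=2, penalty=0.5):
    """Greedily pick k sentence indices, discounting redundant candidates."""
    chosen = []
    candidates = list(range(len(sentences)))

    def adjusted(i):
        # Penalize by the highest overlap with any already chosen sentence.
        redundancy = max(
            (overlap(sentences[i], sentences[j]) for j in chosen), default=0.0
        )
        return sentence_score(sentences[i], scores) - penalty * redundancy

    while candidates and len(chosen) < k:
        best = max(candidates, key=adjusted)
        chosen.append(best)
        candidates.remove(best)
    return sorted(chosen)
```

With this sketch, a sentence that nearly repeats an already selected one is pushed down the ranking even when its raw score is high, which is the effect the abstract attributes to combining sentence score with redundancy.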
