Journal: Indian Journal of Science and Technology

Improving Triangle-Graph Based Text Summarization using Hybrid Similarity Function

Abstract

Objective: Extractive summarization selects the most relevant sentences from the source document while preserving its most important information. Graph-based techniques have become very popular for text summarization. This paper introduces a hybrid graph-based technique for single-document extractive summarization.

Methods/Statistical Analysis: Prior research that applied the graph-based approach to extractive summarization used a single similarity function to compute the summary. In this work, we propose a hybrid similarity function (H) instead. This function combines four distinct similarity measures: cosine similarity (sim1), Jaccard similarity (sim2), word alignment-based similarity (sim3) and window-based similarity (sim4). The method uses a trainable summarizer that takes several features into account, and the effect of these features on the summarization task is investigated.

Findings: Combining the traditional similarity measures (cosine and Jaccard) with the dynamic programming approaches (word alignment-based and window-based) to calculate the similarity between two sentences extracted more of the information the sentences have in common, which helped to identify the best sentences for the final summary. The proposed method was evaluated using ROUGE measures on the DUC 2002 dataset. The experimental results showed that specific combinations of features could give better performance, and that some features have more effect than others on summary creation.

Applications/Improvements: The performance of the new method was tested on the DUC 2002 dataset. Its effectiveness was measured using the ROUGE score, and the results are promising when compared with some existing techniques.
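The idea of hybridising several sentence-similarity measures can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the formula for H, so the weighted average, the equal default weights, the whitespace tokenizer, and all function names here are assumptions. The sketch implements cosine (sim1) and Jaccard (sim2) directly and uses a longest-common-subsequence score as a stand-in for a dynamic-programming alignment measure; the window-based measure (sim4) is omitted because the abstract does not define it.

```python
import math
from collections import Counter


def tokens(sentence):
    """Lowercase whitespace tokenization (an assumed, simplistic choice)."""
    return sentence.lower().split()


def cosine_sim(a, b):
    """Cosine similarity over raw term-frequency vectors."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def jaccard_sim(a, b):
    """Jaccard similarity: shared word types over all word types."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def lcs_sim(a, b):
    """Stand-in for an alignment-based measure (assumption):
    longest common subsequence length, computed by dynamic
    programming and normalized by the longer sentence length."""
    if not a or not b:
        return 0.0
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1] / max(len(a), len(b))


def hybrid_sim(s1, s2, measures=(cosine_sim, jaccard_sim, lcs_sim), weights=None):
    """Hypothetical hybrid score: weighted average of the given measures.
    Equal weights are assumed when none are supplied."""
    a, b = tokens(s1), tokens(s2)
    if weights is None:
        weights = [1.0 / len(measures)] * len(measures)
    return sum(w * m(a, b) for w, m in zip(weights, measures))


if __name__ == "__main__":
    print(hybrid_sim("the cat sat on the mat", "the cat lay on the rug"))
```

In a graph-based summarizer, a score of this kind would typically serve as the edge weight between two sentence nodes; how the paper builds its triangle graph from H is not stated in the abstract.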