...
Journal: IAENG International Journal of Computer Science

Determining Extractive Summary for a Single Document Based on Collaborative Filtering Frequency Prediction and Mean Shift Clustering

Abstract

This paper presents a new unsupervised algorithm for determining an extractive summary of a single document. It combines term frequency prediction, obtained from a memory-based collaborative filtering (CF) approach, with the Mean Shift clustering algorithm. The algorithm uses Term-Sentence Collaborative Filtering (TSCF) to predict term frequencies, which are then used to rank sentences according to the presence percentage of each word/term in each sentence. TSCF computes term frequencies for terms that are either present in or missing from a sentence (the sparse entries) via a collaborative filtering prediction algorithm. Mean Shift clustering serves as the final stage, grouping sentences according to their ranks to produce more coherent summaries. Experiments show the effect of different weighting functions: Term Frequency (TF), Term Frequency Inverse Document Frequency (TFIDF), and binary TF. They also show the effect of different distance metrics that support sparse matrix representations (cosine, Euclidean, and Manhattan) and of L1 and L2 normalization. ROUGE is used as a fully automatic evaluation metric for text summarization on the DUC2002 datasets. Results are reported as average recall, precision, and F-measure scores for ROUGE-1, ROUGE-2, ROUGE-L, and ROUGE-SU4. They indicate that the proposed TSCF algorithm is promising and outperforms related baseline techniques on many ROUGE scores.
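The pipeline described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas, so everything below is an assumption made for illustration: the function names are hypothetical, the prediction step is a generic memory-based CF rule (a cosine-similarity-weighted average over the other sentences), the sentence score is the mean predicted frequency of the terms present in the sentence, and the clustering step is a simple flat-kernel mean shift over the one-dimensional scores. It should not be read as the authors' TSCF implementation.

```python
import math
from collections import Counter


def tf_matrix(sentences):
    """Build a sentence-by-term matrix of raw term frequencies."""
    tokens = [s.lower().split() for s in sentences]
    vocab = sorted({w for sent in tokens for w in sent})
    counts = [Counter(sent) for sent in tokens]
    return vocab, [[float(c[w]) for w in vocab] for c in counts]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def predict_tf(matrix):
    """Assumed CF rule: similarity-weighted average of other sentences' TFs."""
    n, n_terms = len(matrix), len(matrix[0])
    sims = [[cosine(matrix[i], matrix[j]) for j in range(n)] for i in range(n)]
    pred = []
    for i in range(n):
        total = sum(sims[i][j] for j in range(n) if j != i)
        if total == 0.0:
            pred.append(list(matrix[i]))  # no similar neighbours: keep observed
            continue
        pred.append([
            sum(sims[i][j] * matrix[j][t] for j in range(n) if j != i) / total
            for t in range(n_terms)
        ])
    return pred


def score_sentences(matrix, pred):
    """Assumed ranking: mean predicted TF over terms present in the sentence."""
    scores = []
    for observed, predicted in zip(matrix, pred):
        present = [p for o, p in zip(observed, predicted) if o > 0]
        scores.append(sum(present) / len(present) if present else 0.0)
    return scores


def mean_shift_1d(values, bandwidth, max_iter=100, tol=1e-9):
    """Flat-kernel mean shift on scalars; returns one cluster label per value."""
    modes = list(values)
    for _ in range(max_iter):
        shifted = []
        for m in modes:
            window = [v for v in values if abs(v - m) <= bandwidth]
            shifted.append(sum(window) / len(window))
        moved = max(abs(a - b) for a, b in zip(modes, shifted))
        modes = shifted
        if moved < tol:
            break
    centers, labels = [], []
    for m in modes:  # merge modes that converged to (nearly) the same point
        for k, c in enumerate(centers):
            if abs(m - c) <= bandwidth:
                labels.append(k)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels


if __name__ == "__main__":
    doc = [
        "the cat sat on the mat",
        "the cat ate the fish",
        "stocks fell sharply today",
    ]
    vocab, tf = tf_matrix(doc)
    scores = score_sentences(tf, predict_tf(tf))
    print(scores, mean_shift_1d(scores, bandwidth=0.25))
```

In this sketch the weighting function is raw TF and the similarity is cosine. Swapping in binary TF or TFIDF weights, or a Euclidean or Manhattan distance, would only change `tf_matrix` and `cosine`. How the resulting clusters are turned into a final summary is not specified in the abstract, so the sketch stops at the cluster labels.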