Journal: IAENG International Journal of Computer Science

Determining Extractive Summary for a Single Document Based on Collaborative Filtering Frequency Prediction and Mean Shift Clustering



Abstract

This paper presents a new unsupervised algorithm for producing an extractive summary of a single document. It combines term frequency prediction, obtained with a memory-based collaborative filtering (CF) approach, with the Mean Shift clustering algorithm. The algorithm uses Term-Sentence Collaborative Filtering (TSCF) to predict term frequencies, which are then used to rank sentences according to the presence percentage of each word/term in each sentence. TSCF computes a frequency for every term in a sentence, whether the term is present or missing (a sparse entry), via a collaborative filtering prediction algorithm. Mean Shift clustering serves as the final stage: it groups sentences by rank to yield more coherent summaries. Experiments show the effect of different weighting functions: Term Frequency (TF), Term Frequency-Inverse Document Frequency (TFIDF), and binary TF. They also show the effect of different distance metrics that support sparse matrix representations (Cosine, Euclidean, and Manhattan) and of L1 and L2 normalization. ROUGE is used as a fully automatic evaluation metric on the DUC2002 datasets. Results report average recall, precision, and F-measure for ROUGE-1, ROUGE-2, ROUGE-L, and ROUGE-SU4; they show that the proposed TSCF algorithm is promising and outperforms related baseline techniques on many ROUGE scores.
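
The abstract gives the pipeline only in outline, so the following is a minimal sketch of one plausible reading of it, not the authors' implementation. It assumes raw TF weighting, cosine similarity between sentences, a similarity-weighted average as the memory-based CF predictor for missing terms, the mean of each filled row as the sentence score, and a flat-kernel Mean Shift over those scalar scores. All function names, the scoring rule, and the bandwidth parameter are hypothetical choices made for illustration.

```python
import math
from collections import Counter


def term_sentence_matrix(sentences):
    """Rows are sentences, columns are vocabulary terms, cells are raw counts (TF)."""
    tokenized = [s.lower().split() for s in sentences]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        rows.append([float(counts[term]) for term in vocab])
    return vocab, rows


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def predict_missing(rows):
    """Fill zero cells with a similarity-weighted average over the other sentences.

    Assumed memory-based CF predictor; observed (non-zero) cells are left unchanged.
    """
    n = len(rows)
    sims = [[cosine(rows[i], rows[j]) for j in range(n)] for i in range(n)]
    filled = [row[:] for row in rows]
    for i in range(n):
        for k in range(len(rows[i])):
            if rows[i][k] == 0.0:
                num = sum(sims[i][j] * rows[j][k] for j in range(n) if j != i)
                den = sum(abs(sims[i][j]) for j in range(n) if j != i)
                filled[i][k] = num / den if den else 0.0
    return filled


def sentence_scores(filled):
    """Assumed scoring rule: mean predicted frequency across the vocabulary."""
    return [sum(row) / len(row) for row in filled]


def mean_shift_1d(values, bandwidth, iters=50):
    """Flat-kernel mean shift on scalars; returns (labels, cluster centers)."""
    modes = list(values)
    for _ in range(iters):
        shifted = []
        for m in modes:
            window = [v for v in values if abs(v - m) <= bandwidth]
            shifted.append(sum(window) / len(window) if window else m)
        modes = shifted
    labels, centers = [], []
    for m in modes:
        for idx, c in enumerate(centers):
            if abs(m - c) <= bandwidth / 2:  # merge modes that converged together
                labels.append(idx)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels, centers


if __name__ == "__main__":
    doc = ["cats chase mice", "cats chase birds", "dogs chase cats"]
    vocab, tf = term_sentence_matrix(doc)
    scores = sentence_scores(predict_missing(tf))
    labels, centers = mean_shift_1d(scores, bandwidth=0.1)
    for sentence, score, label in zip(doc, scores, labels):
        print(f"{label}  {score:.3f}  {sentence}")
```

A summary could then be drawn from the cluster with the highest center, but how the paper selects sentences from the clusters is not stated in the abstract, so that step is left out here.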
