Determining Extractive Summary for a Single Document Based on Collaborative Filtering Frequency Prediction and Mean Shift Clustering

Ahmed M. El-Refaiy; Ahmed R. Abas; Ibrahim M. El-Henawy

Abstract

This paper presents a new unsupervised algorithm for producing an extractive summary of a single document. It combines term frequency prediction, obtained from a memory-based collaborative filtering (CF) approach, with the Mean Shift clustering algorithm. The algorithm uses Term-Sentence Collaborative Filtering (TSCF) to predict term frequencies, which are then used to rank sentences according to the presence percentage of each word/term in each sentence. TSCF computes frequencies both for terms that are present in a sentence and for terms that are missing (sparse) from it, using a collaborative filtering prediction algorithm. Mean Shift clustering serves as the final stage: it groups sentences according to their ranks to obtain more coherent summaries. Experiments show the effect of different weighting functions: Term Frequency (TF), Term Frequency-Inverse Document Frequency (TF-IDF), and binary TF. They also show the effect of different distance metrics that support sparse matrix representations (cosine, Euclidean, and Manhattan) and of L1 and L2 normalization. ROUGE is used as a fully automatic evaluation metric on the DUC2002 dataset. Results are reported as average recall, precision, and F-measure for ROUGE-1, ROUGE-2, ROUGE-L, and ROUGE-SU4. They indicate that the proposed TSCF algorithm is promising and outperforms related baseline techniques on many ROUGE scores.
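
The pipeline the abstract describes can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the prediction formula, the scoring rule, or how the final cluster is chosen, so the choices below are assumptions made for illustration. The sketch uses raw TF weighting, cosine similarity between sentences, a similarity-weighted average as the memory-based CF prediction, the mean of each predicted row as the sentence score, and a flat-kernel Mean Shift over the scalar scores. All function names and the `bandwidth` default are hypothetical.

```python
import math
import re
from collections import Counter


def term_sentence_matrix(sentences):
    """Rows are sentences, columns are vocabulary terms, cells are raw TF."""
    tokenized = [re.findall(r"[a-z]+", s.lower()) for s in sentences]
    vocab = sorted({t for toks in tokenized for t in toks})
    counts = [Counter(toks) for toks in tokenized]
    return vocab, [[float(c[t]) for t in vocab] for c in counts]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def predict_missing(matrix):
    """Assumed CF rule: fill each zero cell with the similarity-weighted
    average of that term's frequency in the other sentences."""
    n = len(matrix)
    sims = [[cosine(matrix[i], matrix[j]) for j in range(n)] for i in range(n)]
    filled = [row[:] for row in matrix]
    for i in range(n):
        total = sum(sims[i][j] for j in range(n) if j != i)
        if total == 0:
            continue  # no similar sentence to borrow from
        for k in range(len(matrix[i])):
            if matrix[i][k] == 0:
                weighted = sum(sims[i][j] * matrix[j][k] for j in range(n) if j != i)
                filled[i][k] = weighted / total
    return filled


def mean_shift_1d(values, bandwidth, iters=100, tol=1e-6):
    """Flat-kernel mean shift on scalars; returns one cluster label per value."""
    modes = list(values)
    for _ in range(iters):
        shifted = []
        for m in modes:
            window = [v for v in values if abs(v - m) <= bandwidth]
            shifted.append(sum(window) / len(window) if window else m)
        moved = max(abs(a - b) for a, b in zip(modes, shifted))
        modes = shifted
        if moved < tol:
            break
    centers, labels = [], []
    for m in modes:
        for idx, c in enumerate(centers):
            if abs(m - c) <= bandwidth / 2:  # merge modes that converged together
                labels.append(idx)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels


def summarize(sentences, bandwidth=0.1):
    """Return the sentences of the highest-scoring cluster, in document order."""
    vocab, matrix = term_sentence_matrix(sentences)
    if not vocab:
        return []
    filled = predict_missing(matrix)
    scores = [sum(row) / len(row) for row in filled]  # assumed scoring rule
    labels = mean_shift_1d(scores, bandwidth)
    groups = {}
    for i, lab in enumerate(labels):
        groups.setdefault(lab, []).append(i)
    best = max(groups.values(), key=lambda idx: sum(scores[i] for i in idx) / len(idx))
    return [sentences[i] for i in best]
```

Calling `summarize(list_of_sentences)` returns a subset of the input sentences. Swapping the weighting (TF-IDF, binary TF), the distance metric, or adding L1/L2 row normalization would correspond to the experimental variations the abstract lists, though how the paper applies them is not stated here.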

Bibliographic record

Source
IAENG International Journal of Computer Science | 2019, Issue 3 | pp. 494-505 | 12 pages
Authors
Ahmed M. El-Refaiy; Ahmed R. Abas; Ibrahim M. El-Henawy
Affiliations

Department of Computer Science, Faculty of Computers and Informatics, Zagazig University, 44519, Egypt;

Department of Computer Science, Faculty of Computers and Informatics, Zagazig University, 44519, Egypt;

Department of Computer Science, Faculty of Computers and Informatics, Zagazig University, 44519, Egypt;

Original format: PDF
Language: English
Keywords
Extractive Text Summarization; Collaborative Filtering Prediction; Term frequency; Information retrieval; Mean Shift Clustering;
