Automatic Text Summarization Based on Word-Clusters and Ranking Algorithms

Abstract

This paper investigates a new approach to Single Document Summarization (SDS) based on a Machine Learning ranking algorithm. Using machine learning techniques for this task allows one to adapt summaries to the user's needs and to the characteristics of the corpus. These desirable properties have motivated an increasing amount of work in this field over the last few years. Most approaches attempt to generate summaries by extracting text spans (sentences in our case) and adopt the classification framework, which consists of training a classifier to discriminate between relevant and irrelevant spans of a document. A set of features is first used to produce a vector of scores for each sentence in a given document, and a classifier is trained to make a global combination of these scores. We believe that the classification criterion used to train a classifier is not well suited to SDS, and we propose an original framework based on ranking for this task. A ranking algorithm also combines the scores of different features, but its criterion tends to reduce the relative misordering of sentences within a document. The features we use here are either based on the state of the art or built upon word-clusters. These clusters are groups of words that often co-occur with each other, and they can serve to expand a query or to enrich the representation of the sentences of the documents. We analyze the performance of our ranking algorithm on two data sets: the Computation and Language (cmp_lg) collection of TIPSTER SUMMAC and the WIPO collection. We perform comparisons with several non-learning baseline systems and with a reference trainable summarizer based on the classification framework. The experiments show that the learning algorithms perform better than the non-learning systems, and that the ranking algorithm outperforms the classifier. The difference in performance between the two learning algorithms depends on the nature of the data sets. We explain this fact by the different separability hypotheses that the two learning algorithms make about the data.
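
The ranking criterion described in the abstract (reducing the relative misordering of sentences within a document) can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a linear scorer, a pairwise logistic loss, and plain gradient descent. The feature values and the names score, count_misordered, train_ranker, and docs are all invented for illustration.

```python
import math

def score(weights, features):
    """Linear combination of a sentence's feature scores."""
    return sum(w * f for w, f in zip(weights, features))

def split_by_label(sentences, labels):
    """Separate relevant (label 1) from irrelevant (label 0) sentences."""
    pos = [x for x, y in zip(sentences, labels) if y == 1]
    neg = [x for x, y in zip(sentences, labels) if y == 0]
    return pos, neg

def count_misordered(weights, docs):
    """Count (relevant, irrelevant) pairs, within each document, where the
    relevant sentence does not score strictly higher."""
    total = 0
    for sentences, labels in docs:
        pos, neg = split_by_label(sentences, labels)
        for p in pos:
            for n in neg:
                if score(weights, p) <= score(weights, n):
                    total += 1
    return total

def train_ranker(docs, n_features, lr=0.1, epochs=200):
    """Gradient descent on the pairwise logistic loss log(1 + exp(-margin)),
    where margin = score(relevant) - score(irrelevant) in the same document."""
    weights = [0.0] * n_features
    for _ in range(epochs):
        grad = [0.0] * n_features
        for sentences, labels in docs:
            pos, neg = split_by_label(sentences, labels)
            for p in pos:
                for n in neg:
                    margin = score(weights, p) - score(weights, n)
                    g = -1.0 / (1.0 + math.exp(margin))  # d(loss)/d(margin)
                    for k in range(n_features):
                        grad[k] += g * (p[k] - n[k])
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights

# Toy data (hypothetical): each document is (sentence feature vectors, labels).
docs = [
    ([[0.9, 0.1], [0.2, 0.8], [0.1, 0.3]], [1, 0, 0]),
    ([[0.5, 0.5], [0.4, 0.9], [0.7, 0.2]], [0, 0, 1]),
]

before = count_misordered([0.0, 0.0], docs)
weights = train_ranker(docs, n_features=2)
after = count_misordered(weights, docs)
print(before, after, weights)
```

Pairs are formed only between sentences of the same document, so the loss penalizes misordering within a document and never compares sentences across documents. A classifier, by contrast, would be trained to separate all relevant sentences from all irrelevant ones across the whole collection.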

Bibliographic details

Source: European Conference on IR Research (ECIR 2005); 21-23 March 2005; Santiago de Compostela (ES) | 2005 | pp. 142-156 | 15 pages
Conference location: Santiago de Compostela (ES)
Authors: Massih R. Amini; Nicolas Usunier; Patrick Gallinari
Author affiliation: Computer Science Laboratory of Paris 6, 8 Rue du Capitaine Scott, 75015 Paris, France
Original format: PDF
Language of text: English
Chinese Library Classification: Information processing

Similar literature

1. An automatic Arabic text summarization system based on genetic algorithms [J]. Imen Tanfouri, Ghassen Tlik, Fethi Jarray. Procedia Computer Science. 2021, Issue a
2. Biogeography-Based Optimization Algorithm for Automatic Extractive Text Summarization [J]. Seyed Hossein MirShojaee, Behrooz Masoumi, Esmaeel Zeinali. International Journal of Industrial Engineering & Production Research. 2017, Issue 1
3. Word-sentence co-ranking for automatic extractive text summarization [J]. Fang Changjian, Mu Dejun, Deng Zhenghong. Expert Systems with Applications. 2017, Issue APRa
4. Automatic Text Summarization Based on Word-Clusters and Ranking Algorithms [C]. Massih R. Amini, Nicolas Usunier, Patrick Gallinari. European Conference on IR Research. 2005
5. Automatic text summarization using lexical chains: Algorithms and experiments [D]. Kolla, Maheedhar. 2005
6. Movie Review Summarization Using Supervised Learning and Graph-Based Ranking Algorithm [O]. Atif Khan, Muhammad Adnan Gul, Mahdi Zareei. 2020
7. Automatic Text Summarization based on Word-Clusters and Ranking Algorithms [O]. Amini Massih-Reza, Usunier Nicolas, Gallinari Patrick. 2005
