Semantic Analysis for Improved Multi-document Summarization of Text.

Abstract

An excessive amount of unstructured data is easily accessible in digital format. This information overload places a heavy burden on society's capacity to analyze the data and act on it. Focused (e.g., by topic, query, question, or category) multi-document summarization is an information-reduction solution whose state of the art now calls for exploring other techniques to model human summarization activity. Existing techniques have been mainly extractive and rely on distributional analysis and complex machine learning over corpora to approach the quality of human summaries. These techniques are still in use, but the field now needs to move toward more abstractive approaches that model the way humans summarize. This work creates a simple, inexpensive, and domain-independent system architecture that adds semantic analysis to the summarization process. The proposed system is novel in its use of a new semantic analysis metric to better score sentences for selection into a summary. It also simplifies the semantic processing of sentences in order to better capture semantically related information, reduce redundancy, and reduce complexity. The system is evaluated against participants in the Document Understanding Conference (DUC) and the later Text Analysis Conference (TAC), using ROUGE n-gram recall measures to compare automated system summaries, human summaries, and gold-standard baseline summaries. The goal was to show that semantic analysis used for summarization can perform well while remaining simple and inexpensive, without a significant loss of recall compared to the foundational baseline system. Current results show an improvement over the gold-standard baseline when all factors of this work's semantic analysis technique are used in combination. These factors are the semantic cue-word feature and semantic class weighting, which together identify sentences carrying important information.
The semantic triple clustering, which decomposes natural language sentences into their most basic meaning and selects the most important sentences, also contributed to this improvement. Measured against the gold-standard baseline system on ROUGE, the standard summarization evaluation metric, this work ranks more than ten positions above the baseline. These results indicate that semantic analysis combined with lightweight, open-domain techniques has potential for summarization.
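The abstract names three ingredients without giving their formulas: a semantic cue-word feature, semantic class weighting, and clustering of semantic triples to reduce redundancy. The Python sketch below only illustrates how such signals could be combined; it is not the dissertation's method. The names `Sentence`, `score`, `select`, `CUE_WORDS`, and `CLASS_WEIGHTS`, the additive score, and the overlap threshold are all hypothetical. Triples are assumed to arrive pre-extracted as (subject, predicate, object) tuples, and a simple pairwise triple-overlap filter stands in for the clustering step.

```python
from dataclasses import dataclass, field

# HYPOTHETICAL cue words and semantic-class weights, invented for illustration;
# these are not the values used in the dissertation.
CUE_WORDS = {"significant", "important", "conclude", "result", "because"}
CLASS_WEIGHTS = {"PERSON": 1.5, "ORGANIZATION": 1.2, "LOCATION": 1.0, "DATE": 0.8}

Triple = tuple[str, str, str]  # (subject, predicate, object)


@dataclass
class Sentence:
    text: str
    semantic_classes: list[str]  # assumed output of some semantic tagger
    triples: list[Triple] = field(default_factory=list)  # assumed pre-extracted


def score(sentence: Sentence, cue_weight: float = 1.0) -> float:
    """Hypothetical additive score: cue-word hits plus summed class weights."""
    tokens = {tok.strip(".,;:!?").lower() for tok in sentence.text.split()}
    cue_hits = len(tokens & CUE_WORDS)
    class_score = sum(CLASS_WEIGHTS.get(c, 0.0) for c in sentence.semantic_classes)
    return cue_weight * cue_hits + class_score


def triple_overlap(a: Sentence, b: Sentence) -> float:
    """Jaccard overlap between the two sentences' sets of triples."""
    sa, sb = set(a.triples), set(b.triples)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def select(sentences: list[Sentence], k: int, max_overlap: float = 0.5) -> list[Sentence]:
    """Greedy selection: take the highest-scoring sentences first, skipping any
    whose triples largely repeat an already-selected sentence (redundancy filter)."""
    chosen: list[Sentence] = []
    for s in sorted(sentences, key=score, reverse=True):
        if len(chosen) == k:
            break
        if all(triple_overlap(s, c) <= max_overlap for c in chosen):
            chosen.append(s)
    return chosen
```

Scoring and redundancy filtering are kept as separate functions so that either signal could be swapped out; a real system could replace the hand-written dictionaries with lexicon-derived or learned weights.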

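ROUGE-N recall, the evaluation measure named in the abstract, counts how many of the reference summaries' n-grams a candidate summary recovers, clipping each n-gram's match count to its count in the candidate and pooling over all reference summaries. The Python function below is a minimal sketch of that formula, assuming lowercase whitespace tokenization; the helper names `ngrams` and `rouge_n_recall` are illustrative. The official ROUGE toolkit used at DUC and TAC adds options such as stemming and stopword removal, so its scores will not match this sketch exactly.

```python
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    """Count the contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate: str, references: list[str], n: int = 2) -> float:
    """ROUGE-N recall: clipped n-gram matches divided by total reference n-grams,
    summed over every reference summary."""
    cand = ngrams(candidate.lower().split(), n)
    matched = total = 0
    for ref in references:
        ref_counts = ngrams(ref.lower().split(), n)
        # A reference n-gram matches at most as many times as it occurs in the candidate.
        matched += sum(min(count, cand[gram]) for gram, count in ref_counts.items())
        total += sum(ref_counts.values())
    return matched / total if total else 0.0
```

For example, with the candidate "the cat sat on the mat" and the reference "the cat lay on the mat", ROUGE-1 recall is 5/6 and ROUGE-2 recall is 3/5.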
Bibliographic details

Author: Israel, Quinsulon L.
Author affiliation: Drexel University
Degree-granting institution: Drexel University
Subjects: Computer science; Artificial intelligence; Information technology
Degree: Ph.D.
Year: 2014
Pages: 135
Original format: PDF
Language: English
Date added to database: 2022-08-17 11:53:33

Similar documents

1. FoDoSu: Multi-document summarization exploiting semantic analysis based on social Folksonomy [J]. Jee-Uk Heu, Iqbal Qasim, Dong-Ho Lee. Information Processing & Management, 2015, No. 1.
2. Abstractive Multi-Document Summarization Based on Semantic Link Network [J]. Li Wei, Zhuge Hai. IEEE Transactions on Knowledge and Data Engineering, 2021, No. 1.
3. A semantic approach to extractive multi-document summarization: Applying sentence expansion for tuning of conceptual densities [J]. Mohammad Bidoki, Mohammad R. Moosavi, Mostafa Fakhrahmad. Information Processing & Management, 2020, No. 6.
4. Using Syntactic and Shallow Semantic Kernels to Improve Multi-Modality Manifold-Ranking for Topic-Focused Multi-Document Summarization [C]. Yllias Chali, Sadid A. Hasan, Kaisar Imam. IJCNLP 2011, 2011.
5. Multi-document summarization based on atomic semantic events and their temporal relations [D]. Uddin, Md Mohsin. 2015.
6. Functional Module Search in Protein Networks based on Semantic Similarity Improves the Analysis of Proteomics Data [O]. Desislava Boyanova, Santosh Nilla, Gunnar W. Klau. 2014.
7. Multi-document summarization via sentence-level semantic analysis and symmetric matrix factorization [O]. Dingding Wang, Tao Li, Shenghuo Zhu. 2014.