International Journal of Software Engineering and Knowledge Engineering

Document Summarization Using Sentence-Level Semantic Based on Word Embeddings



Abstract

In the era of information overload, text summarization has become a focus of attention in diverse fields such as question answering systems, intelligence analysis, news recommendation systems, and search results in web search engines. A good document representation is key to any successful summarizer, and learning this representation has become a very active research area in natural language processing (NLP). Traditional approaches mostly fail to deliver a good representation, whereas word embeddings have shown excellent performance in learning it. In this paper, a modified BM25 combined with word embeddings is used to build sentence vectors from word vectors. The entire document is represented as a set of sentence vectors, and the similarity between every pair of sentence vectors is computed. TextRank, a graph-based model, is then used to rank the sentences. The summary is generated by picking the top-ranked sentences according to the compression rate. Two well-known datasets, DUC2002 and DUC2004, are used to evaluate the models. The experimental results show that the proposed models perform comprehensively better than the state-of-the-art methods.
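
The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify how BM25 is modified, so the sketch uses the standard BM25 term weight (with each sentence treated as a "document") to form a weighted average of word vectors. The function names, the parameters k1, b and d, and the tiny 3-dimensional embeddings are all hypothetical; a real system would load pretrained word vectors.

```python
import math
import numpy as np

def bm25_weights(sentences, k1=1.5, b=0.75):
    """Standard BM25 weight per token, treating each sentence as a document.
    Assumption: the paper's *modified* BM25 is not given in the abstract."""
    n = len(sentences)
    avg_len = sum(len(s) for s in sentences) / n
    df = {}
    for s in sentences:
        for w in set(s):
            df[w] = df.get(w, 0) + 1
    weights = []
    for s in sentences:
        w_s = {}
        for w in set(s):
            tf = s.count(w)
            idf = math.log(1 + (n - df[w] + 0.5) / (df[w] + 0.5))
            w_s[w] = idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(s) / avg_len))
        weights.append(w_s)
    return weights

def sentence_vectors(sentences, embeddings, dim):
    """BM25-weighted average of the word vectors in each sentence."""
    vecs = []
    for w_s in bm25_weights(sentences):
        v, total = np.zeros(dim), 0.0
        for w, wt in w_s.items():
            if w in embeddings:  # skip out-of-vocabulary words
                v += wt * embeddings[w]
                total += wt
        vecs.append(v / total if total > 0 else v)
    return np.array(vecs)

def cosine_matrix(vecs):
    """Pairwise cosine similarity; self-loops removed, negatives clipped to 0."""
    norms = np.linalg.norm(vecs, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    unit = vecs / norms
    sim = unit @ unit.T
    np.fill_diagonal(sim, 0.0)
    return np.clip(sim, 0.0, None)

def textrank(sim, d=0.85, iters=100, tol=1e-6):
    """Weighted PageRank by power iteration over the similarity graph."""
    n = sim.shape[0]
    col_sums = sim.sum(axis=0)
    col_sums[col_sums == 0] = 1.0
    trans = sim / col_sums  # each column sums to 1
    scores = np.full(n, 1.0 / n)
    for _ in range(iters):
        new = (1 - d) / n + d * (trans @ scores)
        converged = np.abs(new - scores).sum() < tol
        scores = new
        if converged:
            break
    return scores

def summarize(sentences, embeddings, dim, compression_rate=0.5):
    """Return indices of the top-ranked sentences, in document order."""
    scores = textrank(cosine_matrix(sentence_vectors(sentences, embeddings, dim)))
    k = max(1, round(len(sentences) * compression_rate))
    return sorted(int(i) for i in np.argsort(-scores)[:k])

# Hypothetical toy data: tokenized sentences and made-up 3-d word vectors.
toy_sentences = [["cats", "chase", "mice"], ["dogs", "chase", "cats"],
                 ["stocks", "fell", "today"], ["mice", "fear", "cats"]]
toy_embeddings = {w: np.array(v) for w, v in {
    "cats": [1.0, 0.0, 0.1], "dogs": [0.9, 0.1, 0.0], "mice": [0.8, 0.0, 0.2],
    "chase": [0.5, 0.5, 0.0], "fear": [0.4, 0.6, 0.0], "stocks": [0.0, 0.0, 1.0],
    "fell": [0.0, 0.2, 0.9], "today": [0.1, 0.1, 0.8]}.items()}
toy_summary = summarize(toy_sentences, toy_embeddings, dim=3, compression_rate=0.5)
```

Here compression_rate is interpreted as the fraction of sentences to keep, which is one plausible reading of the abstract; the selected sentences are returned in their original order so the summary stays readable.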
