
Microblog summarization using Paragraph Vector and semantic structure


Abstract

Two fundamental difficulties still hinder the development of microblog summarization. The first is the feature sparseness of microblogs, which restricts the performance of sub-topic detection. The second is that sentence selection from sub-topics relies mainly on centrality approaches to measure sentence salience, while the semantic features and the relation features between sentences and sub-topics have received little attention. To address these two problems, we propose a summarization method that considers Paragraph Vector and semantic structure. First, we use Paragraph Vector to construct a sentence similarity matrix that incorporates the contextual information of microblogs, and use it to detect sub-topics. Second, we analyze the sentences with the Chinese Sentential Semantic Model (CSM) to obtain semantic features; the relation features are then derived from the similarity matrix and these semantic features. Finally, the most informative sentences are selected from the microblogs belonging to the same sub-topic using the semantic features and relation features. The experimental results show that the ROUGE-1 value reaches up to 53.17% at a 1.5% compression ratio. The results indicate that applying Paragraph Vector to microblog summarization can effectively improve sub-topic detection. Additionally, semantic features and relation features jointly enhance the summarization result. Furthermore, CSM provides a promising tool for sentence semantic analysis. (C) 2019 Elsevier Ltd. All rights reserved.
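The first step of the method (a sentence similarity matrix used to detect sub-topics) can be illustrated with a minimal sketch. It is not the authors' implementation: the toy vectors stand in for Paragraph Vector embeddings, cosine similarity is an assumed similarity measure, and grouping by connected components above a hypothetical `threshold` is an assumed substitute for whatever clustering the paper uses. The CSM-based semantic and relation features are not reproduced here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity_matrix(vectors):
    """Pairwise cosine similarities between all sentence vectors."""
    return [[cosine(u, v) for v in vectors] for u in vectors]


def detect_subtopics(sim, threshold=0.8):
    """Assumed grouping rule: sentences i and j share a sub-topic if they are
    linked by a chain of pairs whose similarity is >= threshold."""
    n = len(sim)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue  # already assigned to an earlier sub-topic
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and sim[i][j] >= threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Toy stand-ins for Paragraph Vector embeddings of four microblog sentences.
vectors = [
    [1.0, 0.1, 0.0],
    [0.9, 0.2, 0.0],
    [0.0, 1.0, 0.1],
    [0.1, 0.9, 0.0],
]
sim = similarity_matrix(vectors)
labels = detect_subtopics(sim, threshold=0.8)
print(labels)  # one sub-topic label per sentence
```

In practice the vectors would come from a trained Paragraph Vector model, and the threshold would need tuning on the corpus; both choices here are placeholders.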

Bibliographic record

  • Source
    Computer Speech and Language | 2019, Issue 9 | pp. 1-19 | 19 pages
  • Author affiliations

    Beijing Institute of Technology, School of Information and Electronics, Laboratory of Information Security and Countermeasures Technology, 5 Zhongguancun Nan St, Beijing 100081, People's Republic of China (the record lists this same affiliation six times)

  • Indexed in: Science Citation Index (SCI, USA); Engineering Index (EI, USA)
  • Original format: PDF
  • Language: English
  • Keywords

    Chinese Sentential Semantic Model; Deep learning; Language models; Language parsing and understanding; Microblog summarization;

