Statistical Single-Document Summarization for Chinese News Articles

Abstract

Given the huge volume of daily news articles, reducing the time needed to read them would help users. In this paper, we focus on single-document summarization of Chinese news articles using statistical methods. First, new vocabulary is collected from news articles and verified with online translation services; the verified terms form an auxiliary lexicon. Then, statistical word segmentation is performed by calculating the relative frequency of overlapping word n-grams. Finally, sentence importance is estimated as the weighted sum of n-gram scores, and the top-ranked sentences are selected as the summary. Experimental results show that the generated summaries are clustered effectively into the same groups as their original news articles. Storage size is greatly reduced while suitable similarity to the original document is preserved. This shows the potential of the proposed approach for news summarization. Further investigation is needed to verify the approach in other document domains.
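The final step of the abstract (scoring each sentence as a weighted sum of n-gram scores and keeping the top-ranked sentences) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no weights, n-gram orders, or score definition, so the sketch assumes character n-grams of length 1 to 3, uses each n-gram's relative frequency within the document as its score, and picks the weights in `NGRAM_WEIGHTS` arbitrarily. The function names and the punctuation-based sentence splitter are likewise hypothetical, and the lexicon and word segmentation steps are omitted.

```python
import re
from collections import Counter

# Hypothetical weights per n-gram length; the abstract does not give values.
NGRAM_WEIGHTS = {1: 0.2, 2: 0.5, 3: 0.3}


def split_sentences(text):
    """Split after common Chinese and Western sentence-ending punctuation."""
    parts = re.split(r"(?<=[。！？!?])", text)
    return [p.strip() for p in parts if p.strip()]


def char_ngrams(sentence, n):
    """Overlapping character n-grams, ignoring punctuation and whitespace."""
    chars = re.sub(r"[\W_]+", "", sentence)
    return [chars[i:i + n] for i in range(len(chars) - n + 1)]


def summarize(text, k=2, weights=NGRAM_WEIGHTS):
    """Return the k highest-scoring sentences, kept in document order."""
    sentences = split_sentences(text)

    # Relative frequency of every n-gram in the document, computed per length n.
    rel_freq = {}
    for n in weights:
        counts = Counter(g for s in sentences for g in char_ngrams(s, n))
        total = sum(counts.values())
        rel_freq[n] = {g: c / total for g, c in counts.items()} if total else {}

    def score(sentence):
        # Weighted sum over n of the summed relative frequencies of its n-grams.
        return sum(
            w * sum(rel_freq[n].get(g, 0.0) for g in char_ngrams(sentence, n))
            for n, w in weights.items()
        )

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]


if __name__ == "__main__":
    article = "股市今天上涨。股市上涨带动基金上涨。天气晴。"
    for sentence in summarize(article, k=2):
        print(sentence)
```

Because the score is an unnormalized sum, longer sentences tend to rank higher; dividing by the number of n-grams in the sentence would be one way to reduce that bias, though the abstract does not say whether the authors normalize.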
