Conference: International Joint Conference on Natural Language Processing
Earlier Isn't Always Better: Sub-aspect Analysis on Corpus and System Biases in Summarization

Abstract

Despite recent developments in neural summarization systems, the underlying logic behind the systems' improvements and their corpus dependency remains largely unexplored. The position of sentences in the original text, for example, is a well-known bias in news summarization. Following the spirit of the claim that summarization is a combination of sub-functions, we define three sub-aspects of summarization (position, importance, and diversity) and conduct an extensive analysis of the biases of each sub-aspect with respect to the domain of nine different summarization corpora (e.g., news, academic papers, meeting minutes, movie scripts, books, posts). We find that while position exhibits substantial bias in news articles, this is not the case, for example, with academic papers and meeting minutes. Furthermore, our empirical study shows that different types of summarization systems (e.g., neural-based) are composed of different degrees of the sub-aspects. Our study provides useful lessons regarding consideration of underlying sub-aspects when collecting a new summarization dataset or developing a new system.
