
Fusion of News Reports Using Surface-Based Methods



Abstract

Events occurring in the real world are covered by news reports from different sources. Each report generally contains information that is also found in others, but may contain unique information as well. To learn everything about a particular event, a user needs to read all the different reports. This duplicates effort, since most information is repeated across the reports. In our research, we attempt to fuse news reports about the same event into a single coherent document, eliminating repetition while preserving all the information contained in the source reports, using only surface-based methods. Information in each news report is represented by a set of entity relationship graphs. The graphs representing each report are then merged into a single graph whilst keeping track of the source sentences. The fused report is generated using the maximally expressive set of sentences, that is, the sentences that carry the most information about the entities and their relationships in the news report, while ensuring that all entities and relationships are expressed in the fused document. Our document fusion system was evaluated using a set of news reports downloaded from MSNBC News that cite their sources, and also using human evaluation. We show that our system is able to capture most of the information found across different source documents whilst maintaining readability.
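The merge-and-select idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how relationships are extracted or how the maximally expressive set is chosen. The sketch assumes each sentence already comes with a set of hypothetical (entity, relation, entity) triples, and it swaps in a plain greedy set-cover heuristic for the selection step. The names `merge_graphs`, `select_sentences`, and the sample data are invented for illustration.

```python
from collections import defaultdict


def merge_graphs(reports):
    """Merge per-report triples into one graph, tracking source sentences.

    `reports` maps a report id to a list of (text, triples) pairs, where
    `triples` is a set of (entity, relation, entity) tuples (an assumed format).
    Returns a dict: triple -> set of (report_id, sentence_index) sources.
    """
    merged = defaultdict(set)
    for report_id, sentences in reports.items():
        for idx, (_text, triples) in enumerate(sentences):
            for triple in triples:
                merged[triple].add((report_id, idx))
    return merged


def select_sentences(reports, merged):
    """Greedy set cover (an assumed stand-in for the selection step):
    repeatedly pick the sentence expressing the most uncovered triples."""
    uncovered = set(merged)
    chosen = []
    while uncovered:
        best_key, best_gain = None, set()
        for report_id, sentences in reports.items():
            for idx, (_text, triples) in enumerate(sentences):
                gain = uncovered & set(triples)
                if len(gain) > len(best_gain):
                    best_key, best_gain = (report_id, idx), gain
        if best_key is None:  # defensive: no sentence covers what remains
            break
        chosen.append(best_key)
        uncovered -= best_gain
    return chosen


# Invented sample data: two reports about the same event.
reports = {
    "A": [
        ("Storm hits coast.", {("storm", "hits", "coast")}),
        ("Storm hits coast, closing the port.",
         {("storm", "hits", "coast"), ("storm", "closes", "port")}),
    ],
    "B": [
        ("Mayor orders evacuation.", {("mayor", "orders", "evacuation")}),
        ("Storm hits coast.", {("storm", "hits", "coast")}),
    ],
}

merged = merge_graphs(reports)
chosen = select_sentences(reports, merged)
for report_id, idx in chosen:
    print(report_id, reports[report_id][idx][0])
```

In this toy input the repeated "storm hits coast" relationship is expressed only once in the output, while the relationships unique to each report are kept. A real system would also need to order the chosen sentences for readability, which this sketch does not attempt.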
