International Conference on Computational Linguistics

A Framework for Identifying Textual Redundancy



Abstract

Identifying redundant information in documents generated from multiple sources poses a significant challenge for summarization and question-answering (QA) systems. Traditional clustering techniques detect redundancy at the sentence level and do not guarantee that all information in the document is preserved. We discuss an algorithm that generates a novel graph-based representation of a document and then uses a set cover approximation algorithm to remove redundant text from it. Our experiments show that this approach offers a significant performance advantage over clustering when evaluated on an annotated dataset.
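The abstract does not say which set cover approximation is used or how the graph representation maps onto sets, so the following is only a minimal sketch of the general idea, using the standard greedy set cover heuristic. It assumes, hypothetically, that each text unit has already been reduced to the set of information items it expresses; the names `greedy_set_cover`, `units`, and `kept` are illustrative and do not come from the paper.

```python
from typing import Dict, Hashable, List, Set


def greedy_set_cover(units: Dict[str, Set[Hashable]]) -> List[str]:
    """Greedily pick text units until every information item is covered.

    `units` maps a (hypothetical) text-unit id to the set of information
    items that unit expresses. This is the textbook greedy approximation,
    not necessarily the variant used in the paper.
    """
    # The universe is every item expressed anywhere in the document.
    uncovered: Set[Hashable] = set().union(*units.values())
    chosen: List[str] = []
    while uncovered:
        # Take the unit that covers the most still-uncovered items;
        # iterating over sorted ids breaks ties deterministically.
        best = max(sorted(units), key=lambda name: len(units[name] & uncovered))
        chosen.append(best)
        uncovered -= units[best]
    return chosen


# Toy document: "s3" repeats information that "s1" already carries.
units = {
    "s1": {"a", "b"},
    "s2": {"b", "c", "d"},
    "s3": {"a"},
    "s4": {"d", "e"},
}
kept = greedy_set_cover(units)
print(kept)  # ['s2', 's1', 's4']
```

In this toy input the unit `s3` is dropped as redundant, while the union of the kept units still contains every item. That mirrors the information-preservation property the abstract contrasts with sentence-level clustering.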
