Journal of Computational Science

Optimizing Data-Driven Models for Summarization as Parallel Tasks


Abstract

This paper tackles a hard optimization problem in computational linguistics, automatic multi-document text summarization, using grid computing. The main challenge of multi-document summarization is to extract the most relevant and unique information effectively and efficiently from a set of topic-related documents, constrained to a specified length. In the Big Data/Text era, where the amount of information increases exponentially, optimization becomes essential in selecting the most representative sentences for generating the best summaries. Therefore, a data-driven summarization model is proposed and optimized during a run of Differential Evolution (DE). Different DE runs are distributed to a grid in parallel as optimization tasks, seeking high processing throughput despite the demanding complexity of the linguistic model, especially on longer multi-documents, where DE improves results given more iterations. Namely, parallelization and the grid enable running several independent DE runs at the same time within a fixed real-time budget. This approach improves a Document Understanding Conference (DUC) benchmark recall metric over a previous setting. (C) 2020 Elsevier B.V. All rights reserved.
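The idea of running several independent DE runs side by side and keeping the best summary can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy `SENTENCES`, the `MAX_WORDS` limit, the word-coverage `fitness` (a stand-in for the data-driven linguistic model), the DE/rand/1/bin variant with its parameter values, and the helper names `decode`, `de_run`, and `run_independent`. A local thread pool stands in for the grid; on a real grid, each seed would presumably be submitted as a separate job.

```python
import random
from concurrent.futures import ThreadPoolExecutor

# Toy input: sentences pooled from a hypothetical set of topic-related documents.
SENTENCES = [
    "grid computing distributes optimization tasks",
    "differential evolution optimizes the summarization model",
    "grid computing distributes tasks",
    "summaries are constrained to a specified length",
    "parallel runs fit a fixed time budget",
    "differential evolution is an optimization method",
]
MAX_WORDS = 12  # assumed summary length limit, in words


def decode(vector):
    """Visit sentences in descending key order; keep each one that still fits."""
    order = sorted(range(len(vector)), key=lambda i: -vector[i])
    chosen, used = [], 0
    for i in order:
        n = len(SENTENCES[i].split())
        if used + n <= MAX_WORDS:
            chosen.append(i)
            used += n
    return sorted(chosen)


def fitness(vector):
    """Stand-in objective: number of distinct words covered by the summary."""
    words = set()
    for i in decode(vector):
        words.update(SENTENCES[i].split())
    return len(words)


def de_run(seed, pop_size=12, generations=40, f=0.5, cr=0.9):
    """One independent DE/rand/1/bin run; returns (best fitness, sentence indices)."""
    rng = random.Random(seed)  # private generator, so runs do not interfere
    dim = len(SENTENCES)
    pop = [[rng.random() for _ in range(dim)] for _ in range(pop_size)]
    fit = [fitness(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct population members, all different from the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = [
                pop[a][j] + f * (pop[b][j] - pop[c][j])
                if (rng.random() < cr or j == j_rand) else pop[i][j]
                for j in range(dim)
            ]
            trial_fit = fitness(trial)
            if trial_fit >= fit[i]:  # greedy selection (maximization)
                pop[i], fit[i] = trial, trial_fit
    best = max(range(pop_size), key=lambda i: fit[i])
    return fit[best], decode(pop[best])


def run_independent(seeds, executor):
    """Submit one DE run per seed as a separate task and keep the best result."""
    return max(executor.map(de_run, seeds))


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=4) as pool:
        score, summary = run_independent(range(4), pool)
    print(score, [SENTENCES[i] for i in summary])
```

Because each run owns its random generator and shares no mutable state, the runs are independent tasks. `run_independent` accepts any `concurrent.futures.Executor`, so the thread pool can be swapped for a process pool or a cluster-backed executor to get real parallelism for this CPU-bound work.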
