MEDIASUM: A Large-scale Media Interview Dataset for Dialogue Summarization

Abstract

This paper introduces MEDIASUM, a large-scale media interview dataset consisting of 463.6K transcripts with abstractive summaries. To create this dataset, we collect interview transcripts from NPR and CNN and employ the overview and topic descriptions as summaries. Compared with existing public corpora for dialogue summarization, our dataset is an order of magnitude larger and contains complex multi-party conversations from multiple domains. We conduct statistical analysis to demonstrate the unique positional bias exhibited in the transcripts of televised and radioed interviews. We also show that MEDIASUM can be used in transfer learning to improve a model's performance on other dialogue summarization tasks.
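The abstract does not say how the positional bias was measured. Below is an illustrative sketch of one simple way to quantify it. It is not the authors' procedure: it records where in a transcript each summary word first appears. The sketch assumes a transcript is a list of utterance strings and a summary is a plain string. The function names and the tiny stopword list are hypothetical. Positions are normalized to [0, 1] over the turn index, so a histogram skewed toward the first bin would suggest that summary content is concentrated early in the interview.

```python
import re
from collections import Counter

# Hypothetical minimal stopword list, for illustration only.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "on", "is", "for", "with"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep only alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def summary_token_positions(turns: list[str], summary: str) -> list[float]:
    """For each distinct non-stopword summary token, return the relative
    position (0.0 = first turn, 1.0 = last turn) of the first turn that
    contains it. Tokens that never occur in the transcript are skipped."""
    turn_tokens = [set(tokenize(t)) for t in turns]
    last = max(len(turns) - 1, 1)  # avoid division by zero for 1-turn transcripts
    positions = []
    for tok in dict.fromkeys(tokenize(summary)):  # dedupe, keep first-seen order
        if tok in STOPWORDS:
            continue
        for i, toks in enumerate(turn_tokens):
            if tok in toks:
                positions.append(i / last)
                break
    return positions


def position_histogram(positions: list[float], bins: int = 4) -> list[int]:
    """Count positions in equal-width bins over [0, 1]; 1.0 goes in the last bin."""
    counts = Counter(min(int(p * bins), bins - 1) for p in positions)
    return [counts.get(b, 0) for b in range(bins)]


if __name__ == "__main__":
    # Hypothetical toy interview, not taken from MediaSum.
    turns = [
        "Welcome, today we discuss the election results.",
        "Thanks for having me.",
        "What about turnout?",
        "Turnout was high in cities.",
    ]
    summary = "Guest discusses election results and turnout."
    pos = summary_token_positions(turns, summary)
    print(position_histogram(pos))  # -> [2, 0, 1, 0]
```

Aggregating these histograms over many transcripts, rather than one, is what would reveal a corpus-level bias; matching on exact word forms (so "discusses" does not match "discuss") is a deliberate simplification.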

机译：本文介绍了MediaSum，这是一个大型媒体面试数据集，由具有抽象摘要的463.6k抄本组成。要创建此数据集，我们会收集来自NPR和CNN的面试成绩单，并使用概述和主题描述作为摘要。与现有的公共集团进行对话摘要相比，我们的数据集是一个幅度较大的级，并包含来自多个域的复杂多方对话。我们进行统计分析，以展示电视和无线电访谈的成绩单中展出的独特的位置偏见。我们还表明MediaSum可用于转移学习，以改善模型对其他对话摘要任务的性能。

Bibliographic record

Source: Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021, pp. 5927-5934 (8 pages)
Authors: Chenguang Zhu; Yang Liu; Jie Mei; Michael Zeng
Original format: PDF

Similar literature

1. QBSUM: A large-scale query-based document summarization dataset from real-world applications [J]. Mingjun Zhao, Shengli Yan, Bang Liu. Computer Speech and Language, 2021, no. Mara.
2. Digital Life-Story Narratives as Data for Policy Makers and Practitioners: Thinking Through Methodologies for Large-Scale Multimedia Qualitative Datasets [J]. Nicole Matthews, Naomi Sunderland. Journal of Broadcasting & Electronic Media, 2013, no. 1.
3. Deep Learning Based Abstractive Text Summarization: Approaches, Datasets, Evaluation Measures, and Challenges [J]. Dima Suleiman, Arafat Awajan. Mathematical Problems in Engineering: Theory, Methods and Applications, 2020, no. 1.
4. MedDialog: Large-scale Medical Dialogue Datasets [C]. Guangtao Zeng, Wenmian Yang, Zeqian Ju. Conference on Empirical Methods in Natural Language Processing, 2020.
5. Query-Driven Analysis and Visualization for Large-Scale Scientific Dataset using Geometry Summarization and Bitmap Indexing [D]. Wei, Tzu-Hsuan. 2017.
6. GeneBase 1.1: a tool to summarize data from NCBI gene datasets and its application to an update of human gene statistics [O]. Allison Piovesan, Maria Caracausi, Francesca Antonaros. 2016.
7. QBSUM: A large-scale query-based document summarization dataset from real-world applications [O]. Mingjun Zhao, Shengli Yan, Bang Liu. 2021.