GenWiki: A Dataset of 1.3 Million Content-Sharing Text and Graphs for Unsupervised Graph-to-Text Generation

Abstract

Data collection for knowledge graph-to-text generation is expensive. As a result, research on unsupervised models has recently emerged as an active field. However, most unsupervised models have to use non-parallel versions of existing small supervised datasets, which largely constrains their potential. In this paper, we propose a large-scale, general-domain dataset, GenWiki. Our unsupervised dataset contains 1.3M text examples and 1.3M graph examples. Together with a human-annotated test set, we provide this new benchmark dataset for future research on unsupervised text generation from knowledge graphs.

Bibliographic details

Source
International Conference on Computational Linguistics | 2020 | pp. 2398-2409 | 12 pages
Authors
Zhijing Jin; Qipeng Guo; Xipeng Qiu; Zheng Zhang
Original format
PDF

