Conference: International Natural Language Generation Conference

MinWikiSplit: A Sentence Splitting Corpus with Minimal Propositions

Abstract

We compiled a new sentence splitting corpus composed of 203K pairs of aligned complex source and simplified target sentences. In contrast to previously proposed text simplification corpora, which contain only a small number of split examples, we present a dataset in which each input sentence is broken down into a set of minimal propositions, i.e. a sequence of sound, self-contained utterances, each presenting a minimal semantic unit that cannot be further decomposed into meaningful propositions. This corpus is useful for developing sentence splitting approaches that learn to transform sentences with a complex linguistic structure into a fine-grained representation of short sentences with a simpler and more regular structure. Such sentences are easier for downstream applications to process, which in turn improves their performance.
