A transformation-driven approach for recognizing textual entailment

Abstract

Textual Entailment is a directional relation between two text fragments. The relation holds whenever the truth of one text fragment, called Hypothesis (H), follows from another text fragment, called Text (T). Up until now, using machine learning approaches for recognizing textual entailment has been hampered by the limited availability of data. We present an approach based on syntactic transformations and machine learning techniques which is designed to fit well with a new type of available data sets that are larger but less complex than data sets used in the past. The transformations are not predefined, but calculated from the data sets, and then used as features in a supervised learning classifier. The method has been evaluated using two data sets: the SICK data set and the EXCITEMENT English data set. While both data sets are of a larger order of magnitude than data sets such as RTE-3, they are also of lower levels of complexity, each in its own way. SICK consists of pairs created by applying a predefined set of syntactic and lexical rules to its T and H pairs, which can be accurately captured by our transformations. The EXCITEMENT English data contains short pieces of text that do not require a high degree of text understanding to be annotated. The resulting AdArte system is simple to understand and implement, but also effective when compared with other existing systems. AdArte has been made freely available with the EXCITEMENT Open Platform, an open source platform for textual inference.
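
The idea of deriving transformations from the data and using them as classifier features can be illustrated with a minimal sketch. This is not the AdArte implementation: the abstract describes syntactic transformations, whereas the sketch below substitutes token-level edit operations computed with Python's standard `difflib` module as a simplified stand-in. The function names `extract_transformations` and `to_feature_vector`, the string encoding of each transformation, and the two example pairs are hypothetical, chosen only for illustration. The supervised classifier itself is left out.

```python
from collections import Counter
from difflib import SequenceMatcher


def extract_transformations(text, hypothesis):
    """Return the token-level edit operations that turn `text` into `hypothesis`.

    Assumption: whitespace tokenization and flat token edits stand in for
    the syntactic transformations mentioned in the abstract.
    """
    t_tokens = text.lower().split()
    h_tokens = hypothesis.lower().split()
    matcher = SequenceMatcher(a=t_tokens, b=h_tokens, autojunk=False)
    transformations = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # unchanged spans carry no transformation
        removed = " ".join(t_tokens[i1:i2])
        added = " ".join(h_tokens[j1:j2])
        # Hypothetical encoding: "<operation>:<removed>-><added>"
        transformations.append(f"{tag}:{removed}->{added}")
    return transformations


def to_feature_vector(transformations, vocabulary):
    """Count how often each known transformation occurs in one T-H pair."""
    counts = Counter(transformations)
    return [counts[name] for name in vocabulary]


# Hypothetical T-H pairs, not taken from SICK or the EXCITEMENT data.
pairs = [
    ("a man is playing a guitar", "a man is playing an instrument"),
    ("a man is playing a guitar", "a man is not playing a guitar"),
]

# The transformation vocabulary is collected from the data, not predefined.
per_pair = [extract_transformations(t, h) for t, h in pairs]
vocabulary = sorted({tr for trs in per_pair for tr in trs})
features = [to_feature_vector(trs, vocabulary) for trs in per_pair]
```

Each row of `features` could then be passed, together with its entailment label, to any supervised learning classifier.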

Bibliographic details

  • Source
    Natural Language Engineering | 2017, Issue 4 | pp. 507-534 | 28 pages
  • Authors

    ROBERTO ZANOLI; SILVIA COLOMBO;

  • Affiliations

    Human Language Technology, Fondazione Bruno Kessler, 38123 Trento, Italy;

    Edinburgh University School of Informatics, 11 Crichton St, Edinburgh EH8 9LE, UK;

  • Indexing information
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification
  • Keywords

  • Date added to database: 2022-08-18 02:08:45
