Identifying signs of syntactic complexity for rule-based sentence simplification

Abstract

This article presents a new method to automatically simplify English sentences. The approach is designed to reduce the number of compound clauses and nominally bound relative clauses in input sentences. The article provides an overview of a corpus annotated with information about various explicit signs of syntactic complexity and describes the two major components of a sentence simplification method that works by exploiting information on the signs occurring in the sentences of a text. The first component is a sign tagger which automatically classifies signs in accordance with the annotation scheme used to annotate the corpus. The second component is an iterative rule-based sentence transformation tool. Exploiting the sign tagger in conjunction with other NLP components, the sentence transformation tool automatically rewrites long sentences containing compound clauses and nominally bound relative clauses as sequences of shorter single-clause sentences. Evaluation of the different components reveals acceptable performance in rewriting sentences containing compound clauses but less accuracy when rewriting sentences containing nominally bound relative clauses. A detailed error analysis revealed that the major sources of error include inaccurate sign tagging, the relatively limited coverage of the rules used to rewrite sentences, and an inability to discriminate between various subtypes of clause coordination. Despite this, the system performed well in comparison with two baselines. This finding was reinforced by automatic estimations of the readability of system output and by surveys of readers' opinions about the accuracy, accessibility, and meaning of this output.
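The two-component design described in the abstract (a sign tagger feeding an iterative rule-based rewriter) can be illustrated with a toy sketch. This is not the authors' system: the tag names `CLAUSE_COORD` and `OTHER`, the function names, and the single "comma plus coordinator" rule are all hypothetical stand-ins, and the real sign tagger is a trained classifier rather than a lookup. The sketch only shows the control flow: tag the signs, apply a rewrite rule at the first clause-coordinating sign, and re-process the resulting sentences until no rule applies.

```python
import re

# Hypothetical sign classes; the annotation scheme in the article is richer.
CLAUSE_COORD = "CLAUSE_COORD"
OTHER = "OTHER"
COORDINATORS = {"and", "but", "or"}


def tokenize(sentence):
    """Split into word and punctuation tokens, dropping sentence-final punctuation."""
    tokens = re.findall(r"\w+|[^\w\s]", sentence)
    if tokens and tokens[-1] in {".", "!", "?"}:
        tokens = tokens[:-1]
    return tokens


def tag_signs(tokens):
    """Toy stand-in for the sign tagger: a coordinator right after a comma
    is assumed (naively) to link two clauses."""
    tags = []
    for i, tok in enumerate(tokens):
        if tok.lower() in COORDINATORS and i > 0 and tokens[i - 1] == ",":
            tags.append(CLAUSE_COORD)
        else:
            tags.append(OTHER)
    return tags


def split_once(tokens):
    """Apply one rewrite rule at the first clause coordinator, if any."""
    for i, tag in enumerate(tag_signs(tokens)):
        if tag == CLAUSE_COORD:
            left, right = tokens[: i - 1], tokens[i + 1 :]  # drop ", and"
            if left and right:
                return [left, right]
    return None


def render(tokens):
    """Rebuild a sentence string from tokens."""
    text = re.sub(r"\s+([,;:])", r"\1", " ".join(tokens))
    return text[0].upper() + text[1:] + "."


def simplify(sentence):
    """Iteratively rewrite until no sentence contains a clause coordinator."""
    agenda = [tokenize(sentence)]
    output = []
    while agenda:
        tokens = agenda.pop(0)
        parts = split_once(tokens)
        if parts is None:
            output.append(render(tokens))
        else:
            agenda = parts + agenda  # re-process both halves, in order
    return output


print(simplify("The cat slept, and the dog barked, but the bird sang."))
# ['The cat slept.', 'The dog barked.', 'The bird sang.']
```

The naive tagging rule also shows why the abstract names inaccurate sign tagging as a major source of error: a comma followed by "and" can just as easily coordinate noun phrases as clauses, and a rule fired on the wrong class of sign produces ungrammatical output.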

Bibliographic record

  • Source
    Natural language engineering | 2019, Issue 1 | pp. 69-119 | 51 pages
  • Author affiliations

    Univ Wolverhampton, Res Inst Informat & Language Proc, Wolverhampton, W Midlands, England;

    Univ Wolverhampton, Res Inst Informat & Language Proc, Wolverhampton, W Midlands, England;

  • Original format: PDF
  • Text language: English
