
Data-oriented models of parsing and translation

Abstract

The merits of combining the positive elements of the rule-based and data-driven approaches to MT are clear: a combined model has the potential to be highly accurate, robust, cost-effective to build and adaptable. However, how best to combine these techniques into a model that retains the positive characteristics of each approach, while inheriting as few of their disadvantages as possible, remains an unsolved problem. One possible solution is the Data-Oriented Translation (DOT) model originally proposed by Poutsma (1998, 2000, 2003), which is based on Data-Oriented Parsing (DOP) (e.g. Bod, 1992; Bod et al., 2003) and combines examples, linguistic information and a statistical translation model.

In this thesis, we seek to establish how the DOT model of translation relates to the other main MT methodologies currently in use. We find that this model differs from other hybrid models of MT in that it inextricably interweaves the philosophies of the rule-based, example-based and statistical approaches within a single integrated framework.

Although DOT embodies many positive characteristics on a theoretical level, it also inherits the computational complexity associated with DOP. Previous experiments assessing the performance of the DOT model were small in scale, and the training data used was not ideally suited to the task (Poutsma, 2000, 2003). Moreover, the algorithmic limitations of the DOT implementation used in those experiments prevented a more informative assessment from being carried out. In this thesis, we look to the innovative solutions developed to meet the challenges of implementing the DOP model and investigate their application to DOT. This investigation culminates in the development of a DOT system that allows us to perform translation experiments on a larger scale, and with greater translational complexity, than in previous work.
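In DOP, the treebank itself serves as the grammar: each tree is decomposed into fragments (subtrees in which every node keeps either all of its children or none), and new analyses are built by substituting fragments at open nonterminal leaves. The following is a minimal, hypothetical sketch of DOP1-style fragment extraction with the relative-frequency estimate commonly associated with Bod (1992). It is not the implementation developed in this thesis; the tuple-based tree encoding and the function names (`fragments_rooted_at`, `nodes`, `extract_fragments`, `fragment_probabilities`) are assumptions made for illustration. DOT applies the same idea to linked source/target tree pairs, which this sketch does not model.

```python
from collections import Counter
from itertools import product

# Hypothetical encoding: a tree is a tuple (label, child, ...), where each
# child is either a word (str) or another tree. Inside a fragment, a
# one-element tuple (label,) marks an open substitution site.


def fragments_rooted_at(node):
    """All fragments rooted at `node`: keep the node's full production and,
    for each nonterminal child, either cut it (substitution site) or
    continue with one of that child's own fragments."""
    label, children = node[0], node[1:]
    options = []
    for child in children:
        if isinstance(child, str):
            options.append([child])  # words are always retained
        else:
            options.append([(child[0],)] + fragments_rooted_at(child))
    return [(label, *combo) for combo in product(*options)]


def nodes(tree):
    """Yield every nonterminal node of `tree`, root first."""
    yield tree
    for child in tree[1:]:
        if not isinstance(child, str):
            yield from nodes(child)


def extract_fragments(treebank):
    """Count every fragment of every tree in the treebank."""
    counts = Counter()
    for tree in treebank:
        for node in nodes(tree):
            counts.update(fragments_rooted_at(node))
    return counts


def fragment_probabilities(counts):
    """Relative frequency: count(f) divided by the total count of all
    fragments that share f's root label."""
    root_totals = Counter()
    for frag, c in counts.items():
        root_totals[frag[0]] += c
    return {frag: c / root_totals[frag[0]] for frag, c in counts.items()}


tree = ("S", ("NP", "John"), ("VP", ("V", "likes"), ("NP", "Mary")))
counts = extract_fragments([tree])
probs = fragment_probabilities(counts)
print(sum(counts.values()))            # 17 fragments in total
print(probs[("S", ("NP",), ("VP",))])  # 0.1
```

For this toy three-word tree the sketch yields 17 fragments, 10 of them rooted at `S`. Because the number of fragments at a node multiplies across its children, it grows exponentially with tree size; together with the need to sum over every derivation of a parse, this is one source of the computational complexity that DOT inherits from DOP.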
Our evaluation indicates that the positive characteristics of the model identified on a theoretical level are also in evidence when it is subjected to empirical assessment. For example, in terms of exact match accuracy, the DOT model outperforms an SMT model trained and tested on the same data by up to 89.73%.

The DOP and DOT models for which we provide empirical evaluations assume context-free phrase-structure tree representations. However, such models can also be developed for more sophisticated linguistic formalisms. In this thesis, we also focus on the efforts that have been made to integrate the representations of Lexical-Functional Grammar (LFG) with DOP and DOT. We investigate the usefulness of the algorithms developed for DOP (and adapted here to Tree-DOT) when implementing the more complex LFG-DOP and LFG-DOT models. We examine how constraints are employed in these models for more accurate disambiguation and seek an alternative methodology for improved constraint specification. We also hypothesise as to how the constraints used to predict both good parses and good translations might be pruned in a principled fashion. Finally, we explore the relationship between translational equivalence and limited generalisation reusability for both the tree-based and LFG-based DOT models, focussing on how this relationship differs depending on which formalism is assumed.
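Exact match accuracy, the metric cited in the comparison with SMT, is commonly taken to be the proportion of test sentences whose system output is identical to the reference translation. A minimal sketch under that assumption follows; the whitespace normalisation, the function names `normalise` and `exact_match_accuracy`, and the toy sentences are illustrative choices, not the evaluation script used in this thesis.

```python
def normalise(sentence: str) -> str:
    """Collapse runs of whitespace (an assumed, minimal normalisation)."""
    return " ".join(sentence.split())


def exact_match_accuracy(outputs: list[str], references: list[str]) -> float:
    """Fraction of test sentences whose output equals its reference exactly."""
    if len(outputs) != len(references):
        raise ValueError("outputs and references must be aligned one-to-one")
    if not outputs:
        return 0.0
    hits = sum(normalise(o) == normalise(r) for o, r in zip(outputs, references))
    return hits / len(outputs)


# Toy data: two of the three outputs match their references.
outputs = ["le chat dort", "il pleut ", "je suis ici"]
references = ["le chat dort", "il pleut", "je suis là"]
print(exact_match_accuracy(outputs, references))
```

Comparing two systems then amounts to computing this score for each on the same test set; because the metric gives no credit for near-misses, it is a strict measure of translation quality.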

Bibliographic details

  • Author: Hearne, Mary
  • Year: 2005
  • Original format: PDF
  • Language: English