Journal of Language Modelling

How to embed noncrossing trees in Universal Dependencies treebanks in a low-complexity regular language

Abstract

A recently proposed balanced-bracket encoding (Yli-Jyrä and Gómez-Rodríguez 2017) has given us a way to embed all noncrossing dependency graphs into the string space and to formulate their exact arc-factored inference problem (Kuhlmann and Johnsson 2015) as the best string problem in a dynamically constructed and weighted unambiguous context-free grammar. The current work improves the encoding and makes it shallower by omitting redundant brackets from it. The streamlined encoding gives rise to a bounded-depth subset approximation that is represented by a small finite-state automaton. When bounded to 7 levels of balanced brackets, the automaton has 762 states and represents a strict superset of more than 99.9999% of the noncrossing trees available in Universal Dependencies 2.4 (Nivre et al. 2019). In addition, it strictly contains all 15-vertex noncrossing digraphs. When bounded to 4 levels and 90 states, the automaton still captures 99.2% of all noncrossing trees in the reference dataset. The approach is flexible and extensible towards unrestricted graphs, and it suggests tight finite-state bounds for dependency parsing, and for the main existing parsing methods.
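
To make the idea of a bounded-depth bracket encoding concrete, here is a minimal Python sketch. It is not the encoding of the paper or of Yli-Jyrä and Gómez-Rodríguez (2017): it assumes a deliberately simplified scheme in which arcs are undirected and unlabeled, each token is written as `.`, every arc ending at a token contributes a `]` before it, and every arc starting at a token contributes a `[` after it. The names `encode`, `decode`, `depth` and `is_noncrossing` are hypothetical and chosen only for this illustration.

```python
from typing import Iterable, List, Tuple

Arc = Tuple[int, int]  # undirected arc between two distinct token positions


def is_noncrossing(arcs: Iterable[Arc]) -> bool:
    """Return True if no two arcs (i, j), (k, l) satisfy i < k < j < l."""
    norm = [tuple(sorted(a)) for a in arcs]
    return not any(i < k < j < l for (i, j) in norm for (k, l) in norm)


def encode(n: int, arcs: Iterable[Arc]) -> str:
    """Encode noncrossing arcs over tokens 0..n-1 as a balanced bracket string.

    Simplified, assumed scheme: per token, closing brackets for arcs that
    end there, then the token marker '.', then opening brackets for arcs
    that start there.
    """
    norm = sorted({tuple(sorted(a)) for a in arcs})
    if not is_noncrossing(norm):
        raise ValueError("arcs cross; this sketch handles noncrossing graphs only")
    parts = []
    for v in range(n):
        closers = sum(1 for (_, j) in norm if j == v)
        openers = sum(1 for (i, _) in norm if i == v)
        parts.append("]" * closers + "." + "[" * openers)
    return "".join(parts)


def decode(s: str) -> List[Arc]:
    """Recover the arcs by ordinary stack-based bracket matching."""
    stack: List[int] = []
    arcs: List[Arc] = []
    v = -1  # index of the most recently read token
    for ch in s:
        if ch == ".":
            v += 1
        elif ch == "[":
            stack.append(v)  # arc starts at the token just read
        elif ch == "]":
            arcs.append((stack.pop(), v + 1))  # arc ends at the next token
    return sorted(arcs)


def depth(s: str) -> int:
    """Maximum nesting depth of the brackets in s."""
    current = deepest = 0
    for ch in s:
        if ch == "[":
            current += 1
            deepest = max(deepest, current)
        elif ch == "]":
            current -= 1
    return deepest


arcs = [(0, 4), (1, 2), (2, 4)]
code = encode(5, arcs)
print(code, depth(code), decode(code) == sorted(arcs))
```

Strings whose bracket depth never exceeds a fixed bound form a regular language, because a finite-state device only needs to remember the current depth. That is why a check such as `depth(code) <= 7` corresponds to membership in a finite automaton, whereas unbounded balanced brackets require a context-free grammar. The state counts reported in the abstract (762 and 90) belong to the paper's own encoding and cannot be derived from this simplified scheme.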
