首页> 外文学位 >Parallel algorithms for a highly unstructured problem: Natural language parsing using tree adjoining grammar.
【24h】

Parallel algorithms for a highly unstructured problem: Natural language parsing using tree adjoining grammar.

机译:针对高度非结构化问题的并行算法:使用树邻接语法的自然语言解析。

获取原文
获取原文并翻译 | 示例

摘要

Unstructured problems exhibit highly unpredictable computation and communication patterns when executed on a parallel multicomputer. As a result, such problems are very difficult to parallelize. In this dissertation, we explore a paradigmatic example of such an unstructured problem: parsing natural language using Tree Adjoining Grammar (TAG).; Parsing is foundational to any computational natural language processing system. Tree Adjoining Grammar provides a powerful grammatical formalism that allows many linguistic constructs to be expressed elegantly and succinctly. However, algorithms for TAG parsing incur considerably greater computational complexity than algorithms for parsing other natural language formalisms (such as context free grammar). Thus, TAG parsing is not only an example of a highly unstructured problem but also a problem whose effective parallelization promises to advance the state of the art in natural language processing by reducing the run time of TAG parsing to a practical duration.; In this dissertation, we explore a variety of new parallel algorithms for TAG parsing and their implementations on distributed-memory, shared-memory, and shared-address-space parallel multicomputers. In each case, our implementations exhibit the highest performance of any TAG parsers known. The P scARALLEL CT scAG parser is the fastest, full-function TAG parser, faster than the best parser offering similar functionality by a factor of over 200. We present each algorithm and its implementation iteratively, demonstrating the efficacy of various enhancements in the implementations that overcome sources of parallel overhead and improve performance.; As a paradigmatic instance of an unstructured problem, the general techniques and principles exposed in the process of parallelizing TAG parsing should also be applicable to the parallelization of other unstructured problems. We thus present not only an extensive examination of the experimental performance of various techniques for enhancing the performance of parallel algorithms for an unstructured problem, but we generalize these techniques and capture them in succinct, widely applicable design heuristics that can be applied to the parallelization of other such unstructured problems.
机译:当在并行多计算机上执行时,非结构化问题会表现出高度不可预测的计算和通信模式。结果,这些问题很难并行化。在本文中,我们探讨了这种非结构化问题的范例:使用树形连接语法(TAG)解析自然语言。解析是任何计算自然语言处理系统的基础。 “树邻接语法”提供了强大的语法形式主义,使许多语言结构可以优雅而简洁地表达。但是,用于TAG解析的算法比用于解析其他自然语言形式主义(例如,上下文无关语法)的算法具有更高的计算复杂度。因此,TAG解析不仅是高度非结构化问题的一个例子,而且是其有效并行化有望通过将TAG解析的运行时间减少到实际持续时间来提高自然语言处理技术水平的问题。在本文中,我们探索了多种新的用于TAG解析的并行算法及其在分布式内存,共享内存和共享地址空间并行多计算机上的实现。在每种情况下,我们的实现都表现出已知的所有TAG解析器中最高的性能。 P scARALLEL CT scAG解析器是最快的全功能TAG解析器,比提供相似功能的最佳解析器要快200倍以上。我们迭代地介绍了每种算法及其实现,展示了实现中各种增强功能的功效。克服并行开销的来源并提高性能。作为非结构化问题的一个范式实例,在并行化TAG解析过程中公开的一般技术和原理也应适用于其他非结构化问题的并行化。因此,我们不仅对各种技术的实验性能进行了广泛的研究,以增强针对非结构化问题的并行算法的性能,而且我们对这些技术进行了概括,并以简洁,可广泛应用的设计试探法将其捕获,这些试探法可以应用于并行处理。其他这样的非结构化问题。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号