Integrating deep and shallow natural language processing components : representations and hybrid architectures

Abstract

We describe basic concepts and software architectures for the integration of shallow and deep (linguistics-based, semantics-oriented) natural language processing (NLP) components. The main goal of this novel, hybrid integration paradigm is to improve the robustness of deep processing. After an introduction to constraint-based natural language parsing, we give an overview of typical shallow processing tasks. We introduce XML standoff markup as an additional abstraction layer that eases the integration of NLP components, and propose the use of XSLT as a standardized and efficient transformation language for online NLP integration. In the main part of the thesis, we describe our contributions to three hybrid architecture frameworks that build on these fundamentals. SProUT is a shallow system that uses elements of deep constraint-based processing, namely a type hierarchy and typed feature structures. WHITEBOARD is the first hybrid architecture to integrate not only part-of-speech tagging, but also named entity recognition and topological parsing, with deep parsing. Finally, we present Heart of Gold, a middleware architecture that generalizes WHITEBOARD along several dimensions, such as configurability, multilinguality and flexible processing strategies. We describe various applications that have been implemented using the hybrid frameworks, such as structured named entity recognition, information extraction, creative document authoring support and deep question analysis, and we report on evaluations. In WHITEBOARD, for example, shallow pre-processing was shown to increase both the coverage and the efficiency of deep parsing by a factor of more than two. Heart of Gold not only forms the basis for applications that utilize semantics-oriented natural language analysis, but also constitutes a complex research instrument for experimenting with novel processing strategies that combine deep and shallow methods, and it eases replication and comparability of results.
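
To make the notion of XML standoff markup mentioned in the abstract more concrete, the following is a minimal sketch in Python. It is not the format used by SProUT, WHITEBOARD or Heart of Gold; the element name `span`, the attributes `layer`, `start`, `end` and `label`, and the helper functions `make_standoff` and `resolve` are all assumptions made for illustration. The idea shown is only the general one: the text itself is never modified, and each component adds annotations that point into it by character offsets.

```python
import xml.etree.ElementTree as ET

# The raw text stays untouched; annotations live in a separate XML tree.
TEXT = "Ulrich visited Berlin."

def make_standoff(annotations):
    """Build a standoff XML tree from (layer, start, end, label) tuples.

    The element and attribute names are hypothetical, chosen for this sketch.
    """
    root = ET.Element("annotations")
    for layer, start, end, label in annotations:
        ET.SubElement(root, "span", layer=layer,
                      start=str(start), end=str(end), label=label)
    return root

def resolve(text, root, layer):
    """Return (label, covered text) pairs for one annotation layer."""
    return [
        (s.get("label"), text[int(s.get("start")):int(s.get("end"))])
        for s in root.iter("span")
        if s.get("layer") == layer
    ]

# Two independent layers over the same text, as two components might produce.
doc = make_standoff([
    ("pos", 0, 6, "NNP"), ("pos", 7, 14, "VBD"), ("pos", 15, 21, "NNP"),
    ("ne", 0, 6, "person"), ("ne", 15, 21, "location"),
])

print(ET.tostring(doc, encoding="unicode"))
print(resolve(TEXT, doc, "ne"))
```

Because every layer refers to the same offsets, a later component can read the named entity layer without knowing anything about the tagger that produced the part-of-speech layer.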

Bibliographic details

  • Author: Schäfer, Ulrich
  • Year: 2006
  • Original format: PDF
  • Language: English
