Error-Driven Pruning of Treebank Grammars for Base Noun Phrase Identification

Abstract

Finding simple, non-recursive, base noun phrases is an important subtask for many natural language processing applications. While previous empirical methods for base NP identification have been rather complex, this paper instead proposes a very simple algorithm that is tailored to the relative simplicity of the task. In particular, we present a corpus-based approach for finding base NPs by matching part-of-speech tag sequences. The training phase of the algorithm is based on two successful techniques: first, the base NP grammar is read from a "treebank" corpus; then the grammar is improved by selecting rules with high "benefit" scores. Using this simple algorithm with a naive heuristic for matching rules, we achieve surprising accuracy in an evaluation on the Penn Treebank Wall Street Journal corpus.
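The training and matching steps described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: base NPs are represented as half-open token spans over POS-tag sequences, the benefit of a rule is assumed to be the number of correct brackets it produces on held-out pruning data minus the number of incorrect ones, the pruning threshold is hypothetical, the naive matching heuristic is taken to be left-to-right longest match, and all function names (`read_grammar`, `longest_match`, `benefit_scores`, `prune`) and the toy sentences are invented for illustration.

```python
from collections import Counter
from typing import List, Set, Tuple

Rule = Tuple[str, ...]                  # a base-NP rule: a sequence of POS tags
Span = Tuple[int, int]                  # half-open [start, end) token span of a base NP
Sentence = Tuple[List[str], Set[Span]]  # POS tags plus gold base-NP spans


def read_grammar(corpus: List[Sentence]) -> Set[Rule]:
    """Read the grammar off the bracketed corpus: every gold base NP
    contributes its POS-tag sequence as a rule."""
    return {tuple(tags[s:e]) for tags, spans in corpus for s, e in spans}


def longest_match(tags: List[str], rules: Set[Rule]) -> Set[Span]:
    """Scan left to right; at each position bracket the longest tag sequence
    that is a rule, then resume after it (assumed longest-match heuristic)."""
    max_len = max((len(r) for r in rules), default=0)
    found: Set[Span] = set()
    i = 0
    while i < len(tags):
        for length in range(min(max_len, len(tags) - i), 0, -1):
            if tuple(tags[i:i + length]) in rules:
                found.add((i, i + length))
                i += length
                break
        else:
            i += 1  # no rule starts at this position
    return found


def benefit_scores(rules: Set[Rule], prune_corpus: List[Sentence]) -> Counter:
    """Assumed benefit: +1 for each correct bracket a rule produces on the
    pruning corpus, -1 for each incorrect one (unused rules score 0)."""
    scores: Counter = Counter({r: 0 for r in rules})
    for tags, gold in prune_corpus:
        for s, e in longest_match(tags, rules):
            scores[tuple(tags[s:e])] += 1 if (s, e) in gold else -1
    return scores


def prune(rules: Set[Rule], prune_corpus: List[Sentence], threshold: int = 0) -> Set[Rule]:
    """Keep only rules whose benefit reaches the (hypothetical) threshold."""
    scores = benefit_scores(rules, prune_corpus)
    return {r for r in rules if scores[r] >= threshold}


if __name__ == "__main__":
    # Invented toy data: tag sequences with gold base-NP spans.
    train = [
        (["DT", "NN", "VBD", "DT", "JJ", "NN"], {(0, 2), (3, 6)}),
        (["NN", "VBZ", "NN", "NN"], {(0, 1), (2, 4)}),
    ]
    held_out = [(["NN", "NN", "VBD", "DT", "NN"], {(0, 1), (1, 2), (3, 5)})]
    grammar = read_grammar(train)
    pruned = prune(grammar, held_out)
    print(sorted(grammar - pruned))                   # → [('NN', 'NN')]
    print(sorted(longest_match(held_out[0][0], pruned)))  # → [(0, 1), (1, 2), (3, 5)]
```

In the toy run, the rule `NN NN` brackets two adjacent nouns that the held-out annotation treats as separate base NPs, so its benefit is negative and it is pruned; with the remaining rules, longest match recovers all three gold brackets. Scoring every rule from a single matching pass is a simplification, since removing one rule can change which other rules fire.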
