Journal: Natural Language Engineering

How much can part-of-speech tagging help parsing?

Abstract

Folk wisdom holds that incorporating a part-of-speech tagger into a system that performs deep linguistic analysis will improve the speed and accuracy of the system. Previous studies of tagging have tested this belief by incorporating an existing tagger into a parsing system and observing the effect on the speed of the parser and the accuracy of the results. However, little work has been done to determine, in a fine-grained manner, exactly how much tagging can help to reduce ambiguity in parser output. We take a new approach to this issue by examining the full parse-forest output of a large-scale LFG-based English grammar (Riezler et al. 2002) running on the XLE grammar development platform (Maxwell and Kaplan 1993, 1996), and partitioning the parse outputs into equivalence classes based on the tag sequence of each parse. If we find a large number of tag-sequence equivalence classes for a sentence, we can conclude that different parses tend to be distinguished by their tags; a small number means that tagging would not help much in reducing ambiguity. In this way, we can determine how much tagging would help us in the best case, if we had a "perfect tagger" to give us the correct tag sequence for each sentence. We show that if a perfect tagger were available, ambiguity would be reduced by about 50%. Somewhat surprisingly, about 30% of the sentences in the corpus that was examined would not be disambiguated, even by the perfect tagger, since all of the parses for these sentences share the same tag sequence. Our study also helps to inform research on tagging by providing a targeted determination of exactly which tags can help the most in disambiguation.
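The partitioning step described in the abstract can be illustrated with a small sketch. This is not the authors' code and does not use XLE; it assumes each parse has already been reduced to a hypothetical (parse id, tag sequence) pair, and the toy parses and Penn-style tags below are invented for illustration. Counting the parses that survive a gold tag sequence is one simple way to look at remaining ambiguity and may differ from the measure used in the paper.

```python
from collections import defaultdict

def partition_by_tags(parses):
    """Group parse ids into equivalence classes keyed by tag sequence."""
    classes = defaultdict(list)
    for parse_id, tags in parses:
        classes[tuple(tags)].append(parse_id)
    return dict(classes)

def remaining_after_perfect_tagger(parses, gold_tags):
    """Parses still possible once the correct tag sequence is known."""
    return partition_by_tags(parses).get(tuple(gold_tags), [])

def tagging_cannot_help(parses):
    """True if the sentence is ambiguous but all parses share one tag sequence."""
    return len(parses) > 1 and len(partition_by_tags(parses)) == 1

# Invented toy parses for "Time flies like an arrow" (illustrative only).
parses = [
    ("p1", ["NN", "VBZ", "IN", "DT", "NN"]),
    ("p2", ["NN", "NNS", "VBP", "DT", "NN"]),
    ("p3", ["VB", "NNS", "IN", "DT", "NN"]),
    ("p4", ["VB", "NNS", "IN", "DT", "NN"]),  # same tags as p3, different attachment
]

classes = partition_by_tags(parses)
survivors = remaining_after_perfect_tagger(parses, ["VB", "NNS", "IN", "DT", "NN"])
print(len(classes), "tag-sequence classes;", len(survivors), "parses remain")
```

In this toy case the four parses fall into three tag-sequence classes, so a perfect tagger would narrow the sentence to one or two parses depending on which sequence is correct. A sentence whose parses all land in a single class is the situation the abstract reports for about 30% of the corpus.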