When Rules Meet Bigrams

Abstract

This paper discusses an ongoing project that aims to improve the quality and efficiency of a rule-based parser by adding a statistical component. The proposed technique relies on bigrams of (word + category) pairs, selected from the homographs contained in our lexical database and computed over a large, previously tagged section of the Hansard corpus. The parser uses the bigram table to rank and prune the set of alternatives. To evaluate the gains obtained by the hybrid system, we conducted two manual evaluations: one over a small subset of the Hansard corpus, the other over a corpus of about 50 articles taken from the magazine The Economist. In both cases, we compare the analyses obtained by the parser with and without the statistical component, focusing on a single important source of mistakes: the confusion between nominal and verbal readings of ambiguous words such as announce, sets, costs, and labour.
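
The ranking-and-pruning idea described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the layout of the bigram table, the scoring function, or the pruning criterion, so everything here is an assumption made for illustration: the toy corpus, the category labels, the function names `build_bigram_table` and `rank_and_prune`, the use of raw counts as scores, and the `keep_ratio` threshold are all hypothetical and are not taken from the paper.

```python
from collections import Counter

# Toy tagged corpus (hypothetical): each sentence is a list of (word, category) pairs.
TAGGED = [
    [("the", "DET"), ("costs", "NOUN"), ("rose", "VERB")],
    [("the", "DET"), ("costs", "NOUN"), ("fell", "VERB")],
    [("it", "PRON"), ("costs", "VERB"), ("money", "NOUN")],
]


def build_bigram_table(tagged_sentences, homographs):
    """Count adjacent (word, category) pairs in which at least one word is a homograph."""
    table = Counter()
    for sentence in tagged_sentences:
        for left, right in zip(sentence, sentence[1:]):
            if left[0] in homographs or right[0] in homographs:
                table[(left, right)] += 1
    return table


def rank_and_prune(table, left, word, categories, keep_ratio=0.5):
    """Rank the candidate categories of `word` given its left neighbour.

    Readings whose bigram count falls below keep_ratio * best count are pruned.
    This threshold rule is an assumption, not the paper's criterion.
    """
    scored = sorted(
        ((table[(left, (word, cat))], cat) for cat in categories),
        reverse=True,
    )
    best = scored[0][0]
    if best == 0:
        # No evidence in the table: leave the decision to the rule-based parser.
        return list(categories)
    return [cat for count, cat in scored if count >= keep_ratio * best]


table = build_bigram_table(TAGGED, homographs={"costs"})
after_det = rank_and_prune(table, ("the", "DET"), "costs", ["NOUN", "VERB"])
after_pron = rank_and_prune(table, ("it", "PRON"), "costs", ["NOUN", "VERB"])
unseen = rank_and_prune(table, ("a", "DET"), "costs", ["NOUN", "VERB"])
```

On this toy data, the nominal reading of "costs" survives after "the" and the verbal reading survives after "it". When the table holds no evidence for the context, both readings are kept, so that the statistical component only narrows the parser's alternatives and never removes all of them.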