TCtract: A Collocation Extraction Approach for Noun Phrases Using Shallow Parsing Rules and Statistic Models

Abstract

This paper presents a hybrid method for extracting Chinese noun phrase collocations that combines a statistical model with rule-based linguistic knowledge. The algorithm first extracts all noun phrase collocation candidates from a shallow-parsed corpus, using syntactic knowledge encoded as phrase rules. It then removes pseudo collocations by applying a set of statistics-based association measures (AMs) as filters. The hybrid algorithm was designed with two main purposes: (1) to improve precision while maintaining reasonable recall, and (2) to investigate the proposed association measures on Chinese noun phrase collocations. Its performance is compared with a pure statistical model and a pure rule-based method on a 60 MB PoS-tagged corpus. On 29 randomly selected noun headwords, the experimental results show that the hybrid method achieves a higher precision of 92.65% and a recall of 47%, compared with a precision of 78.87% and a recall of 27.19% for a statistics-based extraction system. This corresponds to an F-score improvement of 55.7%.
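
The two-stage design described above (rule-based candidate extraction, then association-measure filtering) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the tag-pattern rules in `NP_RULES`, the use of pointwise mutual information (PMI) as the only AM, the thresholds `min_freq` and `min_pmi`, and the toy corpus are all hypothetical stand-ins, since the abstract does not list the paper's actual rules or measures.

```python
import math
from collections import Counter

# Hypothetical shallow-parsing rules: (modifier tag, head tag) patterns that may
# form a noun-phrase collocation. Tags follow a PKU-style convention
# (a = adjective, n = noun, v = verb); the paper's real rule set is not given.
NP_RULES = {("a", "n"), ("n", "n"), ("v", "n")}


def extract_candidates(sentences):
    """Stage 1: count adjacent word pairs whose tag pattern matches a rule."""
    candidates = Counter()
    for sent in sentences:
        for (w1, t1), (w2, t2) in zip(sent, sent[1:]):
            if (t1, t2) in NP_RULES:
                candidates[(w1, w2)] += 1
    return candidates


def pmi(pair_freq, freq1, freq2, total):
    """Simplified pointwise mutual information (one possible AM)."""
    return math.log2(pair_freq * total / (freq1 * freq2))


def filter_collocations(sentences, min_freq=2, min_pmi=1.0):
    """Stage 2: drop pseudo collocations that fail the (hypothetical) thresholds."""
    word_freq = Counter(w for sent in sentences for w, _ in sent)
    total = sum(word_freq.values())
    kept = {}
    for (w1, w2), f in extract_candidates(sentences).items():
        if f < min_freq:  # rare pairs are treated as unreliable
            continue
        score = pmi(f, word_freq[w1], word_freq[w2], total)
        if score >= min_pmi:
            kept[(w1, w2)] = score
    return kept


# Hypothetical toy corpus of PoS-tagged sentences: lists of (word, tag) pairs.
corpus = [
    [("经济", "n"), ("发展", "n"), ("很", "d"), ("快", "a")],
    [("经济", "n"), ("发展", "n"), ("需要", "v"), ("技术", "n")],
    [("重要", "a"), ("问题", "n")],
    [("这", "r"), ("问题", "n"), ("很", "d"), ("重要", "a")],
]

print(extract_candidates(corpus))   # three rule-matched candidate pairs
print(filter_collocations(corpus))  # keeps only ('经济', '发展')
```

Stage 1 alone keeps all three rule-matched pairs; the frequency and PMI filters in stage 2 trade some of that recall for precision, which mirrors in miniature the trade-off the hybrid method targets.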
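
The reported precision and recall figures can be combined into an F-measure. The sketch below assumes the standard balanced F1 (harmonic mean, `beta = 1`). The abstract does not state which F-measure or averaging scheme underlies its 55.7% improvement figure, so this computation is illustrative only and is not expected to reproduce that number exactly.

```python
def f_score(precision, recall, beta=1.0):
    """F-measure: weighted harmonic mean of precision and recall."""
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision/recall values as reported in the abstract (fractions, not percent).
hybrid = f_score(0.9265, 0.47)      # hybrid rule + statistics system
baseline = f_score(0.7887, 0.2719)  # statistics-based baseline

print(f"hybrid F1 = {hybrid:.4f}, baseline F1 = {baseline:.4f}")
print(f"relative gain = {(hybrid - baseline) / baseline:.1%}")
```

With these inputs, the balanced F1 is roughly 0.624 for the hybrid system and 0.404 for the statistics-based baseline.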
