Journal: 计算机工程与应用 (Computer Engineering and Applications)

Research and Implementation of Automatic Recognition of Kazakh Verb Phrases

     

Abstract

The structure of Kazakh base verb phrases (KzBaseVP) is complex, ambiguity is common, and the available training corpus is small, so neither a purely rule-based method nor a purely statistical method can be applied directly. This paper therefore proposes a hybrid method that combines rules with a maximum entropy model to recognize KzBaseVPs. In the hybrid system, a set of KzBaseVP collocation rules is built from the characteristics specific to KzBaseVP and used to tag unambiguous KzBaseVPs, with an accuracy of 85.43%. Ambiguous KzBaseVPs are recognized with a statistical maximum entropy model whose feature templates are designed from Kazakh words, parts of speech, affixes, and context information. The model is also improved: during decoding, the n most probable pieces of context information are each added to the feature vector of the next VP, and so on until the end of the text, after which the VP tagging with the best overall probability is selected. Experiments show that the recognition accuracy for base verb phrases is 97.23% in the closed test and 93.22% in the open test.
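The decoding step described in the abstract (keep the n most probable contexts at each position, feed each into the next decision, and finally pick the best-scoring tagging) resembles a beam search. The sketch below illustrates that idea only. It is not the authors' implementation: the B/I/O tag scheme, the function names `beam_decode` and `toy_tag_probs`, the placeholder tokens, and the hard-coded probabilities that stand in for a trained maximum entropy model are all assumptions made for illustration.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Token = Tuple[str, str]  # (word, POS tag); hypothetical representation
TagProbs = Callable[[Sequence[Token], int, str], Dict[str, float]]


def beam_decode(tokens: Sequence[Token], tag_probs: TagProbs, n: int = 3) -> List[str]:
    """Keep the n most probable partial taggings at every position."""
    beam: List[Tuple[float, List[str]]] = [(0.0, [])]  # (log-probability, tags so far)
    for i in range(len(tokens)):
        candidates: List[Tuple[float, List[str]]] = []
        for logp, tags in beam:
            # The previous tag is the context passed on to the next decision.
            prev = tags[-1] if tags else "<s>"
            for tag, p in tag_probs(tokens, i, prev).items():
                if p > 0.0:
                    candidates.append((logp + math.log(p), tags + [tag]))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = candidates[:n]  # prune to the top-n hypotheses
    return beam[0][1] if beam else []


def toy_tag_probs(tokens: Sequence[Token], i: int, prev_tag: str) -> Dict[str, float]:
    """Stand-in for a trained maximum entropy model (assumed values only)."""
    _word, pos = tokens[i]
    if pos == "V":
        if prev_tag in ("B", "I"):
            return {"B": 0.2, "I": 0.8}  # continue an open verb phrase
        return {"B": 0.9, "O": 0.1}      # start a new verb phrase
    return {"O": 0.9, "B": 0.1}


# Placeholder tokens; real input would be Kazakh words with POS and affix features.
sentence = [("w1", "N"), ("w2", "V"), ("w3", "V")]
print(beam_decode(sentence, toy_tag_probs, n=2))
```

In a real system, `toy_tag_probs` would be replaced by a model that scores each tag from the word, POS, affix, and context features named in the abstract. A larger n explores more alternative contexts at a higher decoding cost.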
