首页> 中文期刊> 《计算机工程与设计》 >基于用字共现频率统计的外国译名自动识别

基于用字共现频率统计的外国译名自动识别

         

摘要

为了减少分词的负面效果,提出了基于用字共现频率统计的外国译名自动识别方法.对译名的用字特征进行了统计,提出译名共现字串的概念,并由译名用字表与汉语常用字表得到了非译名用字表.在上述工作的基础上定义了译名的边界,在边界定义的基础上设计了一种对分词错误的调整方法.对开放语料的测试结果表明,与最大词频分词算法相比,该算法在译名识别中的准确率、召回率、F值均有所提高.%To reduce the negative impact of segmentation, an automatic recognition algorithm for transliterated name recogni-tion based on co-occurrence frequency statistics of words is presented. Firstly, the statistical features of word of transliterated name are summarized and then the concept of co-occurrence string is proposed. The character table of non-translated name is obtained through the character table of transliterated name and the common Chinese character table. Secondly, the boundary of transliterated name is defined based on these above. Finally, an adjustment method is designed to deal with errors of segmenta-tion based on the definition of boundary. The result of experiment is satisfied in comparison with maximum word frequency seg-mentation algorithm. The recall rate, precision rate and F values of identification are enhanced.

著录项

相似文献

  • 中文文献
  • 外文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号