Journal: 《计算机技术与发展》 (Computer Technology and Development)

Research on Statistics-Based Automatic Recognition of Chinese Location Names (基于统计的中文地名自动识别研究)

Abstract

Automatic recognition of Chinese location names is one of the harder tasks in named entity recognition. Its goal is to extract geographic proper nouns from Chinese text automatically and accurately. This paper uses conditional random fields (CRFs), a statistical model, to study Chinese location name recognition at character-level granularity. Taking advantage of the CRF's ability to accept arbitrary features as input, the study builds a rich set of feature combinations and trains on a large-scale corpus to estimate the conditional probability distribution of label sequences given the feature set. Location names are then recognized by sequence labeling. Across multiple closed and open tests, the F1 score is about 90%, indicating good recognition performance.
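The character-level sequence labeling setup described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's tag scheme, feature templates, or corpus, so everything here is an assumption made for illustration: the BIO tag set (`B-LOC`, `I-LOC`, `O`), the ±2 character window, the feature names, and the example sentence. The sketch covers only the data preparation and span-level F1 scoring around a CRF, not CRF training itself.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # [start, end) character offsets of one location name


def bio_tags(sentence: str, locations: List[Span]) -> List[str]:
    """Label every character: B-LOC starts a name, I-LOC continues it, O is outside."""
    tags = ["O"] * len(sentence)
    for start, end in locations:
        tags[start] = "B-LOC"
        for i in range(start + 1, end):
            tags[i] = "I-LOC"
    return tags


def char_features(sentence: str, i: int, window: int = 2) -> Dict[str, str]:
    """Hypothetical feature template: context characters plus two character bigrams."""
    def char_at(j: int) -> str:
        return sentence[j] if 0 <= j < len(sentence) else "<PAD>"

    feats = {f"c[{k}]": char_at(i + k) for k in range(-window, window + 1)}
    feats["c[-1]c[0]"] = char_at(i - 1) + char_at(i)
    feats["c[0]c[1]"] = char_at(i) + char_at(i + 1)
    return feats


def decode_spans(tags: List[str]) -> List[Span]:
    """Turn a predicted tag sequence back into location-name spans."""
    spans: List[Span] = []
    start = None
    for i, tag in enumerate(tags):
        if tag == "I-LOC" and start is not None:
            continue  # still inside the current name
        if start is not None:
            spans.append((start, i))  # the current name ended before position i
            start = None
        if tag == "B-LOC":
            start = i
    if start is not None:
        spans.append((start, len(tags)))
    return spans


def f1_score(gold: List[Span], predicted: List[Span]) -> float:
    """Span-level F1: a prediction counts only if both boundaries match exactly."""
    correct = len(set(gold) & set(predicted))
    if correct == 0:
        return 0.0
    precision = correct / len(set(predicted))
    recall = correct / len(set(gold))
    return 2 * precision * recall / (precision + recall)


# Illustrative sentence (not from the paper's corpus): "我在北京市工作"
sentence = "我在北京市工作"
gold = [(2, 5)]  # "北京市"
tags = bio_tags(sentence, gold)
features = [char_features(sentence, i) for i in range(len(sentence))]
print(tags)
print(features[2])
print(f1_score(gold, decode_spans(tags)))
```

In a full system, the per-character feature dictionaries and tag sequences would be passed to a CRF trainer, and `decode_spans` would be applied to the model's predicted tags before scoring.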
