Journal: 计算机科学 (Computer Science)

A Chinese New Word Detection Method Based on a Statistical Learning Framework

     

Abstract

Automatic detection of new words is an important foundation of Chinese information processing, but the extremely strong word-building ability of Chinese characters makes new word detection very difficult. This paper puts forward a formal model of new word detection that establishes statistical relations between features and detection results. On this basis, it proposes using a highly effective statistical learning model as a framework to integrate different kinds of available features, so that combinations of features can be fully exploited to further improve new word detection. Experiments show that the statistical framework clearly outperforms a simple summation of individual features and effectively improves new word detection: the F values in the open and closed experiments are 49.72% and 69.83% respectively, which are among the better results reported to date.
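The abstract does not name the specific features or the learning model, so the following is only a hypothetical sketch of the general idea. It assumes features commonly used for Chinese new word detection (candidate frequency, the weakest pointwise mutual information over internal splits, and left/right branching entropy) and plain logistic regression as the statistical learning model; every function name here is illustrative, not taken from the paper. It contrasts a learned weighted combination of features with a naive feature sum, and includes the F-measure, F = 2PR / (P + R), the metric in which results are reported.

```python
import math
from collections import Counter


def ngram_counts(text, max_n=4):
    """Count every character n-gram of length 1..max_n in the corpus."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def branching_entropy(text, cand, side):
    """Entropy of the characters immediately to the left/right of `cand`."""
    neighbors = Counter()
    start = text.find(cand)
    while start != -1:
        idx = start - 1 if side == "left" else start + len(cand)
        if 0 <= idx < len(text):
            neighbors[text[idx]] += 1
        start = text.find(cand, start + 1)  # allows overlapping matches
    total = sum(neighbors.values())
    if total == 0:
        return 0.0
    return -sum(c / total * math.log(c / total) for c in neighbors.values())


def min_pmi(counts, cand, n_chars):
    """Weakest internal association over all binary splits of `cand`.

    Assumes len(cand) >= 2 and that cand occurs in the corpus. Probabilities
    are approximated as count / corpus length (a simplification).
    """
    def prob(s):
        return counts[s] / n_chars

    return min(math.log(prob(cand) / (prob(cand[:k]) * prob(cand[k:])))
               for k in range(1, len(cand)))


def feature_vector(text, counts, cand):
    """Hypothetical feature set for one candidate string."""
    return [
        math.log(counts[cand]),
        min_pmi(counts, cand, len(text)),
        branching_entropy(text, cand, "left"),
        branching_entropy(text, cand, "right"),
    ]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.5, epochs=500):
    """Batch gradient descent on log-loss; returns (weights, bias)."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def score(w, b, x):
    """Learned combination: probability that the candidate is a word."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def sum_score(x):
    """Naive baseline: unweighted sum of the individual features."""
    return sum(x)


def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```

In use, one would collect candidate substrings of length 2 to `max_n` from a corpus, build a `feature_vector` for each, train on candidates labeled as words or non-words, and rank unseen candidates by `score`; `sum_score` gives the naive baseline for comparison. The learned weights let the model discount weak or redundant features, which a plain sum cannot do.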
