Journal: 大众科技

A Chinese Word Segmentation Method Based on Professional Term Extraction

Abstract

In scientific and technical literature, out-of-vocabulary words such as domain-specific professional terms take many varied forms and are hard to recognize during Chinese word segmentation, which lowers segmentation accuracy on specialized texts. To address this, the paper presents a Chinese word segmentation method based on professional term extraction. Using a large domain-specific corpus, it extracts out-of-vocabulary words such as professional terms with a method based on mutual information and statistics, builds a professional term dictionary, combines that dictionary with a general-purpose word dictionary, and segments text with the forward maximum matching algorithm. Experiments show that the method extracts the relevant professional terms fairly accurately and thereby improves segmentation accuracy, giving it practical application value.
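
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact statistics or thresholds, so the sketch assumes pointwise mutual information over adjacent character pairs only, and the function names `extract_terms` and `forward_max_match` as well as the parameters `min_count`, `min_pmi`, and `max_len` are hypothetical choices made for illustration.

```python
import math
from collections import Counter


def extract_terms(corpus, min_count=2, min_pmi=1.0):
    """Collect two-character term candidates whose PMI is high enough.

    Assumption: candidates are adjacent character pairs; real terms are
    often longer, and the thresholds here are arbitrary.
    """
    unigrams = Counter()
    bigrams = Counter()
    for sentence in corpus:
        unigrams.update(sentence)
        bigrams.update(sentence[i:i + 2] for i in range(len(sentence) - 1))

    total_uni = sum(unigrams.values())
    total_bi = sum(bigrams.values())
    terms = set()
    for pair, count in bigrams.items():
        if count < min_count:
            continue  # too rare for a reliable probability estimate
        p_xy = count / total_bi
        p_x = unigrams[pair[0]] / total_uni
        p_y = unigrams[pair[1]] / total_uni
        # PMI(x, y) = log2( P(xy) / (P(x) * P(y)) )
        if math.log2(p_xy / (p_x * p_y)) >= min_pmi:
            terms.add(pair)
    return terms


def forward_max_match(text, dictionary, max_len=None):
    """Segment text left to right, always taking the longest dictionary word."""
    if max_len is None:
        max_len = max((len(word) for word in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest window first and shrink it one character at a time.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted as a fallback token.
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Merge extracted terms with a (toy) general dictionary, then segment.
domain_corpus = ["分词很好", "分词很快", "喜欢分词"]
general_dictionary = {"喜欢"}
combined = general_dictionary | extract_terms(domain_corpus)
print(forward_max_match("喜欢分词", combined))
```

Because matching is greedy from the left, the combined dictionary decides where word boundaries fall, which is why adding extracted domain terms can change the segmentation of specialized text.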
