Journal: 《中国电子杂志(英文版)》 (Chinese Journal of Electronics, English edition)

Automatic Microblog-Oriented Unknown Word Recognition with Unsupervised Method

         

Abstract

As a prerequisite task in natural language processing (NLP), Chinese word segmentation (CWS) is challenged by unknown words. To detect Chinese unknown words effectively, especially low-frequency unknown words in unstructured microblog data, we modify the usage of Accessor Variety (AV) to measure the context environments of core fragments, and we propose a novel variable, the Independence of strings, which is derived from the internal structure of segments. Our approach is unsupervised and uses no manually prepared materials. Because manual resources for microblog-oriented unknown word extraction are lacking, we use a sampling approach to assess the effectiveness of our method. Experimental results suggest that our best system outperforms both the baseline system and the state-of-the-art system, with a significant improvement in F1-measure and in the recall of low-frequency unknown words.
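For readers unfamiliar with Accessor Variety, the following is a minimal sketch of the standard AV criterion as it is commonly defined in the literature: the AV of a string is the smaller of the number of distinct characters seen immediately to its left and immediately to its right. It is not the modified usage described in the abstract, whose details are not given here, and it does not cover the Independence of strings. The function name `accessor_variety`, the `max_len` parameter, and the treatment of each sentence boundary as its own distinct accessor are assumptions of this sketch.

```python
from collections import defaultdict


def accessor_variety(sentences, max_len=4):
    """Return {substring: AV} for all substrings up to max_len characters.

    AV(s) = min(number of distinct left neighbours of s,
                number of distinct right neighbours of s).
    Assumption of this sketch: a sentence start or end counts as a
    distinct accessor for each sentence in which it occurs.
    """
    left = defaultdict(set)
    right = defaultdict(set)
    for sid, sent in enumerate(sentences):
        n = len(sent)
        for i in range(n):
            for j in range(i + 1, min(i + max_len, n) + 1):
                s = sent[i:j]
                # Character before the substring, or a per-sentence start marker.
                left[s].add(sent[i - 1] if i > 0 else ("<s>", sid))
                # Character after the substring, or a per-sentence end marker.
                right[s].add(sent[j] if j < n else ("</s>", sid))
    return {s: min(len(left[s]), len(right[s])) for s in left}


if __name__ == "__main__":
    # Toy corpus, invented for illustration only.
    corpus = ["我喜欢微博", "他喜欢微博吗", "微博很好"]
    av = accessor_variety(corpus)
    for candidate in ("微博", "喜欢"):
        print(candidate, av[candidate])
```

A string that appears in many different contexts receives a high AV and is a plausible word candidate; a string that always appears with the same neighbour receives a low AV and is more likely a fragment of a longer word.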
