Conference: International Conference on Natural Language Processing and Knowledge Engineering

Enhancement of unsupervised feature selection for conditional random fields learning in Chinese word segmentation

Abstract

This work proposes a unified view of several unsupervised feature selection methods based on frequent strings that improve the conditional random fields (CRF) model for Chinese word segmentation (CWS). The features are character-based n-gram (CNG), accessor variety based string (AVS), term-contributed frequency (TCF), and term-contributed boundary (TCB), with a specific manner of boundary overlapping. In the experiments, the baseline is the 6-tag scheme, a state-of-the-art labeling scheme for CRF-based CWS, and the data sets are taken from the SIGHAN CWS bakeoff 2005 and SIGHAN CWS 2010. The results show that all of these features improve on the baseline system in recall, precision, and their harmonic mean (the F1 measure), both for overall accuracy (F) and for out-of-vocabulary recognition (FOOV). In particular, this work presents a novel compound feature, "AVS+TCB", that outperforms the other feature types for CRF-based CWS in terms of F and FOOV.
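The abstract names the 6-tag labeling scheme and accessor variety but does not define them. As a rough illustration only, the sketch below assumes the tag names B, B2, B3, M, E, S and the definition AV(s) = min(left variety, right variety) that are common in the CWS literature; the function names `six_tag` and `accessor_variety` are hypothetical, and how the paper turns AV scores into AVS features with boundary overlapping is not shown here.

```python
def six_tag(words):
    """Label every character of a segmented sentence with one of six tags.

    Assumed tag set (not spelled out in the abstract): S for a
    single-character word; B, B2, B3 for the first three characters of a
    longer word; E for its last character; M for anything in between.
    """
    head = ["B", "B2", "B3"]
    tagged = []
    for word in words:
        n = len(word)
        for i, ch in enumerate(word):
            if n == 1:
                tag = "S"
            elif i == n - 1:
                tag = "E"
            elif i < 3:
                tag = head[i]
            else:
                tag = "M"
            tagged.append((ch, tag))
    return tagged


def accessor_variety(sentences, s):
    """Return min(#distinct left neighbours, #distinct right neighbours) of s.

    Assumption: each occurrence at a sentence start or end counts as its
    own distinct neighbour, so boundary hits use a unique marker.
    """
    if not s:
        raise ValueError("s must be a non-empty string")
    left, right = set(), set()
    for idx, sent in enumerate(sentences):
        start = sent.find(s)
        while start != -1:
            end = start + len(s)
            left.add(sent[start - 1] if start > 0 else ("<s>", idx, start))
            right.add(sent[end] if end < len(sent) else ("</s>", idx, start))
            start = sent.find(s, start + 1)  # also catches overlapping hits
    return min(len(left), len(right))


if __name__ == "__main__":
    print(six_tag(["我", "喜欢", "自然语言处理"]))
    print(accessor_variety(["门把手坏了", "请把手拿开", "把手"], "把手"))
```

A string that appears in many different left and right contexts gets a high score and is a plausible word candidate, which is why such unsupervised statistics can help with out-of-vocabulary words.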