Journal: Journal of information and computational science

Recognition of Out-of-vocabulary Words Using Stopwords



Abstract

The main challenges in Chinese word segmentation are disambiguation and the recognition of out-of-vocabulary (OOV) words. Studies show that OOV words reduce segmentation accuracy more severely than ambiguous words do. Correctly recognizing OOV words would therefore improve Chinese word segmentation substantially. Although stopwords affect the recognition of OOV words, using them properly is conducive to improving both the quality and the efficiency of the segmentation process. Motivated by this, this paper proposes ROWS, an OOV word recognition algorithm based on stopwords, which markedly improves the recognition of OOV words. Extensive studies demonstrate that ROWS outperforms the state-of-the-art segmentation method ICTCLAS on OOV word recognition, improving Recall and Precision by 5.65% and 4.8% on average, respectively.
