2012 10th IAPR International Workshop on Document Analysis Systems (DAS)

Impact of Word Segmentation Errors on Automatic Chinese Text Classification

Abstract

In this paper, several sets of experiments were carried out to study the impact of word segmentation errors on automatic Chinese text classification. We first compared four word-based approaches. The results show that performance dropped significantly when automatic word segmentation was used instead of manual word segmentation. This indicates that errors introduced by automatic segmentation have a clear impact on classification performance. We then ran the experiment with a character-based approach (N-gram). Although the N-gram approach produces a large number of ambiguous words, it performed better than automatic word segmentation.
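The abstract does not give the N-gram configuration used in the paper. The Python below is a minimal sketch that assumes overlapping character bigrams (n = 2) counted as raw term frequencies. It shows how a character-based approach avoids segmentation altogether and why it yields many ambiguous units. For 中文文本分类 ("Chinese text classification"), the bigrams include the intended words 中文, 文本 and 分类, but they also include 文文 and 本分, which cross word boundaries. The function names, the example sentence and the "gold" segmentation are hypothetical and are not taken from the paper.

```python
from collections import Counter


def char_ngrams(text: str, n: int = 2) -> list[str]:
    """Return all overlapping character n-grams of length n (no word segmentation needed)."""
    if n < 1:
        raise ValueError("n must be >= 1")
    # Remove whitespace so an n-gram never contains a space.
    chars = "".join(text.split())
    return [chars[i:i + n] for i in range(len(chars) - n + 1)]


def ngram_features(text: str, n: int = 2) -> Counter:
    """Bag-of-n-grams term frequencies, usable as features for a text classifier."""
    return Counter(char_ngrams(text, n))


if __name__ == "__main__":
    sentence = "中文文本分类"  # "Chinese text classification"
    grams = char_ngrams(sentence)
    print(grams)  # ['中文', '文文', '文本', '本分', '分类']

    # Hypothetical manual segmentation, used only to spot boundary-crossing bigrams.
    gold_words = {"中文", "文本", "分类"}
    print([g for g in grams if g not in gold_words])  # ['文文', '本分']
```

Any standard classifier can be trained on these counts to give a segmentation-free baseline. The word-based approaches, by contrast, depend on a segmenter to produce units such as 中文 / 文本 / 分类, so any segmentation error changes their features directly.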