2011 International Conference on Business Management and Electronic Information

Research on the categorization accuracy of different similarity measures on Chinese texts

Abstract

This paper examines one of the most intensively studied algorithms, the k-Nearest Neighbor (kNN) algorithm. The purpose is to investigate how different similarity measures perform in kNN on Chinese texts. The two measures we focus on are cosine similarity and Jensen-Shannon divergence. We use both the Sogou corpus, whose data are extracted from the Sohu.com website, and datasets that we processed from real-world sources. Our experimental results indicate that the choice of similarity measure significantly affects categorization accuracy.
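
The two measures named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes each document has already been split into words by a Chinese word segmenter and is represented by its relative term frequencies. The function names (`term_distribution`, `cosine_similarity`, `js_divergence`, `knn_classify`) and the toy documents are hypothetical.

```python
import math
from collections import Counter


def term_distribution(tokens):
    """Turn a list of tokens into a dict of relative term frequencies."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}


def cosine_similarity(p, q):
    """Cosine of the angle between two sparse term vectors (higher = closer)."""
    dot = sum(value * q.get(term, 0.0) for term, value in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0
    return dot / (norm_p * norm_q)


def js_divergence(p, q):
    """Jensen-Shannon divergence with base-2 logs, in [0, 1] (lower = closer)."""
    total = 0.0
    for term in set(p) | set(q):
        pt, qt = p.get(term, 0.0), q.get(term, 0.0)
        mt = 0.5 * (pt + qt)  # mixture distribution M = (P + Q) / 2
        if pt > 0.0:
            total += 0.5 * pt * math.log2(pt / mt)
        if qt > 0.0:
            total += 0.5 * qt * math.log2(qt / mt)
    return total


def knn_classify(query, training, k, score):
    """Majority vote among the k training items with the highest score.

    `training` is a list of (distribution, label) pairs; `score` must return
    larger values for more similar documents.
    """
    ranked = sorted(training, key=lambda item: score(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy, hand-made data (assumption: tokens are already segmented).
train = [
    (term_distribution(["股票", "市场", "投资"]), "finance"),
    (term_distribution(["基金", "投资", "银行"]), "finance"),
    (term_distribution(["足球", "比赛", "球员"]), "sports"),
    (term_distribution(["篮球", "比赛", "冠军"]), "sports"),
]
query = term_distribution(["投资", "股票", "银行"])

by_cosine = knn_classify(query, train, 3, cosine_similarity)
# Divergence is a dissimilarity, so negate it to rank the closest items first.
by_jsd = knn_classify(query, train, 3, lambda a, b: -js_divergence(a, b))
```

Passing the measure in as the `score` argument keeps the voting logic identical across runs, so any difference in predicted labels comes only from the similarity measure, which is the comparison the abstract describes.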
