Research on the categorization accuracy of different similarity measures on Chinese texts

Abstract

This paper examines one of the most intensively studied algorithms, the k-Nearest Neighbor (kNN) algorithm. The aim is to investigate how different similarity measures perform in kNN on Chinese texts. We focus on two measures: cosine similarity and Jensen-Shannon divergence. We use both the corpus collected by Sogou, whose data is extracted from the Sohu.com website, and datasets that we processed from real-world data. Our experimental results indicate that the choice of similarity metric significantly affects categorization accuracy.
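To make the two measures named in the abstract concrete, the following is a minimal sketch of how cosine similarity and Jensen-Shannon divergence could each be plugged into a kNN text classifier. It is not the authors' implementation: the function names, the bag-of-words representation, the majority vote, and the toy documents are all assumptions made for illustration, and the documents are assumed to be already segmented into tokens (Chinese text would need a word-segmentation step first).

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors (higher = more similar)."""
    dot = sum(count * b.get(term, 0) for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def js_divergence(a, b):
    """Jensen-Shannon divergence, base 2, between the term distributions
    of two documents (0 = identical distributions, 1 = no shared terms)."""
    total_a, total_b = sum(a.values()), sum(b.values())
    if total_a == 0 or total_b == 0:
        raise ValueError("documents must contain at least one term")
    jsd = 0.0
    for term in set(a) | set(b):
        p = a.get(term, 0) / total_a
        q = b.get(term, 0) / total_b
        m = (p + q) / 2  # mixture distribution
        if p > 0:
            jsd += 0.5 * p * math.log2(p / m)
        if q > 0:
            jsd += 0.5 * q * math.log2(q / m)
    return jsd


def knn_classify(query, training, k, measure="cosine"):
    """Label `query` by majority vote among its k most similar training docs.

    `training` is a list of (term_counts, label) pairs (hypothetical format).
    """
    if measure == "cosine":
        scored = [(cosine_similarity(query, doc), label) for doc, label in training]
    elif measure == "jsd":
        # Lower divergence means more similar, so negate it for ranking.
        scored = [(-js_divergence(query, doc), label) for doc, label in training]
    else:
        raise ValueError(f"unknown measure: {measure}")
    scored.sort(key=lambda pair: pair[0], reverse=True)
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Toy, pre-tokenized documents (illustrative only, not from the paper's corpora).
training = [
    (Counter(["stock", "market", "bank"]), "finance"),
    (Counter(["bank", "loan", "stock"]), "finance"),
    (Counter(["match", "team", "goal"]), "sports"),
    (Counter(["team", "coach", "match"]), "sports"),
]
query = Counter(["stock", "bank", "team"])

for measure in ("cosine", "jsd"):
    print(measure, knn_classify(query, training, k=3, measure=measure))
```

Swapping the `measure` argument is the only change needed to compare the two metrics on the same training set, which mirrors the kind of comparison the abstract describes.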


Bibliographic details

Source: 2011 International Conference on Business Management and Electronic Information, 2011, pp. 224-227 (4 pages)
Authors: Li Xiangdong; Liu Hangyu; Jia Han; Huang Li
Original format: PDF
Chinese Library Classification: Enterprise modernization management
Keywords: Chinese text categorization; KNN algorithm; Similarity; Sougou Corpus
Record added: 2022-08-26 14:24:34

