String Vector based KNN for text categorization


Abstract

This research proposes a string vector based version of KNN as an approach to text categorization. The traditional version of KNN requires texts to be encoded into numerical vectors, and this encoding leads to three main problems: huge dimensionality, sparse distribution, and poor transparency. To solve these problems, this research encodes texts into string vectors instead of numerical vectors, defines a similarity measure between string vectors, and modifies KNN into a version that takes string vectors as its input. The expected benefits are better performance, a more compact representation of each text, and better transparency. The goal of this research is to improve text categorization performance by solving these three problems.
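The pipeline described in the abstract (encode texts as string vectors, compare them with a similarity measure, classify by nearest neighbors) can be illustrated with a minimal sketch. The abstract does not define the encoding or the similarity measure, so everything below is an assumption made for illustration: `encode` keeps the most frequent words of a text, `similarity` is a simple word-overlap ratio standing in for the paper's measure, and the names `encode`, `similarity`, and `knn_classify` are hypothetical.

```python
from collections import Counter


def encode(text, dim=3):
    """Encode a text as a string vector: its `dim` most frequent words.

    Assumption: the source does not specify the encoding; frequency
    ranking with alphabetical tie-breaking is used here for determinism.
    Short texts are padded with empty strings.
    """
    counts = Counter(text.lower().split())
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    top = ranked[:dim]
    return top + [""] * (dim - len(top))


def similarity(u, v):
    """Stand-in similarity between two string vectors of equal length.

    Assumption: fraction of non-padding words the vectors share.
    The paper defines its own measure, which is not reproduced here.
    """
    shared = (set(u) - {""}) & (set(v) - {""})
    return len(shared) / len(u)


def knn_classify(train, query, k=3):
    """Label `query` by majority vote among its k most similar examples.

    `train` is a list of (string_vector, label) pairs.
    """
    ranked = sorted(train, key=lambda ex: similarity(ex[0], query), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    train = [
        (encode("stock market stock price market"), "business"),
        (encode("bank stock bank loan"), "business"),
        (encode("football match football goal match"), "sports"),
        (encode("goal team goal match"), "sports"),
    ]
    query = encode("stock price stock market bank")
    print(query, knn_classify(train, query, k=3))
```

Each text is represented by only `dim` readable words rather than a long, mostly zero numerical vector, which is the kind of compactness and transparency the abstract refers to.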
