
String vector based AHC for text clustering



Abstract

In this research, we propose a string vector based version of the AHC (agglomerative hierarchical clustering) algorithm as an approach to text clustering. The traditional version requires texts to be encoded into numerical vectors, which leads to three main problems: huge dimensionality, sparse distribution, and poor transparency. To solve these problems, we encode texts into string vectors, define a similarity measure between them, and modify the AHC algorithm so that it takes string vectors as its input. We expect three benefits from this research: better performance, a more compact representation, and better transparency. This research is thus intended to improve text clustering performance by solving these problems.
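The three steps named in the abstract (encoding texts into string vectors, measuring similarity between them, and running AHC on them) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, since the abstract does not give the details: the string vector is taken to be the `d` most frequent words of a text, `word_sim` is a plain exact-match placeholder for whatever word-level similarity the paper defines, and clusters are merged by average link. The function names `encode`, `word_sim`, `vector_sim`, `cluster_sim`, and `ahc` are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations


def encode(text, d=3):
    """Encode a text as a string vector: its d most frequent words (assumed scheme)."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    # Rank by descending frequency; break ties alphabetically for determinism.
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return tuple(ranked[:d])


def word_sim(a, b):
    """Placeholder word similarity (assumption): 1.0 on exact match, else 0.0."""
    return 1.0 if a == b else 0.0


def vector_sim(u, v):
    """Similarity of two string vectors: best-match word similarity, averaged both ways."""
    if not u or not v:
        return 0.0
    forward = sum(max(word_sim(a, b) for b in v) for a in u) / len(u)
    backward = sum(max(word_sim(b, a) for a in u) for b in v) / len(v)
    return (forward + backward) / 2


def cluster_sim(c1, c2, vectors):
    """Average-link similarity between two clusters of vector indices."""
    total = sum(vector_sim(vectors[i], vectors[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def ahc(vectors, k):
    """Merge the most similar pair of clusters until only k clusters remain."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > max(k, 1):
        a, b = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: cluster_sim(clusters[p[0]], clusters[p[1]], vectors),
        )
        clusters[a] = clusters[a] + clusters[b]  # a < b, so deleting b is safe
        del clusters[b]
    return clusters
```

Because each text is represented by a handful of words rather than a long, mostly zero numerical vector, the representation stays compact and a reader can inspect directly why two texts were judged similar, which is the transparency argument the abstract makes.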
