Proceedings of the 2007 International Conference on Artificial Intelligence (ICAI'2007)

SM based Operation for Specializing a Fast Clustering Algorithm for Text Clustering

Abstract

This research proposes a new strategy for text clustering in which documents are encoded into string vectors and the single pass algorithm is modified to operate on string vectors. Traditionally, when the single pass algorithm is used for pattern clustering, raw data must be encoded into numerical vectors. This encoding can be difficult, depending on the application area. In text clustering, for example, encoding full texts into numerical vectors leads to two main problems: huge dimensionality and sparse distribution. To address these two problems, we encode full texts into string vectors and apply the single pass algorithm to those string vectors.
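
As a rough illustration of the idea, the following Python sketch runs a single pass clustering loop directly over string vectors (ordered lists of words) instead of numerical vectors. The abstract does not specify the paper's similarity operation, so the `overlap_similarity` function (a plain Jaccard overlap), the average-link cluster score, the `threshold` parameter, and all names here are assumptions made for illustration only, not the authors' method.

```python
from typing import Callable, List, Sequence

StringVector = Sequence[str]


def overlap_similarity(a: StringVector, b: StringVector) -> float:
    """Assumed stand-in similarity: Jaccard overlap of the two word sets.

    The paper's own similarity operation is not given in the abstract.
    """
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def single_pass(
    vectors: Sequence[StringVector],
    threshold: float = 0.3,
    sim: Callable[[StringVector, StringVector], float] = overlap_similarity,
) -> List[List[int]]:
    """Cluster string vectors in one pass; returns lists of item indices."""
    clusters: List[List[int]] = []
    for i, vec in enumerate(vectors):
        best_idx, best_score = -1, 0.0
        for c_idx, members in enumerate(clusters):
            # Score a cluster by the average similarity to its members.
            score = sum(sim(vec, vectors[m]) for m in members) / len(members)
            if score > best_score:
                best_idx, best_score = c_idx, score
        if best_idx >= 0 and best_score >= threshold:
            clusters[best_idx].append(i)  # join the most similar cluster
        else:
            clusters.append([i])  # otherwise start a new cluster
    return clusters


docs = [
    ["cluster", "text", "vector"],
    ["text", "cluster", "string"],
    ["soccer", "goal", "league"],
    ["goal", "league", "match"],
]
print(single_pass(docs))  # [[0, 1], [2, 3]]
```

Each document is visited exactly once and is never reassigned, which is what makes the single pass algorithm fast. Because the vectors hold a handful of words rather than one numeric entry per vocabulary term, the sketch never builds the large, mostly zero feature vectors that the abstract identifies as the source of the dimensionality and sparsity problems.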
