Journal: Engineering Economics

A novel clustering algorithm for large-scale text collection and its incremental version



Abstract

The rapid advance of internet technology has brought two challenges: the explosion of information, and the speed at which new information appears. Clustering can help users analyze information automatically, but traditional clustering algorithms are suitable only for small-scale, stable text collections. To address this problem, this paper proposes a novel clustering algorithm based on vector compression, designed for large-scale text collections (LDVC), together with its incremental version (I-LDVC). LDVC selects related features to compress feature sets, and it borrows the iterative training idea of the self-organizing map (SOM) to optimize the selection approach. When new texts appear, I-LDVC selects small samples from the original texts to alter the neuron model and perform incremental clustering. To prevent overfitting to the newly added texts, I-LDVC adjusts the weights of the samples as training proceeds. Experimental results demonstrate that LDVC achieves better performance and lower time complexity on large-scale text collections, and that I-LDVC clusters unstable text collections well.

DOI: http://dx.doi.org/10.5755/j01.itc.45.2.8666
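
The abstract gives no algorithmic detail, so the following is only a rough sketch of the general idea it describes, not the paper's LDVC or I-LDVC. It assumes sparse document vectors stored as dicts, treats "vector compression" as keeping the `k` largest-weight features of each neuron, and replaces full SOM training with a winner-take-all update (no neighborhood function). The incremental step uses fixed per-sample weights, whereas the abstract says the weights are adjusted during training. All function names and parameters (`compress`, `train`, `incremental_update`, `k`, `sample_size`, `new_weight`) are hypothetical.

```python
import math
import random

def compress(vec, k):
    """Keep only the k largest-magnitude features of a sparse vector (dict)."""
    top = sorted(vec.items(), key=lambda kv: abs(kv[1]), reverse=True)[:k]
    return dict(top)

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(f, 0.0) for f, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def best_neuron(neurons, doc):
    """Index of the neuron most similar to the document."""
    return max(range(len(neurons)), key=lambda i: cosine(neurons[i], doc))

def train(neurons, docs, k, epochs=10, lr=0.5, sample_weights=None):
    """Winner-take-all training; the winner is re-compressed to k features."""
    weights = sample_weights or [1.0] * len(docs)
    for epoch in range(epochs):
        rate = lr * (1.0 - epoch / epochs)  # linearly decaying learning rate
        for doc, sw in zip(docs, weights):
            i = best_neuron(neurons, doc)
            step = rate * sw  # a low sample weight shrinks the update
            updated = {f: (1.0 - step) * w for f, w in neurons[i].items()}
            for f, w in doc.items():
                updated[f] = updated.get(f, 0.0) + step * w
            neurons[i] = compress(updated, k)
    return neurons

def incremental_update(neurons, old_docs, new_docs, k,
                       sample_size=2, new_weight=0.5, seed=0):
    """Retrain on a small sample of old texts plus down-weighted new texts."""
    rng = random.Random(seed)
    sample = rng.sample(old_docs, min(sample_size, len(old_docs)))
    weights = [1.0] * len(sample) + [new_weight] * len(new_docs)
    return train(neurons, sample + new_docs, k,
                 epochs=5, lr=0.3, sample_weights=weights)

# Toy data: two topics, with "the" as a low-weight feature shared by all.
docs = [
    {"cat": 1.0, "pet": 0.8, "the": 0.1},
    {"cat": 0.9, "fur": 0.7, "the": 0.1},
    {"stock": 1.0, "market": 0.9, "the": 0.1},
    {"stock": 0.8, "price": 0.9, "the": 0.1},
]
# Assumption: neurons are seeded from one document per expected cluster.
neurons = [compress(docs[0], 2), compress(docs[2], 2)]
train(neurons, docs, k=2)
labels = [best_neuron(neurons, d) for d in docs]

new_docs = [{"stock": 0.7, "bond": 0.9}]
incremental_update(neurons, docs, new_docs, k=2)
new_label = best_neuron(neurons, new_docs[0])
```

Capping each neuron at `k` features keeps every similarity computation small regardless of vocabulary size, which is one plausible reading of how compression could lower time cost on large collections.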
