Conference: 2012 Eighth Latin American Web Congress

A Genetic Niching Algorithm with Self-Adaptating Operator Rates for Document Clustering

Abstract

We propose a genetic algorithm for document clustering in which an evolutionary multimodal optimization algorithm evolves candidate cluster representatives to search for dense regions in the sparse, high-dimensional vector space of text documents. Evolution acts not only on the document cluster representatives but also on the genetic operator rates, which evolve simultaneously with them. The evolving population consists of candidate document cluster representatives encoded as variable-length sparse index vectors and sparse index/frequency vectors. Specialized sparse genetic operators are defined for this representation, and they achieve different degrees of exploitation and exploration in the search for optimal document cluster prototypes. The operator most specialized for document clustering is the Sparse Top-K-Addition operator, which can be seen as an incentive towards more aggressive exploitation of the local context in a small subset of documents, whereas the simple Sparse Real Addition operator works in a more exploratory manner. As shown in our experiments on two well-known document data sets, taking into account associated terms within a local context adds the benefit of an explicit latent semantic consideration in the search for optimal term lists to describe the cluster prototypes.
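
The abstract names the representation and the operators but does not define them, so the following is only an illustrative sketch under stated assumptions, not the authors' method. It assumes a prototype is a sparse map from term index to weight, that each individual carries its own operator rates, that "Sparse Real Addition" adds one random term, and that "Sparse Top-K-Addition" adds the k heaviest terms found in the documents closest to the prototype. All function names, parameters, and the rate-perturbation rule are hypothetical.

```python
import random
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse {term_index: weight} vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    na = sum(w * w for w in a.values()) ** 0.5
    nb = sum(w * w for w in b.values()) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def sparse_real_addition(proto, vocab_size, rng):
    """Assumed exploratory operator: add (or overwrite) one random term."""
    child = dict(proto)
    child[rng.randrange(vocab_size)] = rng.random()
    return child

def sparse_top_k_addition(proto, docs, k, n_neighbors):
    """Assumed exploitative operator: add up to k new terms taken from the
    n_neighbors documents most similar to the prototype (its local context)."""
    nearest = sorted(docs, key=lambda d: cosine(proto, d), reverse=True)[:n_neighbors]
    totals = Counter()
    for doc in nearest:
        totals.update(doc)  # sums weights per term across the neighbors
    child = dict(proto)
    for term, weight in totals.most_common():
        if len(child) - len(proto) >= k:
            break
        if term not in child:
            child[term] = weight / len(nearest)  # average weight in the context
    return child

def mutate(individual, docs, vocab_size, rng):
    """Self-adaptation as assumed here: perturb the individual's own operator
    rates, then apply each operator with its (new) rate."""
    proto = individual["proto"]
    rates = dict(individual["rates"])
    for name in rates:
        rates[name] = min(1.0, max(0.01, rates[name] + rng.gauss(0, 0.05)))
    if rng.random() < rates["real_add"]:
        proto = sparse_real_addition(proto, vocab_size, rng)
    if rng.random() < rates["top_k_add"]:
        proto = sparse_top_k_addition(proto, docs, k=2, n_neighbors=2)
    return {"proto": proto, "rates": rates}

# Toy usage with made-up data: three sparse documents over a 10-term vocabulary.
docs = [{0: 1.0, 1: 2.0}, {1: 1.0, 2: 3.0}, {5: 1.0}]
parent = {"proto": {1: 1.0}, "rates": {"real_add": 0.3, "top_k_add": 0.3}}
child = mutate(parent, docs, vocab_size=10, rng=random.Random(0))
```

Storing the rates inside each individual means selection acts on them indirectly: individuals whose rates produce fitter prototypes survive and pass those rates on. A fitness function and the niching step are omitted here because the abstract does not describe them.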