Conference: Latin American Web Conference

A Genetic Niching Algorithm with Self-Adaptating Operator Rates for Document Clustering



Abstract

We propose a genetic algorithm for document clustering in which an evolutionary multimodal optimization algorithm evolves candidate cluster representatives to search for dense regions in the sparse, high-dimensional vector space of text documents. Evolution acts not only on the document cluster representatives but also on the genetic operator rates, which are evolved simultaneously with the representatives. The evolving population consists of candidate document cluster representatives encoded as variable-length sparse index vectors and sparse index/frequency vectors, and specialized sparse genetic operators are defined for this representation. These operators achieve different degrees of exploitation and exploration in the search for optimal document cluster prototypes. The operator most specialized for document clustering is the Sparse Top-K-Addition operator, which encourages a more aggressive exploitation of the local context within a small subset of documents, whereas the simpler Sparse Real Addition operator works in a more exploratory manner. Our experiments on two well-known document data sets show that taking associated terms within a local context into account adds the benefit of an explicit latent semantic consideration in the search for the optimal term lists that describe the cluster prototypes.
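The abstract does not define the operators precisely, so the following Python sketch is only one plausible reading of them, not the authors' method. It assumes a prototype is a dict mapping term index to weight (one way to hold a sparse index/frequency vector) and uses cosine similarity to pick a prototype's "local context". The functions `sparse_real_addition` (add one random term with a random real weight, exploratory) and `sparse_top_k_addition` (add the K heaviest missing terms from the prototype's nearest documents, exploitative), as well as the parameters `k` and `n_neighbors` and the weight-averaging rule, are hypothetical.

```python
import math
import random
from collections import Counter

# Assumed representation: a sparse vector maps term index -> weight.
SparseVec = dict[int, float]


def cosine(a: SparseVec, b: SparseVec) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def sparse_real_addition(proto: SparseVec, vocab_size: int,
                         rng: random.Random) -> SparseVec:
    """Hypothetical exploratory operator: add a random term with a random
    real-valued weight. The parent is left unchanged."""
    child = dict(proto)
    term = rng.randrange(vocab_size)
    child[term] = child.get(term, 0.0) + rng.random()
    return child


def sparse_top_k_addition(proto: SparseVec, docs: list[SparseVec],
                          k: int, n_neighbors: int) -> SparseVec:
    """Hypothetical exploitative operator: among the prototype's nearest
    documents (its local context), sum term weights and add the k heaviest
    terms the prototype does not yet contain, at their average weight."""
    neighbors = sorted(docs, key=lambda d: cosine(proto, d),
                       reverse=True)[:n_neighbors]
    if not neighbors:
        return dict(proto)
    local = Counter()
    for d in neighbors:
        local.update(d)  # accumulate term weights over the local context
    missing = [(t, w) for t, w in local.most_common() if t not in proto]
    child = dict(proto)
    for t, w in missing[:k]:
        child[t] = w / len(neighbors)
    return child
```

For example, with `docs = [{0: 2.0, 1: 1.0}, {0: 1.0, 2: 3.0}, {5: 4.0}]`, the call `sparse_top_k_addition({0: 1.0}, docs, k=1, n_neighbors=2)` treats the first two documents as the local context and adds term 2, which co-occurs there with term 0, returning `{0: 1.0, 2: 1.5}`. The random addition, by contrast, ignores the documents entirely, which is what makes it exploratory.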
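The abstract states that operator rates evolve together with the cluster representatives but not how. One common scheme is self-adaptation as used in evolution strategies: each individual carries its own rates, perturbs them log-normally before reproducing, and passes them on, so rates that tend to produce fitter offspring survive selection. The Python sketch below assumes that scheme. `Individual`, `self_adapt_rates`, `tau`, `floor`, and the two toy operators are hypothetical, and the selection and niching steps are not shown.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable

# Assumed types: a prototype is a sparse term-index -> weight map.
Prototype = dict[int, float]
Operator = Callable[[Prototype, random.Random], Prototype]


@dataclass
class Individual:
    prototype: Prototype
    rates: list[float]  # one application probability per operator, inherited


def self_adapt_rates(rates: list[float], rng: random.Random,
                     tau: float = 0.2, floor: float = 0.01) -> list[float]:
    """Log-normal self-adaptation (ES-style): perturb each rate, clip it to a
    small floor so no operator dies out, then renormalize to sum to 1."""
    mutated = [max(floor, r * math.exp(tau * rng.gauss(0.0, 1.0))) for r in rates]
    total = sum(mutated)
    return [r / total for r in mutated]


def add_random_term(proto: Prototype, rng: random.Random) -> Prototype:
    """Toy operator: add a random term index (a vocabulary of 100 is assumed)."""
    child = dict(proto)
    term = rng.randrange(100)
    child[term] = child.get(term, 0.0) + rng.random()
    return child


def drop_random_term(proto: Prototype, rng: random.Random) -> Prototype:
    """Toy operator: remove one random term, always keeping at least one."""
    child = dict(proto)
    if len(child) > 1:
        del child[rng.choice(list(child))]
    return child


def reproduce(parent: Individual, operators: list[Operator],
              rng: random.Random) -> Individual:
    """Mutate the inherited rates first, then use them to choose the operator
    that produces this child's prototype."""
    rates = self_adapt_rates(parent.rates, rng)
    idx = rng.choices(range(len(operators)), weights=rates, k=1)[0]
    return Individual(prototype=operators[idx](parent.prototype, rng), rates=rates)


rng = random.Random(42)
parent = Individual(prototype={3: 1.0, 7: 0.5}, rates=[0.5, 0.5])
child = reproduce(parent, [add_random_term, drop_random_term], rng)
print(child.rates, child.prototype)
```

Mutating the rates before they are used means each child's operator choice is driven by its own, already perturbed rates, so selection on the child's fitness indirectly acts on those rates as well.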
