Conference: Mexican International Conference on Artificial Intelligence

Algorithm for Clustering of Web Search Results from a Hyper-heuristic Approach

Abstract

The clustering of web search results, or web document clustering (WDC), has become an area of considerable interest among the academic and scientific communities working on information retrieval. Systems for clustering web search results, also called Web Clustering Engines, seek to increase the coverage of the documents presented to the user while reducing the time the user spends reviewing them. Several algorithms for clustering web results already exist, but their results show that there is still room for improvement. This paper introduces a hyper-heuristic framework called WDC-HH, which makes it possible to determine the best algorithm for WDC. The framework uses four high-level heuristics (performance-based rank selection, tabu selection, random selection and performance-based roulette wheel selection) to select low-level heuristics, which solve the specific WDC problem. As low-level heuristics, the framework considers harmony search, improved harmony search, novel global harmony search, global-best harmony search, eighteen genetic algorithm variations, particle swarm optimization, artificial bee colony, and differential evolution. The framework uses the k-means algorithm as a local solution improvement strategy and, based on the Balanced Bayesian Information Criterion, automatically determines the appropriate number of clusters. It also uses four acceptance/replacement strategies (replacement heuristics): Replace the Worst, Restricted Competition Replacement, Stochastic Replacement and Rank Replacement. WDC-HH was tested on four data sets with a total of 447 queries and their ideal solutions. The main result of the framework assessment is a new algorithm, based on global-best harmony search and the rank replacement strategy, which obtained the best results on the WDC problem. This new algorithm, called WDC-HH-BHRK, was also compared against other established WDC algorithms, among them Suffix Tree Clustering (STC) and Lingo. Results show a considerable improvement over the other algorithms, as measured by recall, F-measure, fall-out, accuracy and SSL_k.
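
The abstract lists performance-based roulette wheel selection among the high-level heuristics but does not define the performance measure. The following is a minimal Python sketch, assuming each low-level heuristic carries a non-negative credit score (for example, accumulated fitness improvement). The function name, heuristic labels and scores are hypothetical:

```python
import random


def roulette_wheel_select(scores, rng=random):
    """Return the index of a low-level heuristic, chosen with probability
    proportional to its performance score.

    Assumption: scores are non-negative credits such as accumulated fitness
    improvement; the abstract does not define the exact performance measure.
    """
    total = sum(scores)
    if total <= 0:
        # No heuristic has earned credit yet: fall back to uniform random selection.
        return rng.randrange(len(scores))
    spin = rng.random() * total  # uniform point on the wheel, in [0, total)
    cumulative = 0.0
    for index, score in enumerate(scores):
        cumulative += score
        if spin < cumulative:
            return index
    # Floating-point round-off guard: fall back to the last heuristic with credit.
    return max(i for i, s in enumerate(scores) if s > 0)


# Hypothetical example: three low-level heuristics with accumulated credit.
heuristics = ["harmony_search", "global_best_harmony_search", "particle_swarm"]
credit = [1.0, 3.0, 0.0]
chosen = heuristics[roulette_wheel_select(credit, random.Random(42))]
print(chosen)
```

After the chosen heuristic is applied, its credit would be updated from the improvement it produced. Heuristics that perform well are then picked more often, while a heuristic with zero credit is never picked once any heuristic has earned some.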
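
The best-performing WDC-HH-BHRK variant builds on global-best harmony search. The sketch below shows the usual global-best harmony search improvisation step for minimizing a generic objective. It assumes a real-valued encoding (in WDC this might be flattened centroid coordinates) with bounds shared by all dimensions, because the global-best pitch adjustment copies a value from a randomly chosen dimension of the best harmony. The abstract does not spell out the rule for rank replacement, which the winning variant uses. The sketch therefore uses Replace the Worst, which is also one of the framework's four strategies. All names and parameter values are hypothetical:

```python
import random


def ghs_minimize(objective, dim, lower, upper, hms=10, hmcr=0.9,
                 par_min=0.01, par_max=0.99, iterations=2000, seed=None):
    """Minimize objective over [lower, upper]^dim with global-best harmony search.

    Assumptions: real-valued encoding with bounds shared by all dimensions,
    and replace-the-worst acceptance (rank replacement is not shown).
    """
    rng = random.Random(seed)
    # Harmony memory: hms random vectors and their objective values.
    memory = [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(hms)]
    costs = [objective(h) for h in memory]
    for t in range(iterations):
        # The pitch adjusting rate grows linearly over the run.
        par = par_min + (par_max - par_min) * t / max(1, iterations - 1)
        best = memory[min(range(hms), key=costs.__getitem__)]
        new = []
        for j in range(dim):
            if rng.random() < hmcr:
                # Memory consideration: reuse dimension j of a random harmony.
                value = memory[rng.randrange(hms)][j]
                if rng.random() < par:
                    # Global-best pitch adjustment: copy a random dimension of the best harmony.
                    value = best[rng.randrange(dim)]
            else:
                # Random selection within the shared bounds.
                value = rng.uniform(lower, upper)
            new.append(value)
        cost = objective(new)
        # Replace the Worst: the new harmony enters memory only if it beats the worst one.
        worst = max(range(hms), key=costs.__getitem__)
        if cost < costs[worst]:
            memory[worst] = new
            costs[worst] = cost
    i = min(range(hms), key=costs.__getitem__)
    return memory[i], costs[i]


def sphere(x):
    """Toy objective with its minimum 0 at the origin."""
    return sum(v * v for v in x)


best, cost = ghs_minimize(sphere, dim=5, lower=-5.0, upper=5.0, seed=1)
print(best, cost)
```

In a WDC setting, the objective would be a clustering-quality measure over the encoded centroids. The sphere function here is only a stand-in.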

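The abstract states that k-means serves as the local solution improvement strategy but does not say how a candidate solution is encoded. The following is a minimal sketch, assuming a candidate is a list of centroids over document vectors (for example, TF-IDF vectors) and that improvement means a few Lloyd iterations. The function names and the iteration count are hypothetical. Selecting the number of clusters with the Balanced Bayesian Information Criterion is not shown, because the abstract does not give its formula:

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to point."""
    return min(range(len(centroids)), key=lambda k: squared_distance(point, centroids[k]))


def kmeans_refine(points, centroids, iterations=3):
    """Improve a candidate solution (a list of centroids) with a few Lloyd iterations.

    Returns the refined centroids and the cluster label of every point.
    """
    centroids = [list(c) for c in centroids]  # do not mutate the caller's solution
    for _ in range(iterations):
        # Assignment step: each document vector joins its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = [sum(column) / len(members) for column in zip(*members)]
    # Final assignment so the labels match the returned centroids.
    return centroids, [nearest(p, centroids) for p in points]


# Toy 2-D "document vectors" forming two obvious groups.
docs = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
centroids, labels = kmeans_refine(docs, [[0.0, 0.0], [0.0, 1.0]])
print(labels)  # [0, 0, 1, 1]
```

Starting from a poor candidate (both centroids inside the first group), two Lloyd iterations are enough to move one centroid to each group. This is the kind of local repair a metaheuristic's raw candidates would receive before their quality is evaluated.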