Conference: Mexican International Conference on Artificial Intelligence

Algorithm for Clustering of Web Search Results from a Hyper-heuristic Approach


Abstract

The clustering of web search results, also known as web document clustering (WDC), has become an active research area in the academic and scientific communities involved in information retrieval. Systems for clustering web search results, also called Web Clustering Engines, seek to increase the coverage of documents presented for the user to review while reducing the time spent reviewing them. Several algorithms for clustering web results already exist, but their results leave room for improvement. This paper introduces a hyper-heuristic framework called WDC-HH, which makes it possible to identify the best algorithm for WDC. The framework uses four high-level heuristics (performance-based rank selection, tabu selection, random selection and performance-based roulette wheel selection) to select among low-level heuristics, which solve the specific problem of WDC. As low-level heuristics the framework considers harmony search, improved harmony search, novel global harmony search, global-best harmony search, eighteen genetic algorithm variations, particle swarm optimization, artificial bee colony, and differential evolution. The framework uses the k-means algorithm as a local solution improvement strategy and, based on the Balanced Bayesian Information Criterion, automatically determines the appropriate number of clusters. The framework also uses four acceptance/replacement strategies (replacement heuristics): Replace the Worst, Restricted Competition Replacement, Stochastic Replacement and Rank Replacement. WDC-HH was tested on four data sets comprising a total of 447 queries, each with its ideal solution. The main result of the framework assessment is a new algorithm, based on global-best harmony search and the rank replacement strategy, that obtained the best results on the WDC problem. This new algorithm was called WDC-HH-BHRK and was also compared against other established WDC algorithms, among them Suffix Tree Clustering (STC) and Lingo. Results show a considerable improvement over the other algorithms, measured by recall, F-measure, fall-out, accuracy and SSL_k.
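The interplay between a high-level selection heuristic and a replacement strategy can be illustrated with a minimal Python sketch. It is not the WDC-HH implementation: the abstract gives no pseudocode, so every function name, the scoring rule (one credit per accepted candidate) and the toy one-dimensional objective are assumptions made for illustration. The sketch pairs performance-based roulette wheel selection with the Replace the Worst strategy; Rank Replacement, which the paper reports as best, is not defined in the abstract and is therefore not attempted here.

```python
import random


def roulette_select(scores, rng):
    """Return an index with probability proportional to its score."""
    total = sum(scores)
    if total <= 0:
        # No heuristic has earned credit yet: fall back to a uniform choice.
        return rng.randrange(len(scores))
    pick = rng.uniform(0, total)
    cumulative = 0.0
    for i, score in enumerate(scores):
        cumulative += score
        if pick < cumulative:
            return i
    return len(scores) - 1


def replace_worst(population, fitness, candidate, candidate_fitness):
    """Replace the worst member if the candidate has higher fitness."""
    worst = min(range(len(population)), key=fitness.__getitem__)
    if candidate_fitness > fitness[worst]:
        population[worst] = candidate
        fitness[worst] = candidate_fitness
        return True
    return False


def hyper_heuristic(low_level, objective, population, iterations, seed=0):
    """Repeatedly pick a low-level heuristic and try to improve the population.

    low_level: list of callables (population, rng) -> candidate solution.
    objective: callable returning a fitness to maximize (assumed convention).
    """
    rng = random.Random(seed)
    population = list(population)
    fitness = [objective(x) for x in population]
    scores = [1.0] * len(low_level)  # equal starting credit (assumption)
    for _ in range(iterations):
        h = roulette_select(scores, rng)
        candidate = low_level[h](population, rng)
        if replace_worst(population, fitness, candidate, objective(candidate)):
            scores[h] += 1.0  # reward the heuristic whose candidate was accepted
    best = max(range(len(population)), key=fitness.__getitem__)
    return population[best], fitness[best], scores


if __name__ == "__main__":
    # Toy stand-in for a clustering objective: maximize -(x - 3)^2.
    def objective(x):
        return -(x - 3.0) ** 2

    # Two hypothetical low-level heuristics: small and large perturbations.
    def small_step(pop, rng):
        return rng.choice(pop) + rng.gauss(0.0, 0.1)

    def large_step(pop, rng):
        return rng.choice(pop) + rng.gauss(0.0, 1.0)

    best, best_fitness, credit = hyper_heuristic(
        [small_step, large_step], objective, [-5.0, 0.0, 8.0], iterations=200
    )
    print(best, best_fitness, credit)
```

In a real web clustering setting the candidate solutions would be cluster assignments or centroid sets, and the objective would be a clustering criterion such as the Balanced Bayesian Information Criterion named above; the numeric toy problem only keeps the sketch self-contained.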
