
An MDL-Based Frequent Itemset Hierarchical Clustering Technique to Improve Query Search Results of an Individual Search Engine

Abstract

In this research we propose a frequent itemset hierarchical clustering (FIHC) technique that uses an MDL-based algorithm, namely KRIMP. Unlike the original FIHC technique, the proposed method defines clustering as a rank sequence problem over the top-3 ranked list of each itemset-of-keywords cluster in the web document search results that a search engine returns for a given query. The key idea of an MDL compression based approach is the code table. Only frequent and representative keywords, such as those in a KRIMP code table, can be used as candidates, instead of all important keywords produced by a keyword extractor such as RAKE. To simulate real-world information needs, the web documents come from the search results of a multi-domain query. Starting in a meta-search engine environment in order to collect many relevant documents, we set k = {50, 100, 200} for the k-toplist of documents retrieved from each search engine and build a dataset for automatic relevance judgement. We then apply the MDL-based FIHC algorithm to the best individual search engine, with k = {50, 100, 200} for the k-toplist of retrieved documents, minimum support = 5 for KRIMP itemset compression, and minimum cluster support = 0.1 for FIHC clustering. Our results show that MDL-based FIHC clustering can significantly improve the relevance scores of web search results on an individual search engine (by up to 39.2% at precision P@10, k-toplist = 50).
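The abstract reports its gain at precision P@10. As a minimal sketch of how such a score is commonly computed, the Python below defines precision at k and a relative-improvement helper. The function names, the toy document IDs, and the reading of the reported percentage as a relative gain are all assumptions made for illustration; the abstract does not state how the authors computed their figures.

```python
def precision_at_k(ranked_doc_ids, relevant_ids, k=10):
    """Fraction of the top-k ranked documents that are judged relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_doc_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    # Standard P@k divides by k even if fewer than k documents were returned.
    return hits / k


def relative_improvement(baseline, improved):
    """Relative gain of `improved` over `baseline`, in percent (assumed reading)."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (improved - baseline) / baseline


# Hypothetical toy data: one original ranking and one re-ranked list.
baseline_ranking = ["d1", "d7", "d3", "d9", "d2", "d8", "d4", "d6", "d5", "d10"]
reranked = ["d1", "d3", "d2", "d4", "d5", "d7", "d9", "d8", "d6", "d10"]
relevant = {"d1", "d2", "d3", "d4", "d5"}

p_before = precision_at_k(baseline_ranking, relevant, k=5)
p_after = precision_at_k(reranked, relevant, k=5)
gain = relative_improvement(p_before, p_after)
```

Here `relevant` stands in for the automatic relevance judgements mentioned in the abstract; in the toy data, re-ranking moves every relevant document into the top 5.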


Bibliographic record

Source
Asia Information Retrieval Societies Conference | 2015 | pp. 279-291 | 13 pages
Authors
Diyah Puspitaningrum; Fauzi; Boko Susilo; Jeri Apriansyah Pagua; Aan Erlansari; Desi Andreswari; Rusdi Efendi; I.S.W.B. Prasetya
Original format
PDF
Keywords
MDL-based FIHC; Frequent itemset hierarchical clustering; KRIMP; Search engine; Relevance score

Similar literature

1. Integration of meta-search engine search results based on the maximum entropy OWA operator [J]. Sang Xiuzhi (桑秀芝), Liu Xinwang (刘新旺). Journal of Southeast University (English Edition), 2013, No. 2.
2. Labeling of Web Search Result Clusters Using Heuristic Search and Frequent Itemset [J]. Mansaf Alam, Kishwar Sadaf. Procedia Computer Science, 2015, No. 1.
3. A method for improving graph queries processing using positional inverted index (P.I.I) idea in search engines and parallelization techniques [J]. Hamed Dinari, Hassan Naderi. Journal of Central South University (English Edition), 2016, No. 1.
4. An MDL-Based Frequent Itemset Hierarchical Clustering Technique to Improve Query Search Results of an Individual Search Engine [C]. Diyah Puspitaningrum, Fauzi, Boko Susilo. Asia Information Retrieval Societies Conference, 2015.
5. Improving Mobile Web Search by Clustering and Visualizing Search Engine Results [D]. Alasmari, Ashwag. 2015.
6. NWB Query Engines: Tools to Search Data Stored in Neurodata Without Borders Format [O]. Petr Ježek, Jeffery L. Teeters, Friedrich T. Sommer. 2020.
7. Labeling of Web Search Result Clusters Using Heuristic Search and Frequent Itemset [O]. Alam Mansaf, Sadaf Kishwar. 2015.
8. Frequent Itemset Mining for Query Expansion in Microblog Ad-hoc Search [R]. Aboulnaga, Y., Clarke, C. L. 2012.
