Clustering of Web Search Results Based on Combination of Links and In-Snippets

Abstract

Search engines are a common tool for retrieving information on the Web, but the results they return are still far from satisfactory: users have to scan a long result list to find the information they actually want. Many works have therefore focused on post-processing search results to make them easier to examine, and one common form of post-processing is clustering. Term-based clustering was the first approach to clustering results, but its quality is poor when the processed pages contain little text. Link-based clustering can overcome this problem, but the quality of its clusters depends heavily on the number of in-links and out-links that pages have in common. In this paper, we propose that the short text attached to an in-link is valuable information that helps achieve high clustering quality. To distinguish it from a general snippet, we call it an in-snippet. Based on the in-snippet, we propose a new clustering method that combines links and in-snippets. In our method, the similarity between pages consists of two parts: link similarity and term similarity. We designed a corresponding algorithm to implement the clustering. To prevent bias from human judgments, the experimental datasets are collected from the Open Directory Project (DMOZ). Because DMOZ is a human-edited directory, datasets drawn from it are of higher quality and larger scale. We use entropy and F-measure to evaluate the quality of the final clusters. Compared with the link-based and the pure term-based algorithms, our method achieves better clustering quality.
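The two-part similarity described in the abstract can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything concrete here is an assumption: cosine similarity is used for both parts, the parts are mixed with a hypothetical weight `alpha`, and the names `link_vector`, `term_vector`, and `combined_similarity` as well as the page dictionary layout are invented for illustration and are not the authors' implementation.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def link_vector(in_links, out_links):
    """Binary vector over a page's links; in-links and out-links are kept apart."""
    vec = {("in", url): 1.0 for url in in_links}
    vec.update({("out", url): 1.0 for url in out_links})
    return vec


def term_vector(in_snippets):
    """Term-frequency vector over the words of all in-snippets of a page."""
    return dict(Counter(w for s in in_snippets for w in s.lower().split()))


def combined_similarity(p, q, alpha=0.5):
    """Weighted sum of link similarity and term similarity.

    alpha is a hypothetical mixing weight, not a value taken from the paper.
    """
    link_sim = cosine(link_vector(p["in_links"], p["out_links"]),
                      link_vector(q["in_links"], q["out_links"]))
    term_sim = cosine(term_vector(p["in_snippets"]),
                      term_vector(q["in_snippets"]))
    return alpha * link_sim + (1 - alpha) * term_sim


# Two toy pages that share one in-link but no in-snippet terms.
page_a = {"in_links": {"a", "b"}, "out_links": {"c"},
          "in_snippets": ["python tutorial"]}
page_b = {"in_links": {"a"}, "out_links": set(),
          "in_snippets": ["java guide"]}
score = combined_similarity(page_a, page_b)
```

A pairwise score of this kind could then be fed to any standard clustering routine; which routine the paper uses is not stated in the abstract.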

Bibliographic details

Source
2011 Eighth Web Information Systems and Applications Conference | 2011 | pp. 108-113 | 6 pages
Authors
Yang Nan; Liu Yue; Yang Gang
Original format: PDF
CLC classification: Computer networks
Keywords
Clustering; Link analysis; Search engine result
