International Conference on Neural Information Processing (ICONIP 2009)

Web Snippet Clustering Based on Text Enrichment with Concept Hierarchy

Abstract

Clustering the web snippets returned by a search engine helps users browse and navigate the results. Because web snippets are extremely short, many traditional clustering techniques that adopt the bag-of-words model often yield unsatisfactory results. In this paper, we propose a text enrichment method for improving the performance of web snippet clustering. The main idea is to expand the original snippets with related conceptual terms. We use the Open Directory Project (ODP), a human-organized web taxonomy, to provide the concept hierarchy of web content. Using a test data set of 240 queries, we ran experiments with two clustering techniques: K-means clustering as the non-overlapping approach and Suffix Tree Clustering (STC) as the overlapping approach. With the proposed text enrichment method, K-means clustering improved overall performance by up to 15.51% on the F1 measure, while Suffix Tree Clustering with text enrichment improved performance by up to 53.71%.
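The enrichment idea can be illustrated with a minimal sketch. It is not the authors' implementation: the `CONCEPTS` table is a hypothetical stand-in for an ODP lookup, the snippets are invented, and cosine similarity over raw term counts is assumed as the bag-of-words comparison. It only shows how two snippets with no words in common become comparable once shared category labels are appended.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy concept hierarchy: term -> path of ODP-style categories.
# These entries are illustrative only, not taken from the real ODP data.
CONCEPTS = {
    "jaguar": ["Recreation", "Autos"],
    "mustang": ["Recreation", "Autos"],
    "python": ["Computers", "Programming"],
}

def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()

def enrich(snippet):
    """Return the snippet's tokens plus the category labels of every known term."""
    tokens = tokenize(snippet)
    extra = [c.lower() for t in tokens for c in CONCEPTS.get(t, [])]
    return tokens + extra

def cosine(a, b):
    """Cosine similarity between two token lists under a bag-of-words model."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[t] * vb[t] for t in va)
    norm = sqrt(sum(v * v for v in va.values())) * sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0

# Two invented snippets that share no surface words.
s1 = "jaguar dealer prices"
s2 = "mustang engine review"

plain = cosine(tokenize(s1), tokenize(s2))   # similarity before enrichment
enriched = cosine(enrich(s1), enrich(s2))    # similarity after enrichment
print(plain, enriched)
```

In this toy case the plain snippets have zero similarity, while the enriched versions overlap on the appended category labels, which is the kind of signal a clustering algorithm such as K-means or STC could then exploit.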