Web Snippet Clustering Based on Text Enrichment with Concept Hierarchy



Abstract

Clustering the web snippets returned by a search engine helps users browse and navigate the results. Because web snippets are extremely short, many traditional clustering techniques that adopt the bag-of-words model often yield unsatisfactory results. In this paper, we propose a text enrichment method for improving the performance of web snippet clustering. The main idea is to expand the original snippets with related conceptual terms. We use the Open Directory Project (ODP), a human-organized web taxonomy, to provide the concept hierarchy of web content. Using a test data set of 240 queries, we performed experiments with two clustering techniques: K-means clustering as the non-overlapping approach and Suffix Tree Clustering (STC) as the overlapping approach. With the proposed text enrichment method, K-means clustering improved overall performance by up to 15.51% on the F1 measure, while Suffix Tree Clustering with text enrichment improved performance by up to 53.71%.
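The enrichment idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how snippets are mapped to ODP concepts, so `TOY_HIERARCHY`, `enrich`, and the `concept:` prefix below are hypothetical stand-ins, and cosine similarity over bag-of-words counts is assumed only as a typical similarity measure for K-means-style clustering. The sketch shows how two short snippets with no words in common become comparable once shared concept terms are appended.

```python
from collections import Counter
from math import sqrt

# Hypothetical stand-in for an ODP lookup: term -> category path.
# The real ODP taxonomy and the paper's mapping procedure are not
# described in the abstract; these entries are invented for illustration.
TOY_HIERARCHY = {
    "jaguar": ["Recreation", "Autos"],
    "sedan": ["Recreation", "Autos"],
    "python": ["Computers", "Programming"],
    "compiler": ["Computers", "Programming"],
}


def tokenize(text):
    """Lowercase the snippet and split it on whitespace."""
    return text.lower().split()


def enrich(tokens, hierarchy=TOY_HIERARCHY):
    """Return the tokens plus the concept labels of every known term."""
    extra = []
    for tok in tokens:
        for concept in hierarchy.get(tok, []):
            # Prefix concept terms so they cannot collide with snippet words.
            extra.append("concept:" + concept.lower())
    return tokens + extra


def cosine(a_tokens, b_tokens):
    """Cosine similarity between two bag-of-words count vectors."""
    a, b = Counter(a_tokens), Counter(b_tokens)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


s1 = tokenize("new jaguar dealer prices")
s2 = tokenize("luxury sedan review")

plain = cosine(s1, s2)                       # no shared words
enriched = cosine(enrich(s1), enrich(s2))    # shared concept terms

print(f"plain={plain:.3f} enriched={enriched:.3f}")
```

In this toy case the plain snippets have zero similarity, while the enriched ones overlap on the two shared concept labels. A clustering algorithm working on the enriched vectors therefore has more signal for grouping the two snippets together.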


Bibliographic record

Source
International Conference on Neural Information Processing (ICONIP 2009) | 2009 | pp. 309-317 | 9 pages
Authors
Supakpong Jinarat; Choochart Haruechaiyasak; Arnon Rungsawang;
Original format: PDF
Chinese Library Classification: Information processing
Keywords
web snippet clustering; concept hierarchy; text clustering; k-means clustering; suffix tree clustering; text enrichment;


