Journal: Information Processing & Management

Using Web structure and summarisation techniques for Web content mining


Abstract

The dynamic nature and size of the Internet can make it difficult to find relevant information. Most users express their information need through short queries to search engines, and they often have to sift manually through the search results in the order of the relevance ranking set by the search engine, which makes relevance judgement time-consuming. In this paper, we describe a novel representation technique that uses the Web structure together with summarisation techniques to better represent knowledge in actual Web documents. We call the proposed technique the Semantic Virtual Document (SVD). We discuss how the SVD can be used together with a suitable clustering algorithm to achieve automatic content-based categorisation of similar Web documents. The auto-categorisation facility, together with a tree-like graphical user interface (GUI) for post-retrieval document browsing, enhances the relevance judgement process for Internet users. Furthermore, we introduce a cluster-biased automatic query expansion technique that can be used to overcome the ambiguity of the short queries typically given by users. We outline our experimental design for evaluating the effectiveness of the proposed SVD as a representation, and present a prototype called iSEARCH (Intelligent SEarch And Review of Cluster Hierarchy) for Web content mining. Our results confirm, quantify and extend previous research using Web structure and summarisation techniques, introducing novel techniques for knowledge representation to enhance Web content mining.
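
The abstract does not say how an SVD is built or which clustering algorithm is used, so the following Python sketch is only an illustration of the general idea under stated assumptions. It assumes a virtual document can be approximated by joining a page's own summary with the anchor text of links pointing to it, and it uses TF-IDF weighting with a simple single-pass "leader" clustering as a stand-in for "a suitable clustering algorithm". All function names, the sample pages and the 0.2 threshold are hypothetical and are not taken from the paper.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_virtual_document(summary, anchor_texts):
    """Assumed construction: page summary plus anchor text of inbound links."""
    tokens = tokenize(summary)
    for anchor in anchor_texts:
        tokens.extend(tokenize(anchor))
    return tokens


def tfidf_vectors(docs):
    """Turn token lists into sparse TF-IDF vectors (dicts of term -> weight)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {t: count * (math.log(n / df[t]) + 1.0) for t, count in Counter(doc).items()}
        for doc in docs
    ]


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def leader_cluster(vectors, threshold=0.2):
    """Single-pass clustering: join the first cluster whose leader is similar enough."""
    clusters = []  # each cluster is a list of document indices; index 0 is the leader
    for i, vec in enumerate(vectors):
        for cluster in clusters:
            if cosine(vec, vectors[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Hypothetical search results for the ambiguous short query "jaguar".
pages = [
    ("Jaguar cars dealer offers luxury sedans", ["jaguar car dealer", "luxury cars"]),
    ("The jaguar is a big cat living in rainforest", ["big cat facts", "rainforest animals"]),
    ("Luxury car reviews for jaguar sedans", ["car reviews"]),
    ("Rainforest big cat conservation for the jaguar", ["jaguar cat habitat"]),
]
docs = [build_virtual_document(summary, anchors) for summary, anchors in pages]
clusters = leader_cluster(tfidf_vectors(docs), threshold=0.2)
print(clusters)
```

In this toy input, the car-related pages and the animal-related pages share little vocabulary beyond the query term itself, which is the kind of separation a content-based categorisation step relies on. A cluster-biased query expansion step could then add high-weight terms from the cluster a user selects, but that part is not sketched here.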
