Conference: International Conference on Web Engineering

Mining Taxonomies from Web Menus: Rule-Based Concepts and Algorithms

Abstract

The logical hierarchies of Web sites (i.e., Web site taxonomies) are obvious to humans, because humans can distinguish different menu levels and their relationships. Such accurate information about the logical structure, however, is not yet available to machines. Many applications would benefit if Web site taxonomies could be mined from menus, but in the past this was an almost unsolvable problem. Although a tag newly introduced in HTML5 and novel mining methods now make it possible to distinguish menus from other content, how the underlying taxonomies can be extracted from given menus has not yet been researched. In this paper we present the first detailed analysis of the problem and introduce rule-based concepts for addressing each identified subproblem. We report on a large-scale study on mining hierarchical menus of 350 randomly selected domains. Our methods allow extracting previously unavailable Web site taxonomy information with high precision and high recall.
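
To make the task concrete, the following is a minimal sketch of reading a menu hierarchy out of markup. It is not the paper's rule-based method, which the abstract does not describe. It assumes the HTML5 tag in question is `<nav>`, that menu levels are expressed as nested `<li>` items with explicit closing tags, and that each item's label is the text of its link. The names `NavMenuParser`, `extract_menu_tree`, and `SAMPLE` are hypothetical.

```python
from html.parser import HTMLParser


class NavMenuParser(HTMLParser):
    """Collects link labels inside <nav>, nested by <li> depth (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.nav_depth = 0        # > 0 while inside a <nav> element
        self.in_link = False      # True while inside an <a> within <nav>
        self.root = {"label": None, "children": []}
        self.stack = [self.root]  # stack[-1] is the parent of the next item

    def handle_starttag(self, tag, attrs):
        if tag == "nav":
            self.nav_depth += 1
        elif self.nav_depth and tag == "li":
            # Each <li> becomes a node under the innermost open <li> (or the root).
            node = {"label": "", "children": []}
            self.stack[-1]["children"].append(node)
            self.stack.append(node)
        elif self.nav_depth and tag == "a":
            self.in_link = True

    def handle_endtag(self, tag):
        if tag == "nav" and self.nav_depth:
            self.nav_depth -= 1
            if self.nav_depth == 0:
                del self.stack[1:]  # drop any items left open by sloppy markup
        elif self.nav_depth and tag == "li" and len(self.stack) > 1:
            self.stack.pop()
        elif tag == "a":
            self.in_link = False

    def handle_data(self, data):
        # Assumption: the label of a menu item is the text of its link.
        if self.nav_depth and self.in_link and len(self.stack) > 1:
            self.stack[-1]["label"] += data.strip()


def extract_menu_tree(html):
    """Return the top-level menu items found inside <nav> as nested dicts."""
    parser = NavMenuParser()
    parser.feed(html)
    parser.close()
    return parser.root["children"]


# Hypothetical sample page: a two-level menu plus a link outside the menu.
SAMPLE = """
<nav>
  <ul>
    <li><a href="/products">Products</a>
      <ul>
        <li><a href="/products/phones">Phones</a></li>
        <li><a href="/products/laptops">Laptops</a></li>
      </ul>
    </li>
    <li><a href="/about">About</a></li>
  </ul>
</nav>
<p><a href="/ad">Not part of the menu</a></p>
"""

tree = extract_menu_tree(SAMPLE)
```

Real menus are often not this tidy: they may be flat lists styled to look hierarchical, or split across several elements. That gap is presumably what the rule-based concepts announced in the abstract are meant to address.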
