Combining Classifiers to Identify Online Databases


Abstract

We address the problem of identifying the domain of online databases. More precisely, given a set F of Web forms automatically gathered by a focused crawler and an online database domain D, our goal is to select from F only the forms that are entry points to databases in D. Having a set of Web forms that serve as entry points to similar online databases is a requirement for many applications and techniques that aim to extract and integrate hidden-Web information, such as meta-searchers, online database directories, hidden-Web crawlers, and form-schema matching and merging.

We propose a new strategy that automatically and accurately classifies online databases based on features that can be easily extracted from Web forms. By judiciously partitioning the space of form features, this strategy allows the use of simpler classifiers that can be constructed using learning techniques that are better suited for the features of each partition. Experiments using real Web data in a representative set of domains show that the use of different classifiers leads to high accuracy, precision, and recall. This indicates that our modular classifier composition provides an effective and scalable solution for classifying online databases.
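The idea of partitioning form features and assigning a separate classifier to each partition can be illustrated with a minimal Python sketch. It is not the authors' implementation, since the abstract does not name the partitions or the learning techniques. Everything here is an assumption for illustration: the split into structural and textual features, the `Form` fields, the hand-written rule in `is_searchable`, the naive Bayes text classifier, and the toy training data.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Form:
    """Hypothetical feature record for one Web form."""
    text: str              # textual partition: visible labels and text
    num_text_inputs: int   # structural partition
    num_selects: int       # structural partition
    has_password: bool     # structural partition


def is_searchable(form: Form) -> bool:
    """Structural partition: a hand-written rule standing in for a
    learned classifier (assumption, not taken from the abstract)."""
    if form.has_password:
        return False  # login forms are not database entry points
    return form.num_text_inputs + form.num_selects >= 1


class NaiveBayesText:
    """Textual partition: multinomial naive Bayes with add-one smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict(self, doc):
        total_docs = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for c in self.classes:
            total_words = sum(self.word_counts[c].values())
            score = math.log(self.doc_counts[c] / total_docs)
            for word in doc.lower().split():
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log(
                        (self.word_counts[c][word] + 1)
                        / (total_words + len(self.vocab))
                    )
            if score > best_score:
                best, best_score = c, score
        return best


def select_forms(forms, text_clf, domain):
    """Compose the two classifiers: keep a form from F only if the
    structural classifier accepts it and the text classifier assigns
    it to the target domain D."""
    return [
        f for f in forms
        if is_searchable(f) and text_clf.predict(f.text) == domain
    ]


# Toy training data (invented for illustration only).
train_docs = [
    "search flights departure arrival airline",
    "book flight from to date passengers",
    "find used cars make model year",
    "car search price mileage make",
]
train_labels = ["airfare", "airfare", "auto", "auto"]
clf = NaiveBayesText().fit(train_docs, train_labels)

forms = [
    Form("flight search departure date", 2, 1, False),
    Form("username password login", 1, 0, True),
    Form("used car make model price", 2, 2, False),
]
selected = select_forms(forms, clf, "airfare")
```

In this sketch each classifier sees only its own partition, so the structural rule could be replaced with a learned model without touching the text classifier, which is the kind of modularity the abstract describes.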
