Journal: IEEE Transactions on Knowledge and Data Engineering

Efficient classification across multiple database relations: a CrossMine approach



Abstract

Relational databases are the most popular repository for structured data, and are thus one of the richest sources of knowledge in the world. In a relational database, multiple relations are linked together via entity-relationship links. Multirelational classification is the procedure of building a classifier based on information stored in multiple relations and making predictions with it. Existing inductive logic programming approaches (recently also known as relational mining) have proven effective and highly accurate in multirelational classification. Unfortunately, most of them suffer from scalability problems with regard to the number of relations in databases. In this paper, we propose a new approach, called CrossMine, which includes a set of novel and powerful methods for multirelational classification, including 1) tuple ID propagation, an efficient and flexible method for virtually joining relations, which enables convenient search among different relations, 2) new definitions for predicates and decision-tree nodes, which involve aggregated information to provide essential statistics for classification, and 3) a selective sampling method for improving scalability with regard to the number of tuples. Based on these techniques, we propose two scalable and accurate methods for multirelational classification: CrossMine-Rule, a rule-based method, and CrossMine-Tree, a decision-tree-based method. Our comprehensive experiments on both real and synthetic data sets demonstrate the high scalability and accuracy of the CrossMine approach.
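
To make the idea of tuple ID propagation more concrete, the following is a minimal Python sketch of how a "virtual join" might work. It is an illustration under stated assumptions, not the paper's implementation: the Loan and Account relations, their attributes, the sample rows, and the function names `propagate_ids` and `count_labels` are all hypothetical. The sketch attaches to each tuple of a non-target relation the set of target tuple IDs that join with it, so that a candidate predicate on the non-target relation can be scored by counting positive and negative target tuples without materializing the join.

```python
from collections import defaultdict


def propagate_ids(target, other, target_key, other_key):
    """For each tuple in `other`, collect the IDs of target tuples that join with it."""
    ids_by_key = defaultdict(set)
    for tid, row in target.items():
        ids_by_key[row[target_key]].add(tid)
    # .get() avoids creating empty entries for keys with no matching target tuple.
    return {oid: set(ids_by_key.get(row[other_key], ()))
            for oid, row in other.items()}


def count_labels(propagated, other, labels, predicate):
    """Count distinct positive and negative target tuples covered by `predicate`."""
    covered = set()
    for oid, row in other.items():
        if predicate(row):
            covered |= propagated[oid]
    positives = sum(1 for tid in covered if labels[tid])
    return positives, len(covered) - positives


# Hypothetical target relation: Loan(loan_id, account_id) with a class label.
loans = {
    1: {"account_id": 124},
    2: {"account_id": 124},
    3: {"account_id": 108},
    4: {"account_id": 45},
    5: {"account_id": 45},
}
labels = {1: True, 2: True, 3: False, 4: False, 5: True}

# Hypothetical non-target relation: Account(account_id, frequency).
accounts = {
    124: {"account_id": 124, "frequency": "monthly"},
    108: {"account_id": 108, "frequency": "weekly"},
    45: {"account_id": 45, "frequency": "monthly"},
    67: {"account_id": 67, "frequency": "weekly"},
}

propagated = propagate_ids(loans, accounts, "account_id", "account_id")
pos, neg = count_labels(propagated, accounts, labels,
                        lambda row: row["frequency"] == "monthly")
print(pos, neg)
```

In this toy data, the predicate `frequency == "monthly"` on Account covers loans 1, 2, 4, and 5, so a rule or tree learner could evaluate that predicate from the propagated ID sets alone. Storing ID sets per tuple is only one possible design; it trades memory for the ability to reuse the propagated IDs across many candidate predicates.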
