Efficient classification across multiple database relations: a CrossMine approach

Yin X.; Han J.; Yang J.; Yu P.S.

Abstract

Relational databases are the most popular repository for structured data, and are thus one of the richest sources of knowledge in the world. In a relational database, multiple relations are linked together via entity-relationship links. Multirelational classification is the procedure of building a classifier based on information stored in multiple relations and making predictions with it. Existing approaches based on inductive logic programming (recently also known as relational mining) have proven effective, achieving high accuracy in multirelational classification. Unfortunately, most of them suffer from scalability problems with regard to the number of relations in databases. In this paper, we propose a new approach, called CrossMine, which comprises a set of novel and powerful methods for multirelational classification: 1) tuple ID propagation, an efficient and flexible method for virtually joining relations, which enables convenient search among different relations, 2) new definitions for predicates and decision-tree nodes, which involve aggregated information to provide essential statistics for classification, and 3) a selective sampling method for improving scalability with regard to the number of tuples. Based on these techniques, we propose two scalable and accurate methods for multirelational classification: CrossMine-Rule, a rule-based method, and CrossMine-Tree, a decision-tree-based method. Our comprehensive experiments on both real and synthetic data sets demonstrate the high scalability and accuracy of the CrossMine approach.
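
The abstract describes tuple ID propagation only at a high level. The Python sketch below illustrates the general idea under our own assumptions and is not the authors' implementation: the Loan/Account/District/Trans schema and all rows are invented for illustration, and the helper names (`propagate`, `coverage`, `count_per_target`) are hypothetical. Each target tuple's ID is carried along join edges, so a predicate on a non-target relation can be scored by counting the positive and negative target tuples it covers without materializing the physical join. The same ID sets also yield a simple count aggregate per target tuple. The sketch ignores selective sampling and does not track multiplicities when a tuple is reachable along several join paths.

```python
from collections import defaultdict

# Hypothetical toy schema, invented for illustration (not the paper's data):
#   Loan(loan_id, account_id, label)   <- target relation with class labels
#   Account(account_id, district_id, frequency)
#   District(district_id, region)
#   Trans(trans_id, account_id)
loans = [
    {"loan_id": 1, "account_id": 124, "label": "+"},
    {"loan_id": 2, "account_id": 124, "label": "+"},
    {"loan_id": 3, "account_id": 108, "label": "-"},
    {"loan_id": 4, "account_id": 45, "label": "-"},
    {"loan_id": 5, "account_id": 45, "label": "+"},
]
accounts = [
    {"account_id": 124, "district_id": 1, "frequency": "monthly"},
    {"account_id": 108, "district_id": 2, "frequency": "weekly"},
    {"account_id": 45, "district_id": 1, "frequency": "monthly"},
    {"account_id": 67, "district_id": 2, "frequency": "weekly"},
]
districts = [
    {"district_id": 1, "region": "north"},
    {"district_id": 2, "region": "south"},
]
trans = [{"trans_id": t, "account_id": a}
         for t, a in [(1, 124), (2, 124), (3, 124), (4, 108), (5, 45), (6, 45)]]

# Class label of each target tuple, keyed by its ID.
labels = {row["loan_id"]: row["label"] for row in loans}


def propagate(src_rows, src_idsets, src_key, dst_rows, dst_key):
    """Attach to each dst row the union of the target-tuple IDs carried by
    the src rows it joins with, instead of materializing the join."""
    ids_by_key = defaultdict(set)
    for row, ids in zip(src_rows, src_idsets):
        ids_by_key[row[src_key]] |= ids
    # Rows with no join partner get an empty ID set.
    return [frozenset(ids_by_key.get(row[dst_key], ())) for row in dst_rows]


def coverage(rows, idsets, predicate):
    """Count (positive, negative) target tuples covered by a predicate on a
    non-target relation, using only the propagated ID sets."""
    covered = set()
    for row, ids in zip(rows, idsets):
        if predicate(row):
            covered |= ids
    pos = sum(1 for i in covered if labels[i] == "+")
    return pos, len(covered) - pos


def count_per_target(idsets):
    """Count aggregate: how many rows of a relation join each target tuple."""
    counts = defaultdict(int)
    for ids in idsets:
        for i in ids:
            counts[i] += 1
    return dict(counts)


# Each target tuple starts with an ID set containing only itself.
loan_ids = [frozenset({row["loan_id"]}) for row in loans]
acct_ids = propagate(loans, loan_ids, "account_id", accounts, "account_id")
dist_ids = propagate(accounts, acct_ids, "district_id", districts, "district_id")
trans_ids = propagate(accounts, acct_ids, "account_id", trans, "account_id")

monthly = coverage(accounts, acct_ids, lambda r: r["frequency"] == "monthly")
south = coverage(districts, dist_ids, lambda r: r["region"] == "south")
n_trans = count_per_target(trans_ids)
print(monthly, south, n_trans)
```

Propagating ID sets one join at a time keeps each step linear in the size of the two relations involved, which is the intuition behind calling it a "virtual" join: a predicate such as `region == "south"` two hops away from the target relation is still scored directly in terms of target tuples.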

Bibliographic information

Source: IEEE Transactions on Knowledge and Data Engineering, 2006, no. 6, pp. 770-783 (14 pages)
Authors: Yin X.; Han J.; Yang J.; Yu P.S.
Original format: PDF
Language: English
Chinese Library Classification: Computing technology, computer technology
Keywords: data mining; decision trees; entity-relationship modelling; inductive logic programming; pattern classification; relational databases; sampling methods; CrossMine-Rule; CrossMine-Tree; decision-tree nodes; entity-relationship links; mult

Similar documents

1. Efficient Heterogeneous Multi-relational Classification Using Multi-criteria Ranking Approach Based on Characteristics of Multiple Relations [J]. Amit R. Thakkar, Yogeshwar P. Kosta. Journal of Computers, 2015, no. 6.

2. An efficient approach for mining sequential patterns using multiple threads on very large databases [J]. Bao Huynh, Cuong Trinh, Huy Huynh. Engineering Applications of Artificial Intelligence, 2018, no. SEPa.

3. Hybrid Framework Using Multiple-Filters and an Embedded Approach for an Efficient Selection and Classification of Microarray Data [J]. Bonilla-Huerta Edmundo, Hernandez-Montiel Alberto, Caporal Roberto-Morales. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2016, no. 1.

4. CrossMine: Efficient Classification Across Multiple Database Relations [C]. Xiaoxin Yin, Jiawei Han, Jiong Yang. European Workshop on Inductive Databases and Constraint Based Mining, 2006.

5. Scalable mining and link analysis across multiple database relations [D]. Yin, Xiaoxin. 2007.

6. An Efficient Approach to Mining Maximal Contiguous Frequent Patterns from Large DNA Sequence Databases [O]. Md. Rezaul Karim, Md. Mamunur Rashid, Byeong-Soo Jeong. 2012.

7. Efficient Classification across Multiple Database Relations: A Crossmine Approach [O]. Xiaoxin Yin, Jiawei Han. 2006.

8. Toward Efficient Quality of Information Estimation in Simultaneous Acoustic Tracking and Classification of Multiple Targets [R]. Damarla, T., Thornley, D. J., Gillies, D. F. 2009.
