Published in: Proceedings of the 4th international workshop on Multi-relational mining

An efficient multi-relational Naive Bayesian classifier based on semantic relationship graph

Abstract

Classification is one of the most popular data mining tasks, with a wide range of applications, and many algorithms have been proposed for building accurate and scalable classifiers. Most of these algorithms take only a single table as input, whereas most real-world data are stored in multiple tables and managed by relational database systems. Because transferring data from multiple tables into a single one usually causes many problems, multi-relational classification algorithms have become important and have attracted considerable research interest. Existing work on extending Naive Bayes to multi-relational data either has to transform the data stored in tables into main-memory Prolog facts or limits the search space to a small subset of real-world applications. In this work, we aim to solve these problems and build an efficient, accurate Naive Bayesian classifier that handles data in multiple tables directly. We propose an algorithm named Graph-NB, which upgrades the Naive Bayesian classifier for this setting. To take advantage of the linkage relationships among tables, and to treat the different tables linked to the target table differently, a semantic relationship graph is developed to describe these relationships and to avoid unnecessary joins. Furthermore, to improve accuracy, a pruning strategy simplifies the graph so that the classifier does not examine too many weakly linked tables. Experiments on both real-world and synthetic databases show that Graph-NB achieves high efficiency and good accuracy.
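
The abstract does not spell out how Graph-NB works internally, so the following is only a minimal sketch of the general idea of running Naive Bayes over several linked tables without first joining them into one. It is not the authors' algorithm: the toy schema (`loan`, `account`), the `edges` list standing in for a one-level relationship graph, and the helper names `features`, `train`, and `predict` are all assumptions made for illustration, and the pruning strategy is omitted because its criterion is not described here.

```python
from collections import Counter, defaultdict
import math


def features(row, tables, target, edges, skip):
    """Yield (table, attribute, value) for a target row and the rows linked to it."""
    for attr, value in row.items():
        if attr not in skip:
            yield (target, attr, value)
    # Follow each link by key lookup instead of materialising a joined table.
    for table, key in edges:
        for other in tables[table]:
            if other[key] == row[key]:
                for attr, value in other.items():
                    if attr != key:
                        yield (table, attr, value)


def train(tables, target, label, edges, skip):
    """Count class frequencies and per-class attribute values across tables."""
    class_counts = Counter()
    value_counts = Counter()    # (table, attr, value, class) -> count
    totals = Counter()          # (table, attr, class) -> count
    domains = defaultdict(set)  # (table, attr) -> distinct values seen
    for row in tables[target]:
        c = row[label]
        class_counts[c] += 1
        for table, attr, value in features(row, tables, target, edges, skip):
            value_counts[(table, attr, value, c)] += 1
            totals[(table, attr, c)] += 1
            domains[(table, attr)].add(value)
    return class_counts, value_counts, totals, domains


def predict(model, row, tables, target, edges, skip, alpha=1.0):
    """Return the class with the highest Laplace-smoothed log posterior."""
    class_counts, value_counts, totals, domains = model
    n = sum(class_counts.values())
    best, best_score = None, -math.inf
    for c, count in class_counts.items():
        score = math.log(count / n)
        for table, attr, value in features(row, tables, target, edges, skip):
            num = value_counts[(table, attr, value, c)] + alpha
            den = totals[(table, attr, c)] + alpha * max(1, len(domains[(table, attr)]))
            score += math.log(num / den)
        if score > best_score:
            best, best_score = c, score
    return best


# Hypothetical toy database: "loan" is the target table, "account" is linked to it.
tables = {
    "loan": [
        {"loan_id": 1, "account_id": 10, "amount": "low", "status": "good"},
        {"loan_id": 2, "account_id": 11, "amount": "low", "status": "good"},
        {"loan_id": 3, "account_id": 12, "amount": "high", "status": "bad"},
        {"loan_id": 4, "account_id": 13, "amount": "high", "status": "bad"},
    ],
    "account": [
        {"account_id": 10, "frequency": "monthly"},
        {"account_id": 11, "frequency": "monthly"},
        {"account_id": 12, "frequency": "weekly"},
        {"account_id": 13, "frequency": "weekly"},
        {"account_id": 14, "frequency": "monthly"},
    ],
}
edges = [("account", "account_id")]           # linked table and shared key
skip = {"loan_id", "account_id", "status"}    # identifiers, keys, and the label

model = train(tables, "loan", "status", edges, skip)
new_loan = {"loan_id": 5, "account_id": 14, "amount": "low"}
print(predict(model, new_loan, tables, "loan", edges, skip))
```

In this sketch every linked table contributes equally to the score. The abstract indicates that Graph-NB instead treats linked tables differently and prunes weakly linked ones, which would require a per-table criterion that this example does not attempt to reproduce.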