Conference: ACM International Conference on Information and Knowledge Management

A Data Mining System Based on SQL Queries and UDFs for Relational Databases

Abstract

Most research on data mining has proposed algorithms and optimizations that work on flat files outside a DBMS, mainly for the following reasons. It is easier to develop efficient algorithms in a traditional programming language. Integrating data mining algorithms into a DBMS is difficult given its relational model foundation and system architecture. Moreover, SQL may be slow and cumbersome for numerical analysis computations. Therefore, data mining users commonly export data sets outside the DBMS for data mining processing, which creates a performance bottleneck and eliminates important data management capabilities such as query processing and security, among others (e.g., concurrency control and fault tolerance). With that motivation in mind, we developed a novel system based on SQL queries and User-Defined Functions (UDFs) that can directly analyze relational tables to compute statistical models, storing such models as relational tables as well. Most algorithms have been optimized to reduce the number of passes over the data set. Our system can analyze large and high-dimensional data sets faster than external data mining tools.
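The general idea of computing a model inside the database in a single pass and storing it as a table can be illustrated with a small sketch. This is not the authors' system: it assumes SQLite (through Python's standard `sqlite3` module) as a stand-in DBMS, and the table names `points`, `model_stats`, and `linear_model`, the aggregate UDF `sumsq`, and the sample data are all hypothetical. One query scans the table once to collect summary statistics, and a simple linear regression is then derived from those statistics without reading the data again.

```python
import sqlite3


class SumSq:
    """Aggregate UDF (hypothetical name): sum of squares of one column."""

    def __init__(self):
        self.total = 0.0

    def step(self, value):
        # Called once per row during the scan; NULLs are skipped.
        if value is not None:
            self.total += value * value

    def finalize(self):
        return self.total


conn = sqlite3.connect(":memory:")
conn.create_aggregate("sumsq", 1, SumSq)

# Hypothetical data set: points lying on y = 2x + 1.
conn.execute("CREATE TABLE points (x REAL, y REAL)")
conn.executemany(
    "INSERT INTO points VALUES (?, ?)",
    [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)],
)

# A single pass over the table gathers every statistic the model needs
# and stores the result as a relational table.
conn.execute(
    """
    CREATE TABLE model_stats AS
    SELECT COUNT(*)   AS n,
           SUM(x)     AS sx,
           SUM(y)     AS sy,
           sumsq(x)   AS sxx,
           SUM(x * y) AS sxy
    FROM points
    """
)

n, sx, sy, sxx, sxy = conn.execute(
    "SELECT n, sx, sy, sxx, sxy FROM model_stats"
).fetchone()

# Least-squares line computed from the summary statistics alone.
slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
intercept = (sy - slope * sx) / n

# The fitted model is itself stored as a relational table.
conn.execute("CREATE TABLE linear_model (slope REAL, intercept REAL)")
conn.execute("INSERT INTO linear_model VALUES (?, ?)", (slope, intercept))
conn.commit()
```

Because the model depends only on the five aggregates, the data set never leaves the database and is scanned once, however many rows it holds.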
