Conference proceedings: Data Mining Workshops, ICDMW, 2008 IEEE International Conference on

Efficient Distance Computation Using SQL Queries and UDFs


Abstract

Distance computation is one of the most computationally intensive operations employed by many data mining algorithms. Performing such matrix computations within a DBMS creates many optimization challenges. We propose techniques to efficiently compute Euclidean distance using SQL queries and User-Defined Functions (UDFs). We concentrate on efficient Euclidean distance computation for the well-known K-means clustering algorithm. We present SQL query optimizations and a scalar UDF to compute Euclidean distance. We experimentally evaluate performance and scalability of our proposed SQL queries and UDF with large data sets on a modern DBMS. We benchmark distance computation on two important data mining techniques: clustering and classification. In general, UDFs are faster than SQL queries because they are executed in main memory. Data set size is the main factor impacting performance, followed by data set dimensionality.
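The two approaches named in the abstract, a plain SQL query and a scalar UDF, can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes SQLite (through Python's standard `sqlite3` module) in place of the DBMS used in the paper, and the tables `X` (points) and `C` (centroids), their columns, and the function name `euclid` are hypothetical names chosen for illustration. The SQL version returns the squared distance, which is enough to pick the nearest centroid in K-means, because SQLite's `sqrt` is not available in every build.

```python
import math
import sqlite3


def euclidean(*values):
    """Scalar UDF: the first half of the arguments is the point,
    the second half is the centroid (hypothetical calling convention)."""
    d = len(values) // 2
    return math.sqrt(sum((values[k] - values[d + k]) ** 2 for k in range(d)))


conn = sqlite3.connect(":memory:")
# narg = -1 lets the function accept any number of arguments.
conn.create_function("euclid", -1, euclidean)

# Hypothetical schema: one row per point in X, one row per centroid in C.
conn.executescript("""
CREATE TABLE X (i INTEGER PRIMARY KEY, x1 REAL, x2 REAL);
CREATE TABLE C (j INTEGER PRIMARY KEY, c1 REAL, c2 REAL);
INSERT INTO X VALUES (1, 0.0, 0.0), (2, 3.0, 4.0), (3, 6.0, 8.0);
INSERT INTO C VALUES (1, 0.0, 0.0), (2, 6.0, 8.0);
""")

# Approach 1: pure SQL, squared Euclidean distance for every (point, centroid) pair.
sql_rows = conn.execute("""
SELECT X.i, C.j, (x1 - c1) * (x1 - c1) + (x2 - c2) * (x2 - c2) AS d2
FROM X CROSS JOIN C
ORDER BY X.i, C.j
""").fetchall()

# Approach 2: the same pairs, with the distance computed by the scalar UDF.
udf_rows = conn.execute("""
SELECT X.i, C.j, euclid(x1, x2, c1, c2) AS d
FROM X CROSS JOIN C
ORDER BY X.i, C.j
""").fetchall()

for (i, j, d2), (_, _, d) in zip(sql_rows, udf_rows):
    print(f"point {i}, centroid {j}: d^2 = {d2}, d = {d}")
```

Both queries scan the same cross product of points and centroids; the difference is only where the arithmetic happens. This toy setup says nothing about relative speed, which the paper measures on large data sets.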
