Efficient Distance Computation Using SQL Queries and UDFs

Abstract

Distance computation is one of the most computationally intensive operations employed by many data mining algorithms. Performing such matrix computations within a DBMS creates many optimization challenges. We propose techniques to efficiently compute Euclidean distance using SQL queries and User-Defined Functions (UDFs). We concentrate on efficient Euclidean distance computation for the well-known K-means clustering algorithm. We present SQL query optimizations and a scalar UDF to compute Euclidean distance. We experimentally evaluate performance and scalability of our proposed SQL queries and UDF with large data sets on a modern DBMS. We benchmark distance computation on two important data mining techniques: clustering and classification. In general, UDFs are faster than SQL queries because they are executed in main memory. Data set size is the main factor impacting performance, followed by data set dimensionality.
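As a rough illustration of the two approaches the abstract contrasts, the sketch below computes point-to-centroid distances once with plain SQL arithmetic and once with a scalar UDF. It is not the authors' implementation: SQLite (through Python's sqlite3 module) stands in for the DBMS, the data are two-dimensional toy values, and the names points, centroids, and euclidean are hypothetical.

```python
import math
import sqlite3


def euclidean(x1, x2, c1, c2):
    """Scalar UDF: Euclidean distance between point (x1, x2) and centroid (c1, c2)."""
    return math.sqrt((x1 - c1) ** 2 + (x2 - c2) ** 2)


# Assumption: an in-memory SQLite database stands in for the DBMS in the paper.
conn = sqlite3.connect(":memory:")
conn.create_function("euclidean", 4, euclidean)  # register the 4-argument scalar UDF

# Hypothetical schema: one row per point, one row per K-means centroid.
conn.executescript("""
    CREATE TABLE points (id INTEGER PRIMARY KEY, x1 REAL, x2 REAL);
    CREATE TABLE centroids (j INTEGER PRIMARY KEY, c1 REAL, c2 REAL);
    INSERT INTO points VALUES (1, 0.0, 0.0), (2, 3.0, 4.0), (3, 10.0, 10.0);
    INSERT INTO centroids VALUES (1, 0.0, 0.0), (2, 9.0, 9.0);
""")

# Approach 1: plain SQL. Squared distance as arithmetic over a cross join.
sql_rows = conn.execute("""
    SELECT p.id, c.j,
           (p.x1 - c.c1) * (p.x1 - c.c1) + (p.x2 - c.c2) * (p.x2 - c.c2) AS d2
    FROM points p CROSS JOIN centroids c
    ORDER BY p.id, c.j
""").fetchall()

# Approach 2: the same pairs, with the distance delegated to the scalar UDF.
udf_rows = conn.execute("""
    SELECT p.id, c.j, euclidean(p.x1, p.x2, c.c1, c.c2) AS d
    FROM points p CROSS JOIN centroids c
    ORDER BY p.id, c.j
""").fetchall()

# K-means assignment step: keep the nearest centroid for each point.
nearest = {}
for pid, j, d in udf_rows:
    if pid not in nearest or d < nearest[pid][1]:
        nearest[pid] = (j, d)

for pid, (j, d) in sorted(nearest.items()):
    print(f"point {pid} -> centroid {j} (distance {d:.3f})")
```

Both queries visit the same point-centroid pairs; the SQL version returns squared distances, which is sufficient for choosing the nearest centroid because the square root preserves ordering. Nothing here reproduces the paper's optimizations or timings.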

Bibliographic details

Source: Data Mining Workshops, ICDMW, 2008 IEEE International Conference on, pp. 533-542 (10 pages)
Authors: Pitchaimalai Sasi K.; Ordonez Carlos; Garcia-Alvarado Carlos
Original format: PDF
Chinese Library Classification: Industrial technology
Keywords: SQL; UDF; distance
