Building Statistical Models and Scoring with UDFs

Abstract

Multidimensional statistical models are generally computed outside a relational DBMS by exporting data sets. This article explains how fundamental multidimensional statistical models are computed inside the DBMS in a single table scan by exploiting SQL and User-Defined Functions (UDFs). The techniques described herein are used in a commercial data mining tool called Teradata Warehouse Miner. Specifically, we explain how correlation, linear regression, PCA and clustering are integrated into the Teradata DBMS. Two major database processing tasks are discussed: building a model and scoring a data set based on a model. To build a model, two summary matrices are shown to be common and essential for all linear models: the linear sum of points and the quadratic sum of cross-products of points. Since such matrices are generally significantly smaller than the data set, we explain how the remaining matrix operations to build the model can be quickly performed outside the DBMS. We first explain how to efficiently compute summary matrices with plain SQL queries. Then we present two sets of UDFs that work in a single table scan: an aggregate UDF to compute summary matrices and a set of scalar UDFs to score data sets. Experiments compare UDFs and SQL queries (running inside the DBMS) with C++ (running outside on exported files). In general, UDFs are faster than SQL queries, and UDFs are more efficient than C++ due to long export times. Statistical models based on the summary matrices can be built outside the DBMS in just a few seconds. Aggregate and scalar UDFs scale linearly and require only one table scan, making them ideal for processing large data sets.
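The following is a minimal sketch of the build/score split described above. It is not the paper's Teradata implementation. It assumes Python's built-in `sqlite3` module as a stand-in DBMS. All names are hypothetical and chosen for illustration: the aggregate class `NLQ`, the SQL function names `nlq` and `linreg_score`, and the helpers `nlq_sql`, `correlation`, `solve` and `linear_regression`. The notation is also this sketch's own: n is the number of points, L is the linear sum of points, and Q is the quadratic sum of cross-products. The sketch computes n, L and Q in one scan, first with an aggregate UDF and then with an equivalent plain SQL query. It then derives correlation and least-squares regression coefficients from those summaries outside the database. Finally, it scores every row with a scalar UDF.

```python
import json
import math
import sqlite3


class NLQ:
    """Hypothetical aggregate UDF: accumulates n, L and Q in one pass over the rows."""

    def __init__(self):
        self.n, self.L, self.Q = 0, None, None

    def step(self, *x):
        if self.L is None:  # size the accumulators from the first row's arity
            self.L = [0.0] * len(x)
            self.Q = [[0.0] * len(x) for _ in x]
        self.n += 1
        for a, xa in enumerate(x):
            self.L[a] += xa
            for b, xb in enumerate(x):
                self.Q[a][b] += xa * xb

    def finalize(self):
        # SQLite aggregates must return a scalar, so the matrices travel as JSON text.
        return json.dumps({"n": self.n, "L": self.L, "Q": self.Q})


def nlq_sql(cols, table):
    """Plain-SQL alternative: count, d sums and d(d+1)/2 cross-product sums (Q is symmetric).
    Names are interpolated as-is, so they must be trusted identifiers."""
    terms = ["count(*)"] + [f"sum({c})" for c in cols]
    terms += [f"sum({a} * {b})" for i, a in enumerate(cols) for b in cols[i:]]
    return f"SELECT {', '.join(terms)} FROM {table}"


def correlation(n, L, Q, a, b):
    """Pearson correlation of dimensions a and b, computed only from n, L and Q."""
    num = n * Q[a][b] - L[a] * L[b]
    return num / (math.sqrt(n * Q[a][a] - L[a] ** 2) * math.sqrt(n * Q[b][b] - L[b] ** 2))


def solve(A, rhs):
    """Gaussian elimination with partial pivoting; fine for the small d x d systems here."""
    m = len(rhs)
    M = [row[:] + [v] for row, v in zip(A, rhs)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, m):
            f = M[r][col] / M[col][col]
            for c in range(col, m + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * m
    for r in reversed(range(m)):
        x[r] = (M[r][m] - sum(M[r][c] * x[c] for c in range(r + 1, m))) / M[r][r]
    return x


def linear_regression(n, L, Q):
    """Least-squares coefficients [b0, b1, ..., bd]; the last dimension of L and Q is Y."""
    d = len(L) - 1
    A = [[n] + L[:d]] + [[L[i]] + Q[i][:d] for i in range(d)]  # X'X with an intercept column
    rhs = [L[d]] + [Q[i][d] for i in range(d)]                   # X'Y
    return solve(A, rhs)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE X (i INTEGER PRIMARY KEY, x1 REAL, x2 REAL, y REAL)")
points = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 5.0)]
conn.executemany("INSERT INTO X (x1, x2, y) VALUES (?, ?, ?)",
                 [(x1, x2, 1.0 + 2.0 * x1 - 0.5 * x2) for x1, x2 in points])

# Model building: one table scan inside the DBMS, then small matrix work outside it.
conn.create_aggregate("nlq", 3, NLQ)  # registered for the three columns used below
stats = json.loads(conn.execute("SELECT nlq(x1, x2, y) FROM X").fetchone()[0])
plain = conn.execute(nlq_sql(["x1", "x2", "y"], "X")).fetchone()
beta = linear_regression(stats["n"], stats["L"], stats["Q"])

# Scoring: a scalar UDF applies the model to every row, again in a single scan.
conn.create_function("linreg_score", 2, lambda x1, x2: beta[0] + beta[1] * x1 + beta[2] * x2)
scores = conn.execute("SELECT i, linreg_score(x1, x2) FROM X ORDER BY i").fetchall()
```

The design point this illustrates is that only n, L and Q leave the database. Their size depends on the number of dimensions, not on the number of rows, so the linear algebra done outside stays cheap. PCA could follow the same pattern by eigendecomposing a covariance or correlation matrix derived from n, L and Q. This stdlib-only sketch leaves PCA and clustering out.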

Bibliographic record

Source: SIGMOD/PODS, 2007, 12 pages
Author: Carlos Ordonez
Original format: PDF
Chinese Library Classification: TP311.13
Keywords: DBMS; SQL; statistical model; UDF

Similar documents

1. Statistical Model Computation with UDFs [J]. Carlos Ordonez. IEEE Transactions on Knowledge and Data Engineering, 2010, no. 12.

2. Fast UDFs to compute sufficient statistics on large data sets exploiting caching and sampling [J]. Carlos Ordonez, Sasi K. Pitchaimalai. Data & Knowledge Engineering, 2010, no. 4.

3. Comparative analysis of the food webs of two intertidal mudflats during two seasons using inverse modelling: Aiguillon Cove and Brouage Mudflat, France [J]. Delphine Degre, Delphine Leguerrier, Eric Armynot du Chatelet. Estuarine, Coastal and Shelf Science, 2006, no. 1-2.

4. Building statistical models and scoring with UDFs [C]. Carlos Ordonez. ACM SIGMOD International Conference on Management of Data, 2007.

5. Proudfoot and Bird, campus architects: Building facilities for professional education at the University of Iowa, 1898-1910 [D]. Eckhardt, Patricia Ann Lacey. 1990.

6. Improving macromolecular atomic models at moderate resolution by automated iterative model building, statistical density modification and refinement [O]. Thomas C. Terwilliger.

7. Building information modelling and integrated project delivery (BIM/IPD): Key building design/construction concepts for sustainability and lean project delivery [O]. Uwakonye O. 2016.
