Representing and querying regression models in a relational database management system

Abstract

Curve fitting is a widely used modeling tool in financial, scientific, engineering and data mining applications, and in applications like sensor networks that need to tolerate missing or noisy data. These applications need both to fit functions to their data using regression and to pose relational-style queries over the resulting regression models. Unfortunately, existing DBMSs are ill suited for this task because they do not include support for creating, representing and querying functional data, short of brute-force discretization of functions into a collection of tuples. This thesis describes FunctionDB, a novel DBMS that extends the state of the art. FunctionDB treats functions output by regression as first-class citizens that can be queried declaratively and manipulated like traditional database relations. The key contributions of FunctionDB are a compact, algebraic representation for regression models as piecewise functions, and an algebraic query processor that executes declarative queries directly on this representation as combinations of algebraic operations like function inversion, zero finding and symbolic integration. FunctionDB is evaluated on two real-world data sets: measurements from a temperature sensor network, and traffic traces from cars driving on Boston roads. The results show that operating in the functional domain has substantial accuracy advantages (over 15% for some queries) and order-of-magnitude (10x-100x) performance gains over existing approaches that represent models as discrete collections of points. The thesis also describes an algorithm to maintain regression models online, as new raw data is inserted into the system. The algorithm supports a sustained insertion rate on the order of a million records per second, while generating models no less compact than a clairvoyant (offline) strategy.
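The contrast the abstract draws between querying a model algebraically and querying a discretized copy of it can be illustrated with a small sketch. This is not FunctionDB's code or API: the `LinearPiece` class, the function names `intervals_above` and `samples_above`, and the toy two-segment model are all assumptions made up for illustration. The sketch answers "where does the model exceed a threshold?" once by inverting each linear piece and once by sampling on a grid.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearPiece:
    """Hypothetical segment y = slope * x + intercept, valid on [lo, hi]."""
    lo: float
    hi: float
    slope: float
    intercept: float

    def value(self, x: float) -> float:
        return self.slope * x + self.intercept


def intervals_above(pieces, threshold):
    """Algebraic answer: invert each piece to find where y > threshold."""
    result = []
    for p in pieces:
        if p.slope == 0:
            # Constant piece: either the whole domain qualifies or none of it.
            lo, hi = (p.lo, p.hi) if p.intercept > threshold else (p.lo, p.lo)
        else:
            # Function inversion: x at which the line meets the threshold.
            crossing = (threshold - p.intercept) / p.slope
            if p.slope > 0:
                lo, hi = max(p.lo, crossing), p.hi
            else:
                lo, hi = p.lo, min(p.hi, crossing)
        if lo < hi:
            # Merge with the previous interval when the two touch.
            if result and result[-1][1] == lo:
                result[-1] = (result[-1][0], hi)
            else:
                result.append((lo, hi))
    return result


def samples_above(pieces, threshold, step):
    """Discretized baseline: evaluate on a grid and keep qualifying points."""
    xs = []
    for p in pieces:
        x = p.lo
        while x < p.hi:
            if p.value(x) > threshold:
                xs.append(x)
            x += step
    return xs


# Toy model (made-up numbers): rises as y = 2x on [0, 10], then falls as y = 30 - x.
model = [LinearPiece(0.0, 10.0, 2.0, 0.0), LinearPiece(10.0, 20.0, -1.0, 30.0)]

print(intervals_above(model, 15.0))     # [(7.5, 15.0)]
print(samples_above(model, 15.0, 1.0))  # [8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0]
```

In this toy case the algebraic version returns the exact boundaries 7.5 and 15.0 from two segments, while the gridded version only brackets them to within the grid step and does work proportional to the number of samples. That is the kind of accuracy and cost gap the abstract attributes to point-based representations, though the figures reported there come from the thesis's own experiments, not from this sketch.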

Bibliographic details

  • Author

    Thiagarajan Arvind

  • Author affiliation
  • Year 2007
  • Total pages
  • Original format PDF
  • Language of text eng
  • CLC classification
