
Non-parametric density estimation of streaming data using orthogonal series.



Abstract

Computer technology in the 21st century has allowed us to collect data at rates that would have seemed impossible less than a decade ago. As a result, typical database management systems (DBMS) have great difficulty storing and analyzing data in the traditional way. Systems that receive large amounts of data in transient data streams generally need to analyze the data immediately, without storing it on disk. These systems are referred to as data stream management systems (DSMS). This emerging field has been pushed to the forefront by technology that demands analysis of data in real time. Babcock et al. [2002] analyzed the issues involved in mining rapidly time-varying data streams. To date, most work in the area of DSMS has been concerned primarily with querying the data streams. These queries provide estimates of parameters, such as the mean, and then continuously update them as more data arrives. Recently, Heinz and Seeger [2004] used data streams to provide an estimate of the underlying probability density function by dividing the data into bins, or windows, containing the most recent data. An estimate of the density is then created by applying the standard wavelet cascade algorithm to the binned data.

This dissertation provides an alternative approach to finding the probability density function of streaming data: it estimates the density using an orthogonal series. Obtaining a density estimate by orthogonal series has several advantages, which will be discussed throughout this dissertation. Although the approach is applicable to a myriad of basis functions, the density estimation problem will be studied using wavelets as the basis functions. The history of wavelets as a mathematical tool dates back to the early 1900s. In the 1990s, Donoho and Johnstone [1992, 1994] firmly established wavelets as a scientific discipline by applying them to image compression, denoising, and density estimation.
Devroye [1985], Silverman [1986], and Scott [1992] provide excellent background material on density estimation in general. The first paper to use wavelets in density estimation is attributed to Doukhan and Leon [1990]. This work was followed by Walter [1990] and Kerkyacharian and Picard [1992]. As a mathematical tool for representing functions, and specifically probability densities, wavelets work especially well; this is due in part to the fact that they form an orthonormal basis for L²(ℝ). Another pioneer in the field of wavelet density estimation was Vidakovic [1994], who constructed density estimates based on the square root of the density.

This dissertation first provides a history of wavelets and the density estimation problem in Chapter 2. Next, in Chapter 3, the framework for obtaining a density estimate of streaming data using an orthogonal series is established. In Chapter 4, I address the problem of discounting old data that is no longer relevant to the density estimate. Chapter 5 presents a simulation study, first using simulated data and then actual Internet header traffic data from a case study. Chapter 6 summarizes my findings and addresses possible areas of future study.
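To illustrate the idea behind the approach, an orthogonal-series density estimator can be sketched in a few lines. The sketch below is a hypothetical simplification, not the dissertation's method: it uses a cosine basis on [0, 1] rather than a wavelet basis, and estimates each coefficient c_j = E[φ_j(X)] by a running mean, c_j ← c_j + (φ_j(x) − c_j)/n, so each arriving observation updates the estimate in O(J) time and is then discarded, which is the streaming property at issue.

```python
import math

class StreamingOSDensity:
    """Streaming orthogonal-series density estimator on [0, 1].

    Uses the cosine orthonormal basis phi_0(x) = 1,
    phi_j(x) = sqrt(2) * cos(j * pi * x), and maintains running-mean
    estimates of the series coefficients c_j = E[phi_j(X)], so the
    stream never needs to be stored.
    """

    def __init__(self, n_terms=8):
        self.J = n_terms
        self.n = 0                 # observations seen so far
        self.c = [0.0] * n_terms   # running coefficient estimates

    @staticmethod
    def _phi(j, x):
        # Orthonormal cosine basis functions on [0, 1].
        return 1.0 if j == 0 else math.sqrt(2.0) * math.cos(j * math.pi * x)

    def update(self, x):
        # Incremental mean update: c_j <- c_j + (phi_j(x) - c_j) / n.
        self.n += 1
        for j in range(self.J):
            self.c[j] += (self._phi(j, x) - self.c[j]) / self.n

    def pdf(self, x):
        # Truncated series estimate: f(x) ~ sum_j c_j * phi_j(x).
        return sum(self.c[j] * self._phi(j, x) for j in range(self.J))
```

Discounting old data (the subject of Chapter 4) fits the same structure: replacing the running mean with an exponentially weighted update, c_j ← (1 − λ)c_j + λφ_j(x), down-weights stale observations without any change to the rest of the estimator.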

Record

  • Author

    Caudle, Kyle A.

  • Affiliation

    George Mason University.

  • Degree-granting institution: George Mason University.
  • Subjects: Statistics; Computer Science.
  • Degree: Ph.D.
  • Year: 2005
  • Pages: 147 p.
  • Format: PDF
  • Language: English
