Journal of Applied Statistics

Generating multivariate continuous data via the notion of nearest neighbors



Abstract

Taylor and Thompson [15] introduced a clever algorithm for simulating multivariate continuous data sets that resemble the original data. Their approach is predicated upon determining a few nearest neighbors of a given row of data through a statistical distance measure, and subsequently combining the observations by stochastic multipliers that are drawn from a uniform distribution to generate simulated data that essentially maintain the original data trends. The newly drawn values are assumed to come from the same underlying hypothetical process that governs the mechanism of how the data are formed. This technique is appealing in that no density estimation is required. We believe that this data-based simulation method has substantial potential in multivariate data generation due to the local nature of the generation scheme, which does not have strict specification requirements as in most other algorithms. In this work, we provide two R routines: one has a built-in simulator for finding the optimal number of nearest neighbors for any given data set, and the other generates pseudo-random data using this optimal number.
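
The generation step described above can be illustrated with a minimal Python sketch. It is not the authors' R routine. Several details are assumptions that the abstract does not state: Euclidean distance stands in for the statistical distance measure, the seed row counts as one of its own m nearest neighbors, and the uniform multipliers are drawn from the interval 1/m ± sqrt(3(m − 1))/m, the form usually attributed to Taylor and Thompson. The names `simulate_row` and `simulate` are hypothetical.

```python
import math
import random


def simulate_row(data, m, rng):
    """Return one synthetic row built from a random seed row and its m nearest neighbours."""
    seed_row = rng.choice(data)
    # Assumption: Euclidean distance; the seed row itself is included (distance 0).
    neighbours = sorted(data, key=lambda row: math.dist(row, seed_row))[:m]
    dim = len(seed_row)
    # Local mean of the neighbourhood, one value per column.
    centre = [sum(row[j] for row in neighbours) / m for j in range(dim)]
    # Assumed multiplier range: mean 1/m, variance 1/m^2 * (m - 1).
    half_width = math.sqrt(3.0 * (m - 1)) / m
    weights = [rng.uniform(1.0 / m - half_width, 1.0 / m + half_width)
               for _ in range(m)]
    # Synthetic row: local mean plus a random linear combination of deviations.
    return [
        centre[j] + sum(w * (row[j] - centre[j])
                        for w, row in zip(weights, neighbours))
        for j in range(dim)
    ]


def simulate(data, n_rows, m, seed=0):
    """Return n_rows synthetic rows; `data` is a list of equal-length lists of floats."""
    rng = random.Random(seed)
    return [simulate_row(data, m, rng) for _ in range(n_rows)]
```

With m = 1 the sketch simply resamples original rows, and larger m blends each seed row with more of its neighborhood. Choosing m is the task of the first R routine mentioned in the abstract; its selection criterion is not given here, so it is not reproduced.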
