Journal: Knowledge and Information Systems

An Approximate Median Polish Algorithm for Large Multidimensional Data Sets


Abstract

Exploratory data analysis is a widely used technique to determine which factors have the most influence on data values in a multi-way table, or which cells in the table can be considered anomalous with respect to the other cells. In particular, median polish is a simple yet robust method to perform exploratory data analysis. Median polish is resistant to holes in the table (cells that have no values), but it may require many iterations through the data. This factor makes it difficult to apply median polish to large multidimensional tables, since the I/O requirements may be prohibitive. This paper describes a technique that uses median polish over an approximation of a datacube, easing the burden of I/O. The cube approximation is achieved by fitting log-linear models to the data. The results obtained are tested for quality, using a variety of measures. The technique scales to large datacubes and proves to give a good approximation of the results that would have been obtained by median polish in the original data.
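For readers unfamiliar with the method the abstract refers to, the following is a minimal sketch of ordinary two-way median polish, not the approximate algorithm the paper proposes. It assumes the table is a NumPy array in which `NaN` marks a hole, and that every row and column holds at least one value; the function name `median_polish` and the parameters `max_iter` and `tol` are illustrative choices, not taken from the paper. Each pass over the loop reads the whole table again, which is the iteration cost the abstract describes.

```python
import numpy as np

def median_polish(table, max_iter=10, tol=1e-6):
    """Decompose a two-way table as overall + row effect + column effect + residual.

    NaN cells (holes) are skipped when medians are taken and stay NaN in the residuals.
    """
    resid = np.array(table, dtype=float)  # working copy, becomes the residuals
    n_rows, n_cols = resid.shape
    overall = 0.0
    row_eff = np.zeros(n_rows)
    col_eff = np.zeros(n_cols)

    for _ in range(max_iter):
        # Row sweep: move each row's median from the residuals into the row effect.
        row_med = np.nanmedian(resid, axis=1)
        resid -= row_med[:, None]
        row_eff += row_med
        # Re-centre the column effects so their median is absorbed by the overall term.
        shift = np.median(col_eff)
        col_eff -= shift
        overall += shift

        # Column sweep: same operation along the other dimension.
        col_med = np.nanmedian(resid, axis=0)
        resid -= col_med[None, :]
        col_eff += col_med
        shift = np.median(row_eff)
        row_eff -= shift
        overall += shift

        # Stop once a full pass no longer changes anything noticeably.
        if max(np.abs(row_med).max(), np.abs(col_med).max()) < tol:
            break

    return overall, row_eff, col_eff, resid

# Hypothetical 3x3 table with one hole and one inflated cell.
table = np.array([[107.0, 9.0, 11.0],
                  [8.0, np.nan, 12.0],
                  [9.0, 11.0, 13.0]])
overall, row_eff, col_eff, resid = median_polish(table)
```

Cells with large absolute residuals are the candidates for anomalies, and the row and column effects indicate which factors shift the values most. Medians are used instead of means so that a single extreme cell does not distort the fitted effects.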
