
Efficient data reduction in multimedia data

Abstract

As the amount of multimedia data grows day by day, thanks to cheaper storage devices and a growing number of information sources, machine learning algorithms are faced with large datasets. When the original data is huge, small samples are preferred for many applications, and this is typically the case for multimedia applications. However, a simple random sample may not yield satisfactory results, because random fluctuations in the sampling process can leave it unrepresentative of the entire data set. The difficulty is particularly apparent when small sample sizes are needed. Fortunately, using a good sample for training can improve the final results significantly. In KDD'03 we proposed EASE, which outputs a sample based on its 'closeness' to the original sample. Reported results show that EASE outperforms simple random sampling (SRS). In this paper we propose EASIER, which extends EASE in two ways. (1) EASE is a halving algorithm: to achieve the required sample ratio, it starts from a suitable initial large sample and halves it iteratively. EASIER, on the other hand, does away with repeated halving by obtaining the required sample ratio directly in one iteration. (2) EASE was shown to work on the IBM QUEST dataset, which is a categorical count data set. EASIER is additionally shown to work on continuous data, namely image and audio features. We have successfully applied EASIER to image classification and audio event identification. Experimental results show that EASIER significantly outperforms SRS.
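
The abstract contrasts simple random sampling with a halving scheme that keeps a sample "close" to the full data. The sketch below illustrates that idea under stated assumptions. It is not EASE or EASIER. "Closeness" is assumed to be the L1 distance between per-item frequencies in the sample and in the full data set, which the abstract does not define. Each halving step keeps the best of several random splits; this rule is a hypothetical stand-in, because the abstract does not describe how EASE halves. The names `item_frequencies`, `distance`, `srs` and `halving_sample` are illustrative only.

```python
import random
from collections import Counter


def item_frequencies(transactions):
    """Fraction of transactions that contain each item."""
    counts = Counter(item for t in transactions for item in set(t))
    n = len(transactions)
    return {item: c / n for item, c in counts.items()}


def distance(sample, full_freqs):
    """Assumed closeness measure: L1 distance between the sample's item
    frequencies and those of the full data set (smaller means closer)."""
    sample_freqs = item_frequencies(sample)
    items = set(full_freqs) | set(sample_freqs)
    return sum(abs(sample_freqs.get(i, 0.0) - full_freqs.get(i, 0.0)) for i in items)


def srs(data, target_size, seed=0):
    """Simple random sampling (SRS) baseline."""
    return random.Random(seed).sample(data, target_size)


def halving_sample(data, target_size, trials=5, seed=0):
    """Hypothetical halving scheme (not EASE's actual rule): start from a random
    sample of size target_size * 2**k and halve it k times, each time keeping
    the half, over several random splits, that is closest to the full data."""
    if not 0 < target_size <= len(data):
        raise ValueError("target_size must be in 1..len(data)")
    rng = random.Random(seed)
    full = item_frequencies(data)
    # Largest k such that the initial sample target_size * 2**k fits in the data.
    k = 0
    while target_size * 2 ** (k + 1) <= len(data):
        k += 1
    current = rng.sample(data, target_size * 2 ** k)
    for _ in range(k):
        best_dist, best_half = None, None
        for _ in range(trials):
            rng.shuffle(current)
            mid = len(current) // 2
            # Slices are copies, so later shuffles do not disturb best_half.
            for half in (current[:mid], current[mid:]):
                d = distance(half, full)
                if best_dist is None or d < best_dist:
                    best_dist, best_half = d, half
        current = best_half
    return current


if __name__ == "__main__":
    rng = random.Random(42)
    items = "abcdefgh"
    probs = [0.5, 0.3, 0.2, 0.1, 0.1, 0.05, 0.05, 0.02]
    # Synthetic market-basket data: item i appears with probability probs[i].
    data = [tuple(x for x, p in zip(items, probs) if rng.random() < p)
            for _ in range(1024)]
    full = item_frequencies(data)
    print("SRS distance:    ", distance(srs(data, 32), full))
    print("halving distance:", distance(halving_sample(data, 32), full))
```

The halving loop selects every intermediate sample for closeness, whereas SRS does not. Comparing the two printed distances on real data is the quickest check, since neither method is guaranteed to win for every seed. EASIER's stated improvement is to skip the repeated halving and reach the target ratio in one iteration. The abstract does not describe that mechanism, so it is not sketched here.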