
Statistical Properties of Transactional Databases


Abstract

Most of the complexity of common data mining tasks stems from the unknown amount of information contained in the data being mined. The more patterns and correlations the data contain, the more resources are needed to extract them. This is confirmed by the fact that, in general, no single algorithm is best for a given data mining task on every possible kind of input dataset. Rather, to achieve good performance, strategies and optimizations have to be adopted according to dataset-specific characteristics. For example, one typical distinction in transactional databases is between sparse and dense datasets. In this paper, we consider Frequent Set Counting as a case study for data mining algorithms. We propose a statistical analysis of the properties of transactional datasets that allows the complexity of a dataset to be characterized. We show how such a characterization can be used in many fields, from performance prediction to optimization.
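
To make the sparse/dense distinction concrete, the following is a minimal sketch and not the statistical analysis proposed in the paper. It assumes a transaction is a set of item labels, defines density as the filled fraction of the transaction-by-item binary matrix, and counts frequent itemsets by brute force. The function names `density` and `count_frequent_itemsets`, the `max_size` cap, and the toy datasets are all hypothetical, chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def density(transactions):
    """Fraction of filled cells in the transaction-by-item binary matrix."""
    items = set().union(*transactions)
    filled = sum(len(t) for t in transactions)
    return filled / (len(transactions) * len(items))


def count_frequent_itemsets(transactions, min_support, max_size=3):
    """Brute-force count of itemsets (size <= max_size) that occur in
    at least min_support transactions. Illustrative only: real Frequent
    Set Counting algorithms prune the search space instead."""
    counts = Counter()
    for t in transactions:
        for k in range(1, max_size + 1):
            # sorted() gives each itemset one canonical tuple form
            for itemset in combinations(sorted(t), k):
                counts[itemset] += 1
    return sum(1 for c in counts.values() if c >= min_support)


# Hypothetical toy datasets with the same number of transactions.
sparse = [{"a", "b"}, {"c"}, {"d", "e"}, {"a", "f"}]
dense = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"}, {"a", "b", "c"}]

for name, data in (("sparse", sparse), ("dense", dense)):
    print(name, round(density(data), 3), count_frequent_itemsets(data, 2))
```

With a minimum support of 2, the dense toy dataset yields seven frequent itemsets against one for the sparse dataset, which illustrates why the same algorithm can need very different resources on inputs of the same size.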
