Journal: BioData Mining

PMLB: a large benchmark suite for machine learning evaluation and comparison

Abstract

The selection, development, or comparison of machine learning methods in data mining can be difficult, depending on the target problem and the goals of a particular study. Numerous publicly available real-world and simulated benchmark datasets have emerged from different sources, but their organization and adoption as standards have been inconsistent. As such, selecting and curating specific benchmarks remains an unnecessary burden on machine learning practitioners and data scientists. The present study introduces an accessible, curated, and developing public benchmark resource to facilitate identification of the strengths and weaknesses of different machine learning methodologies. We compare meta-features among the current set of benchmark datasets in this resource to characterize the diversity of available data. Finally, we apply a number of established machine learning methods to the entire benchmark suite and analyze how datasets and algorithms cluster in terms of performance. From this study, we find that existing benchmarks lack the diversity to properly benchmark machine learning algorithms, and that there are several gaps in benchmarking problems that still need to be considered. This work represents another important step towards understanding the limitations of popular benchmarking suites and developing a resource that connects existing benchmarking standards to more diverse and efficient standards in the future.
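As a rough illustration of the kind of dataset meta-features the abstract refers to, the following Python sketch computes a few simple ones for a classification dataset held in plain lists: instance count, feature count, class count, and a class-imbalance score. The function name `simple_meta_features` and the exact imbalance measure are hypothetical choices made for this example. They are not taken from PMLB and may differ from the meta-features used in the study.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence, Union


def simple_meta_features(
    X: Sequence[Sequence[float]], y: Sequence[Hashable]
) -> Dict[str, Union[int, float]]:
    """Return a few illustrative meta-features of a classification dataset.

    Hypothetical helper: the chosen features and the imbalance measure are
    assumptions for illustration, not PMLB's own definitions.
    """
    n_instances = len(X)
    if n_instances == 0 or n_instances != len(y):
        raise ValueError("X and y must be non-empty and have the same length")
    n_features = len(X[0])
    if any(len(row) != n_features for row in X):
        raise ValueError("all rows of X must have the same number of features")

    counts = Counter(y)
    n_classes = len(counts)
    if n_classes < 2:
        raise ValueError("need at least two classes to measure imbalance")

    # Squared deviation of each observed class proportion from perfect balance
    # (1/K), rescaled by K/(K-1) so 0 means balanced and values approach 1 as a
    # single class dominates.
    balanced = 1.0 / n_classes
    sq_dev = sum((c / n_instances - balanced) ** 2 for c in counts.values())
    imbalance = sq_dev * n_classes / (n_classes - 1)

    return {
        "n_instances": n_instances,
        "n_features": n_features,
        "n_classes": n_classes,
        "imbalance": imbalance,
    }


if __name__ == "__main__":
    # Toy data made up for this example.
    X = [[0.1, 1.0], [0.2, 0.9], [0.8, 0.1], [0.9, 0.2]]
    print(simple_meta_features(X, [0, 0, 1, 1]))  # balanced labels
    print(simple_meta_features(X, [0, 0, 0, 1]))  # skewed labels
```

The imbalance score is computed only over classes that actually occur in `y`, so with finite data it stays in [0, 1). A single-class dataset is rejected rather than scored, since "balance" is not meaningful there.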