The Need for Low Bias Algorithms in Classification Learning from Large Data Sets

Abstract

This paper reviews the appropriateness for application to large data sets of standard machine learning algorithms, which were mainly developed in the context of small data sets. Sampling and parallelisation have proved useful means for reducing computation time when learning from large data sets. However, such methods assume that algorithms that were designed for use with what are now considered small data sets are also fundamentally suitable for large data sets. It is plausible that optimal learning from large data sets requires a different type of algorithm to optimal learning from small data sets. This paper investigates one respect in which data set size may affect the requirements of a learning algorithm - the bias plus variance decomposition of classification error. Experiments show that learning from large data sets may be more effective when using an algorithm that places greater emphasis on bias management, rather than variance management.
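The bias plus variance decomposition mentioned in the abstract can be illustrated with a small sketch. The abstract does not say which decomposition the authors use, so the code below assumes a Kohavi-Wolpert-style definition for 0-1 loss with noise-free labels; the function name `bias_variance_01` and the toy predictions are hypothetical and are not taken from the paper.

```python
from collections import Counter


def bias_variance_01(predictions, truth):
    """Estimate squared bias, variance and error for 0-1 loss.

    predictions[m][i] is the label predicted for test point i by the
    model trained on the m-th training sample (hypothetical layout).
    truth[i] is the true label, assumed noise-free.
    """
    n_models = len(predictions)
    n_points = len(truth)
    bias2 = variance = error = 0.0
    for i, target in enumerate(truth):
        # Empirical distribution of predicted labels at this test point.
        counts = Counter(run[i] for run in predictions)
        probs = {label: c / n_models for label, c in counts.items()}
        sum_sq = sum(p * p for p in probs.values())
        p_target = probs.get(target, 0.0)
        # bias^2 = 1/2 * sum_y (P_true(y) - P_pred(y))^2,
        # where P_true puts all its mass on the true label.
        bias2 += 0.5 * ((1.0 - p_target) ** 2 + sum_sq - p_target ** 2)
        # variance = 1/2 * (1 - sum_y P_pred(y)^2)
        variance += 0.5 * (1.0 - sum_sq)
        # Expected 0-1 error is the probability of missing the target.
        error += 1.0 - p_target
    return bias2 / n_points, variance / n_points, error / n_points


# Toy data: four models, two test points.
truth = ["a", "b"]
predictions = [["a", "a"], ["a", "a"], ["a", "b"], ["b", "a"]]
b2, var, err = bias_variance_01(predictions, truth)
print(b2, var, err)
```

Under these assumptions the two terms sum to the average error, so a learner can be described by how its error splits between systematic mistakes (bias) and sensitivity to the particular training sample (variance).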

Bibliographic details

Source: European Conference on Principles of Data Mining and Knowledge Discovery, 2002, 12 pages
Authors: Damien Brain; Geoffrey I. Webb
Original format: PDF
Chinese Library Classification: TP18-53

Similar literature

1. Prostate cancer classification from prostate biomedical data using ant rough set algorithm with radial trained extreme learning neural network [J]. P. Mohamed Shakeel, Gunasekaran Manogaran. Health and technology. 2020, no. 1
2. Machine Learning Technique For Enhancing Classification Performance In Data Summarization Using Rough Set And Genetic Algorithm [J]. Merlinda Wibowo, Fiftin Noviyanto, Sarina Sulaiman. International Journal of Scientific & Technology Research. 2019, no. 10
3. Improving Meta-learning for Algorithm Selection by Using Multi-label Classification: A Case of Study with Educational Data Sets [J]. Luis Olmo Juan, Romero Cristobal, Gibaja Eva. International journal of computational intelligence systems. 2015, no. 6
4. The Need for Low Bias Algorithms in Classification Learning from Large Data Sets [C]. Damien Brain, Geoffrey I. Webb. 6th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD 2002), Aug 19-23, 2002, Helsinki, Finland. 2002
5. A Comparative Analysis of Selected Set of Natural Language Processing (NLP) and Machine Learning (ML) Algorithms for Clinical Coding using Clinical Classification Standards [D]. Kaur, Rajvir. 2018
6. Lung nodule malignancy classification using only radiologist-quantified image features as inputs to statistical learning algorithms: probing the Lung Image Database Consortium dataset with two statistical learning methods [O]. Matthew C. Hancock, Jerry F. Magnan. 2016
7. The need for low bias algorithms in classification learning from large data sets [O]. Brain, Damien; Webb, Geoffrey I. 2002
