Journal: Expert Systems with Applications

Effects of data set features on the performances of classification algorithms


Abstract

As the need to analyze big data sets grows, classification algorithms play an increasingly important role in data mining. Big data analysis must account for more characteristics of a data set, such as its structure, the variety of its sources, and its update frequency. In this paper, we evaluate scenarios that examine which data set characteristics most affect the performance of classification algorithms. Determining how strong or weak a given algorithm is on a given data set remains a complex issue. Our research therefore examines experimentally how data set characteristics affect algorithm performance in terms of both accuracy and elapsed time. To do so, we use multiple regression to evaluate the causal relationship between data set characteristics, as independent variables, and performance metrics, as dependent variables. We also examine the role that classification algorithms play as moderators of this relationship. We use all benchmark data sets in the UCI database that are suitable for running the classification algorithms. Based on the experimental results, we discuss what legacy classification algorithms require in order to support big data analysis in a new era of business intelligence.
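The regression design described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a single data set characteristic `x`, a 0/1 dummy `algo` standing in for the choice of classification algorithm (the moderator), and a performance metric `y`; all names, coefficients, and data below are hypothetical and synthetic. Moderation is modeled in the usual way, as an interaction term `x * algo` in an ordinary least squares fit.

```python
import numpy as np


def fit_moderated_regression(x, algo, y):
    """Fit y = b0 + b1*x + b2*algo + b3*(x*algo) by ordinary least squares.

    x    : a data set characteristic (hypothetical, e.g. number of attributes)
    algo : 0/1 dummy marking which algorithm was run (the moderator)
    y    : a performance metric (hypothetical, e.g. elapsed time)
    """
    x = np.asarray(x, dtype=float)
    algo = np.asarray(algo, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix: intercept, main effects, and the interaction term.
    design = np.column_stack([np.ones_like(x), x, algo, x * algo])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return dict(zip(["intercept", "x", "algo", "x_by_algo"], coef))


# Synthetic data only: the effect of x on y is 0.5 for algorithm 0
# and 0.5 + 0.3 for algorithm 1, plus a small amount of noise.
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(0, 10, n)
algo = rng.integers(0, 2, n)
y = 1.0 + 0.5 * x + 0.2 * algo + 0.3 * x * algo + rng.normal(0, 0.1, n)

coefs = fit_moderated_regression(x, algo, y)
for name, value in coefs.items():
    print(f"{name}: {value:.3f}")
```

A nonzero `x_by_algo` coefficient indicates that the effect of the data set characteristic on performance differs between the two algorithms, which is what a moderating role means in this kind of model. A real analysis would involve several characteristics, more than two algorithms, and significance tests, none of which are shown here.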
