Interpretable decision-tree induction in a big data parallel framework

Abraham Itzhak Weinberg; Mark Last

Abstract

When data-mining algorithms run on big data platforms, a parallel, distributed framework such as MapReduce may be used. In such a framework, however, each individual model fits only the data allocated to its own computing node and does not necessarily fit the entire dataset. To induce a single consistent model, ensemble algorithms such as majority voting aggregate the local models rather than analyzing the entire dataset directly. Our goal is to develop an efficient algorithm for choosing one representative model from multiple locally induced decision-tree models. The proposed SySM (syntactic similarity method) algorithm computes the similarity between the models produced by the parallel nodes and chooses the model that is most similar to the others as the best representative of the entire dataset. In 18.75% of 48 experiments on four big datasets, SySM accuracy is significantly higher than that of the ensemble; in about 43.75% of the experiments, SySM accuracy is significantly lower; in one case, the results are identical; and in the remaining 35.41% of cases, the difference is not statistically significant. Compared with ensemble methods, the representative tree models selected by the proposed methodology are more compact and interpretable, their induction consumes less memory, and, as confirmed by the empirical results, they allow faster classification of new records.
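The selection step described in the abstract, choosing the local model that is most similar to the others, can be illustrated with a minimal Python sketch. The abstract does not define the syntactic similarity measure, so the sketch makes two assumptions that are not taken from the paper: each tree is represented as a set of root-to-leaf rules, and similarity is the Jaccard overlap of those rule sets. The names `jaccard_similarity`, `select_representative`, and the toy rules are hypothetical.

```python
from typing import FrozenSet, Sequence, Tuple

# Assumption: a tree is modelled as a set of root-to-leaf rules. Each rule is
# a tuple of (feature, operator, threshold) tests plus a predicted class label.
Rule = Tuple[Tuple[Tuple[str, str, float], ...], str]
Tree = FrozenSet[Rule]


def jaccard_similarity(a: Tree, b: Tree) -> float:
    """Assumed stand-in measure: fraction of rules shared by two trees."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def select_representative(trees: Sequence[Tree]) -> int:
    """Return the index of the tree with the highest mean similarity to the others."""
    if not trees:
        raise ValueError("at least one tree is required")
    if len(trees) == 1:
        return 0
    best_index, best_score = 0, -1.0
    for i, tree in enumerate(trees):
        scores = [jaccard_similarity(tree, other)
                  for j, other in enumerate(trees) if j != i]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_index, best_score = i, mean_score
    return best_index


# Toy rules standing in for trees induced on three separate computing nodes.
rule_a: Rule = ((("age", "<=", 30.0),), "no")
rule_b: Rule = ((("age", ">", 30.0),), "yes")
rule_c: Rule = ((("income", "<=", 50.0),), "no")
rule_d: Rule = ((("income", ">", 50.0),), "yes")

local_trees = [
    frozenset({rule_a, rule_b}),
    frozenset({rule_a, rule_b, rule_c}),
    frozenset({rule_c, rule_d}),
]

representative = select_representative(local_trees)
print(f"Representative tree index: {representative}")
```

Unlike majority voting, which must keep and query every local model at classification time, this kind of selection keeps a single tree, which is consistent with the compactness and classification-speed advantages the abstract reports.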

Bibliographic record

Source: International Journal of Applied Mathematics and Computer Science, 2017, No. 4, 12 pages
Authors: Abraham Itzhak Weinberg; Mark Last
Original format: PDF
CLC classification: Computing technology, computer technology

