SIGKDD Explorations

Extending the Naive Bayes Model Element in PMML: Adding Support for Continuous Input Variables




The Predictive Model Markup Language (PMML) is the de facto standard for representing data mining and predictive analytic models. With PMML, a predictive solution can easily be shared among PMML-compliant applications and systems. PMML has evolved significantly as a standard over the years. PMML 4.1, the language's latest version, represents a major leap forward in its ability to represent data post-processing and multiple models. It also provides entirely new model elements for supporting Scorecards and K-Nearest Neighbors. PMML 4.2 is no exception: currently being worked on by the Data Mining Group (DMG), the body responsible for maintaining and advancing the PMML standard, it is bound to offer new elements and increased capabilities. This article describes one such improvement. In particular, it proposes extending the existing model element for Naive Bayes Classifiers to support continuous input fields. The R Project is a popular choice among data miners for analyzing data and building predictive models, and Naive Bayes is just one of a myriad of model types supported by R. The R e1071 package provides a naiveBayes function that builds Naive Bayes Models using categorical as well as continuous fields. The R pmml package has recently been extended to allow the export of PMML code for objects built with the naiveBayes function. For now, the exported code includes a PMML Extension element for continuous fields, but with the release of PMML 4.2 this support will be standardized. This article describes the export process in view of our proposal to extend the current model element for Naive Bayes Models.
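To make concrete what support for continuous input fields means in a Naive Bayes classifier, the following minimal Python sketch scores one record that mixes a categorical and a continuous field. It assumes a Gaussian density per class for the continuous field, a common choice that the abstract itself does not specify. The function names, the dictionary layout, and the toy numbers are all hypothetical illustrations, not the PMML schema or the output of the R packages.

```python
import math


def gaussian_pdf(x, mean, variance):
    """Density of a normal distribution with the given mean and variance."""
    exponent = -((x - mean) ** 2) / (2.0 * variance)
    return math.exp(exponent) / math.sqrt(2.0 * math.pi * variance)


def score(priors, categorical, continuous, record):
    """Return normalized class probabilities for one record.

    priors:      {class: prior probability}
    categorical: {field: {value: {class: P(value | class)}}}
    continuous:  {field: {class: (mean, variance)}}  (Gaussian assumption)
    """
    unnormalized = {}
    for target, prior in priors.items():
        likelihood = prior
        # Categorical fields contribute a conditional probability lookup.
        for field, table in categorical.items():
            likelihood *= table[record[field]][target]
        # Continuous fields contribute a per-class density value instead.
        for field, stats in continuous.items():
            mean, variance = stats[target]
            likelihood *= gaussian_pdf(record[field], mean, variance)
        unnormalized[target] = likelihood
    total = sum(unnormalized.values())
    return {target: value / total for target, value in unnormalized.items()}


# Hypothetical toy parameters, chosen only for illustration.
PRIORS = {"yes": 0.5, "no": 0.5}
CATEGORICAL = {
    "outlook": {
        "sunny": {"yes": 0.2, "no": 0.6},
        "rainy": {"yes": 0.8, "no": 0.4},
    }
}
CONTINUOUS = {"temperature": {"yes": (20.0, 4.0), "no": (26.0, 4.0)}}

probabilities = score(
    PRIORS, CATEGORICAL, CONTINUOUS, {"outlook": "sunny", "temperature": 23.0}
)
print(probabilities)
```

In this toy setup the temperature 23.0 lies equally far from both class means, so the continuous field is uninformative and the result is driven by the categorical field alone. Moving the temperature toward 20.0 would shift probability toward the "yes" class.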


