International Conference on Very Large Data Bases

Feature Selection in Enterprise Analytics: A Demonstration using an R-based Data Analytics System

Abstract

Enterprise applications are analyzing ever larger amounts of data using advanced analytics techniques. Recent systems from Oracle, IBM, and SAP integrate R with a data processing system to support richer advanced analytics on large data. A key step in advanced analytics applications is feature selection, which is often an iterative process that involves statistical algorithms and data manipulations. From our conversations with data scientists and analysts in enterprise settings, we observe three key aspects of feature selection. First, feature selection is performed by many types of users, not just data scientists. Second, high performance is critical to performing feature selection on large data. Third, the provenance of the results and steps in feature selection processes needs to be tracked for purposes of transparency and auditability. Based on our discussions with data scientists and the literature on feature selection practice, we organize a set of operations for feature selection into the Columbus framework. We prototype Columbus as a library usable in the Oracle R Enterprise environment. In this demonstration, we use Columbus to showcase how we can support various types of users of feature selection in one system. We then show how we optimize performance and manage the provenance of feature selection processes.
