Scientific Discovery and Rigor with ML

Abstract

The evolution of Data Management scenarios augmented by scientific discovery and rigor is apparent in the industry, judging by the attention analysts and others have paid to it over the past couple of years. Machine Learning plays a significant part in simplifying enterprise data landscapes and contributes to many aspects of Data Management. We see value in focusing on the Data Discovery and Data Quality aspects in this context, because enterprises today have complex landscapes, with the average enterprise using more than 5 Cloud storage services in addition to its on-premises data sources. A greater affinity for enterprise-grade Machine Learning has created a significant pull on system design, leading platforms toward capabilities such as standard APIs for scaled database queries and integration scenarios. This paper explores the integration of Machine Learning tools and customized libraries with any Cloud Platform to enhance the stakeholders' experience with Analytics. Conceptually, we propose a hypothesis for scaling an existing platform to a community-based approach that enables sharing of experimental iterations, ideally translating into industry-specific solutions that remain highly reusable. The intent is to offer a data model flexible enough to handle diverse data scenarios and to evaluate a confidence score for each of them. It should enable reproducible shared experiments with consistently evaluated scores, thereby easing the integration process through automated guidance. The paper also touches upon the good practices and architectural recommendations that need to be considered for general Machine Learning applications.
