Journal of computer sciences

FROM DATA MINING AND KNOWLEDGE DISCOVERY TO BIG DATA ANALYTICS AND KNOWLEDGE EXTRACTION FOR APPLICATIONS IN SCIENCE


Abstract

"Data mining" for "knowledge discovery in databases" and associated computational operations first introduced in the mid-1990 s can no longer cope with the analytical issues relating to the so-called "big data". The recent buzzword big data refers to large volumes of diverse, dynamic, complex, longitudinal and/or distributed data generated from instruments, sensors, Internet transactions, email, video, click streams, noisy, structured/unstructured and/or all other digital sources available today and in the future at speeds and on scales never seen before in human history. The big data also being described using 3 Vs, volume, variety and velocity (with an additional 4th V for "veracity" and more recently with a 5th V for "value"), requires a set of new technologies, such as high performance computing i.e., exascale, architectures (distributed or grid), algorithms (for data clustering and generating association rules), programming languages, automated and scalable software tools, to uncover hidden patterns, unknown correlations and other useful information lately referred to as "actionable knowledge" or "data products" from the massive volumes of complex raw data. In view of the above facts, the paper gives an introduction to the synergistic challenges in "data-intensive" science and "exascale" computing for resolving "big data analytics" and "data science" issues in four main disciplines namely, computer science, computational science, statistics and mathematics. For the realisation of vital identified foundational aspects of an effective cyber infrastructure, basic problems need to be addressed adequately in the respective disciplines and are outlined. Finally, the paper looks at five scientific research projects that are urgently in need of high performance computing; this is in contrast to the earlier situations where private business enterprises were the drivers of better modern and faster technologies.
