Journal: Future Generation Computer Systems

Understanding flows in high-speed scientific networks: A Netflow data study

Abstract

Complex science workflows involve very large data demands and resource-intensive computations. These demands require reliable high-speed networks that can optimize performance for application data flows. Classifying flows as large (elephants) or small (mice) allows networks to optimize performance by detecting and handling demands in real time. However, predicting elephant versus mice flows is extremely difficult because their definitions vary across networks. Machine learning techniques can help classify flows into two distinct clusters to identify the characteristics of transfers. In this paper, we investigate unsupervised and semi-supervised machine learning approaches to classify flows in real time. We combine a Gaussian Mixture Model with an initialization algorithm to develop a novel general-purpose method that supports classification based on network sites (in terms of data transfers, flow rates and durations). Our results show that the proposed algorithm is able to cluster elephants and mice with an accuracy of 90%. We analyzed one month of NetFlow reports from three ESnet site routers to train the model and predict clusters. (C) 2018 Elsevier B.V. All rights reserved.
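
The clustering idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it fits a two-component, one-dimensional Gaussian mixture with plain EM on the log10 of flow sizes only, whereas the abstract also mentions flow rates and durations. The median-split initialization, the function names (`fit_gmm`, `classify_flow`) and the toy byte counts are all assumptions made for illustration; the paper's own initialization algorithm is not described in the abstract.

```python
import math

def log_score(x, weight, mean, var):
    """Log of weight * Normal(x | mean, var)."""
    return (math.log(weight) - 0.5 * math.log(2 * math.pi * var)
            - (x - mean) ** 2 / (2 * var))

def responsibilities(x, model):
    """Posterior probability of each component for one point."""
    scores = [log_score(x, w, m, v) for w, m, v in zip(*model)]
    top = max(scores)  # subtract the max before exponentiating for stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fit_gmm(values, iterations=100):
    """Fit a 1-D, two-component Gaussian mixture with EM.

    Initialization is an assumption (NOT the paper's algorithm): split the
    sorted data at the median and start from each half's mean and variance.
    """
    data = sorted(values)
    half = len(data) // 2
    weights, means, variances = [], [], []
    for group in (data[:half], data[half:]):
        mean = sum(group) / len(group)
        weights.append(len(group) / len(data))
        means.append(mean)
        variances.append(max(sum((x - mean) ** 2 for x in group) / len(group), 1e-6))
    for _ in range(iterations):
        # E-step: soft assignment of every point to both components.
        resp = [responsibilities(x, (weights, means, variances)) for x in data]
        # M-step: re-estimate weight, mean and variance of each component.
        for k in range(2):
            nk = max(sum(r[k] for r in resp), 1e-12)
            weights[k] = nk / len(data)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            variances[k] = max(
                sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk, 1e-6)
    return weights, means, variances

def classify_flow(num_bytes, model):
    """Label a flow by the component that more likely generated its size."""
    means = model[1]
    elephant = means.index(max(means))  # the larger-mean cluster is "elephant"
    resp = responsibilities(math.log10(num_bytes), model)
    return "elephant" if resp[elephant] >= 0.5 else "mouse"

# Hypothetical flow sizes in bytes (made-up toy data, not ESnet measurements).
flow_bytes = [500, 800, 900, 1200, 1500, 2500, 3000, 4000,
              8e8, 2e9, 3e9, 5e9, 6e9, 1.2e10]
model = fit_gmm([math.log10(b) for b in flow_bytes])
print(classify_flow(2_000, model), classify_flow(1e9, model))
```

Working on a log scale keeps the two size regimes comparable in spread, which is a common choice for heavy-tailed traffic data rather than something the abstract states. Because the mixture is fitted per data set, the boundary between mice and elephants adapts to each site instead of relying on a fixed byte threshold.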
