Conference: IEEE International Symposium on Workload Characterization

Storage Characterization for Unstructured Data in Online Services Applications

Abstract

Mega datacenters hosting large-scale web services have unique workload attributes that must be taken into account for optimal service scalability. Provisioning compute and storage resources to provide a seamless user experience is challenging because customer traffic loads vary widely across time and geographies, and the servers hosting these applications have to be rightsized to provide performance both within a single server and across a scale-out cluster. Typical user-facing web services have a three-tiered hierarchy: front-end web servers, middle-tier application logic, and a back-end data storage and processing layer. In this paper, we address the challenge of disk subsystem design for back-end servers hosting large amounts of unstructured (also called blob) data. Typical content hosted on such servers includes user-generated content such as photos, email messages, videos, and social networking updates. The specific server applications analyzed in this paper are the message store of a large-scale email application, image tile storage for a large-scale geo-mapping application, and user content storage for Web 2.0 type applications. We analyze the storage subsystems for these web services in a live production environment and provide an overview of the disk traffic patterns and access characteristics of each application. We then explore time-series characteristics and derive probabilistic models showing state transitions between locations on the data volumes for these applications. Next, we explore how these probabilistic models could be extended into a framework for synthetic benchmark generation for such applications. Finally, we discuss how this framework can be used for storage subsystem rightsizing for optimal scalability of such back-end storage clusters.
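
The abstract does not say what form the probabilistic models take. As a minimal sketch only, the idea of state transitions between volume locations feeding a synthetic benchmark could look like the following, assuming a first-order Markov chain over equal-sized regions of the volume. The region count, the toy trace, and the names `fit_transitions` and `generate_regions` are hypothetical and not taken from the paper.

```python
import random
from collections import Counter, defaultdict


def fit_transitions(offsets, volume_size, num_regions):
    """Estimate first-order transition probabilities between volume regions.

    Assumption: the volume is split into equal-sized regions and each
    request is mapped to the region containing its starting offset.
    """
    region_size = volume_size / num_regions
    regions = [min(int(off / region_size), num_regions - 1) for off in offsets]
    counts = defaultdict(Counter)
    for src, dst in zip(regions, regions[1:]):
        counts[src][dst] += 1
    model = {}
    for src, row in counts.items():
        total = sum(row.values())
        model[src] = {dst: n / total for dst, n in row.items()}
    return model


def generate_regions(model, start, length, rng):
    """Walk the fitted chain to produce a synthetic sequence of regions."""
    state = start
    sequence = [state]
    for _ in range(length - 1):
        row = model.get(state)
        if not row:
            # No outgoing transitions were observed: restart from a known state.
            state = rng.choice(sorted(model))
        else:
            targets = sorted(row)
            weights = [row[t] for t in targets]
            state = rng.choices(targets, weights=weights)[0]
        sequence.append(state)
    return sequence


# Toy trace of request offsets on a hypothetical 1000-unit volume.
trace = [0, 10, 20, 900, 910, 15, 905, 5]
model = fit_transitions(trace, volume_size=1000, num_regions=4)
synthetic = generate_regions(model, start=0, length=20, rng=random.Random(0))
print(model)
print(synthetic)
```

A real benchmark generator would also need request sizes, read/write mix, and inter-arrival times; this sketch covers only the spatial transition part.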
