SPARK DISTRIBUTED COMPUTING DATA PROCESSING METHOD AND SYSTEM

Abstract

The present invention relates to the field of computing and provides a Spark distributed computing data processing method. The method comprises: scheduling a sub-task through a task scheduler, executing a task that stores RDD partition data, and requesting space in a storage area; calculating the amount of evictable space in the storage area, and setting a migration address in a hybrid storage system according to the access popularity of the partition data (S102); and reading the cached data in a specified storage area, releasing the corresponding memory space, migrating the partition data to the specified address, modifying the persistence level of the migrated data, and returning an eviction success signal together with information about the evicted space (S103). A Spark distributed computing system is also provided. The invention introduces the hybrid storage system and designs an eviction logic unit and a cached-data migration unit. Evicted data is migrated to an SSD or an HDD according to the popularity of the partition data, rather than being written directly to disk or deleted from the cache. This effectively relieves the pressure caused by insufficient memory space and improves Spark performance.
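The space-request, eviction and migration flow described in the abstract can be sketched as a small in-process model. This is a minimal sketch under stated assumptions, not the patented implementation and not Spark's API: `HybridStore`, `Partition`, the `SSD`/`HDD` persistence levels and the `hot_threshold` cutoff are all hypothetical, an access counter stands in for "access popularity", and partitions are evicted least-popular-first. Like Spark's own memory store, the sketch never evicts partitions belonging to the RDD that is requesting space.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical persistence levels; Spark's real StorageLevel has no SSD/HDD tiers.
MEMORY, SSD, HDD = "MEMORY", "SSD", "HDD"
Key = Tuple[int, int]  # (rdd_id, partition_index)


@dataclass
class Partition:
    rdd_id: int
    index: int
    size: int          # bytes occupied in memory
    access_count: int  # stand-in for "access popularity" (assumption)
    level: str = MEMORY


@dataclass
class HybridStore:
    capacity: int       # size of the in-memory storage area, in bytes
    hot_threshold: int  # access_count >= this -> SSD, else HDD (assumption)
    memory: Dict[Key, Partition] = field(default_factory=dict)
    ssd: Dict[Key, Partition] = field(default_factory=dict)
    hdd: Dict[Key, Partition] = field(default_factory=dict)

    def used(self) -> int:
        return sum(p.size for p in self.memory.values())

    def evictable(self, requester_rdd: int) -> List[Partition]:
        # S102: partitions of the requesting RDD are never evicted.
        return [p for p in self.memory.values() if p.rdd_id != requester_rdd]

    def migration_target(self, p: Partition) -> str:
        # S102: choose the migration address from partition popularity.
        return SSD if p.access_count >= self.hot_threshold else HDD

    def evict(self, needed: int, requester_rdd: int) -> Tuple[bool, int]:
        """S102/S103: free at least `needed` bytes; return (success, bytes_freed)."""
        candidates = sorted(self.evictable(requester_rdd), key=lambda p: p.access_count)
        if sum(p.size for p in candidates) < needed:
            return False, 0  # not enough evictable space, so nothing is moved
        freed = 0
        for p in candidates:  # least popular first
            if freed >= needed:
                break
            key = (p.rdd_id, p.index)
            del self.memory[key]  # release the memory space
            target = self.migration_target(p)
            (self.ssd if target == SSD else self.hdd)[key] = p  # migrate
            p.level = target  # modify the persistence level
            freed += p.size
        return True, freed  # eviction success signal plus evicted space

    def put(self, p: Partition) -> bool:
        """Space request: a scheduled task asks for room to cache an RDD partition."""
        free = self.capacity - self.used()
        if p.size > free:
            ok, _ = self.evict(p.size - free, p.rdd_id)
            if not ok:
                return False
        self.memory[(p.rdd_id, p.index)] = p
        return True
```

Moving a `Partition` object between dictionaries stands in for reading the cached bytes and writing them to the target device; a real system would also have to update block-location metadata so that later reads find the migrated copy.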