SPARK DISTRIBUTED COMPUTING DATA PROCESSING METHOD AND SYSTEM
Abstract
The present invention relates to the field of computers and provides a data processing method for Spark distributed computing. The method comprises: scheduling a sub-task by means of a task scheduler, executing an RDD partition data storage task, and requesting space in the storage area; calculating the size of the evictable space in the storage area, and setting a migration address in a hybrid storage system according to the access popularity of the partition data (S102); and reading the cached data in the specified storage area, releasing the corresponding memory space, migrating the partition data to the specified address, modifying the persistence level of the migrated data, and returning an eviction success signal together with information on the evicted space (S103). Also provided is a Spark distributed computing system. By introducing the hybrid storage system and designing an eviction logic unit and a cached data migration unit, partition data is migrated to an SSD or an HDD according to its access popularity, rather than being written directly to a magnetic disk or deleted from the cache, so that the pressure caused by memory space shortage is effectively reduced and Spark performance is improved.
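The eviction and migration steps described above can be illustrated with a minimal sketch. It is not the patented implementation: the `Partition` record, the `HOT_THRESHOLD` cutoff, the use of an access count as the popularity measure, the coldest-first eviction order, and the rule that a requesting RDD's own partitions are not evicted are all assumptions made for illustration. Only the overall flow (compute evictable space, pick SSD or HDD by popularity, release memory, change the persistence level, report success and freed space) follows the abstract.

```python
from dataclasses import dataclass

# Hypothetical cutoff: partitions accessed at least this often count as "hot".
HOT_THRESHOLD = 3


@dataclass
class Partition:
    rdd_id: int
    index: int
    size: int                   # bytes occupied in the memory storage area
    access_count: int           # assumed stand-in for "access popularity"
    level: str = "MEMORY_ONLY"  # persistence level
    location: str = "memory"    # "memory", "ssd", or "hdd"


def evictable_space(cached, requesting_rdd_id):
    """Total bytes of cached partitions that may be evicted.

    Assumption: partitions of the requesting RDD itself are not evictable.
    """
    return sum(p.size for p in cached if p.rdd_id != requesting_rdd_id)


def evict(cached, requesting_rdd_id, needed):
    """Migrate partitions out of memory until at least `needed` bytes are free.

    Returns (success, freed_bytes, migrated_partitions).
    """
    if evictable_space(cached, requesting_rdd_id) < needed:
        return False, 0, []

    # Assumption: evict the least popular partitions first.
    candidates = sorted(
        (p for p in cached if p.rdd_id != requesting_rdd_id),
        key=lambda p: p.access_count,
    )
    freed, migrated = 0, []
    for p in candidates:
        if freed >= needed:
            break
        # Hot data goes to the faster SSD tier, cold data to the HDD tier.
        p.location = "ssd" if p.access_count >= HOT_THRESHOLD else "hdd"
        p.level = "DISK_ONLY"   # the migrated data no longer lives in memory
        cached.remove(p)        # release the corresponding memory space
        freed += p.size
        migrated.append(p)
    return True, freed, migrated
```

Calling `evict(cached, requesting_rdd_id=2, needed=150)` on a cache holding two 100-byte partitions of RDD 1 would migrate both, sending the frequently accessed one to the SSD tier and the rarely accessed one to the HDD tier, and report the freed byte count back to the caller.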