Method and computer program for statistical and data-mining processing of large data sets

Abstract

The method according to the present invention relates to the processing of large data files using statistical and data-mining approaches. The method is characterized in that:

  • the input data file, which has a predefined structure, is subdivided into blocks containing an equal number of records (S100);
  • the blocks are processed consecutively, creating in main memory a local subresult for each block, built up of records that have the same structure but different keys (S200);
  • the records of the local subresult are sorted according to a predefined principle (S300);
  • the current local subresult and the current global subresult, created from all the previous local subresults, are merged by iterating once through the records of both, and the new global subresult is created on the storage device (S400); and finally
  • the previous global subresult is deleted from the background storage (S500), and if any blocks are left to process, the method returns to step S200.
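
The loop of steps S100 to S500 described in the abstract can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions, not the patented implementation: it assumes that each input line holds a single key, that the statistic being computed is a per-key count, that the "predefined principle" for sorting is ascending key order, and that subresults are stored as CSV files. The names `read_blocks`, `local_subresult`, `merge_to_disk` and `process` are hypothetical and do not come from the patent.

```python
import csv
import os
import tempfile
from itertools import islice


def read_blocks(path, block_size):
    """S100: yield the input as blocks of block_size records (the last may be shorter)."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        while True:
            block = list(islice(reader, block_size))
            if not block:
                return
            yield block


def local_subresult(block):
    """S200 + S300: aggregate one block in memory (assumed: counts), then sort by key."""
    counts = {}
    for row in block:
        key = row[0]
        counts[key] = counts.get(key, 0) + 1
    return sorted(counts.items())


def merge_to_disk(local, global_path, out_path):
    """S400: merge two key-sorted sequences in a single pass, writing the result to disk."""
    with open(global_path, newline="") as src, open(out_path, "w", newline="") as out:
        stored = ((k, int(v)) for k, v in csv.reader(src))
        writer = csv.writer(out)
        loc = iter(local)
        a = next(loc, None)      # current record of the local subresult
        b = next(stored, None)   # current record of the global subresult
        while a is not None or b is not None:
            if b is None or (a is not None and a[0] < b[0]):
                writer.writerow(a)
                a = next(loc, None)
            elif a is None or b[0] < a[0]:
                writer.writerow(b)
                b = next(stored, None)
            else:  # same key in both: combine the two records
                writer.writerow((a[0], a[1] + b[1]))
                a = next(loc, None)
                b = next(stored, None)


def process(input_path, work_dir, block_size):
    """Run S100-S500 and return the path of the final global subresult."""
    global_path = os.path.join(work_dir, "global_0.csv")
    open(global_path, "w").close()  # start from an empty global subresult
    for i, block in enumerate(read_blocks(input_path, block_size), start=1):
        local = local_subresult(block)
        new_path = os.path.join(work_dir, f"global_{i}.csv")
        merge_to_disk(local, global_path, new_path)
        os.remove(global_path)  # S500: delete the previous global subresult
        global_path = new_path
    return global_path


# Usage: count keys in a tiny seven-record file, using blocks of three records.
with tempfile.TemporaryDirectory() as work_dir:
    input_path = os.path.join(work_dir, "input.csv")
    with open(input_path, "w", newline="") as f:
        f.write("b\na\nb\nc\na\nb\nd\n")
    final_path = process(input_path, work_dir, block_size=3)
    with open(final_path, newline="") as f:
        result = [(k, int(v)) for k, v in csv.reader(f)]
print(result)
```

In this sketch only one block's subresult is held in memory at a time, and the global subresult is read and written strictly sequentially, so memory use depends on the block size rather than on the size of the whole input.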

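Bibliographic details

  • Publication number: EP2256649A1

    Patent type:

  • Publication date: 2010-12-01

    Original format: PDF

  • Applicant/patentee: BUDAPESTI MÜSZAKI ES GAZDASAGTUDOMANYI EGYETEM;

    Application number: EP20090462004

  • Inventor: JUHÁSZ SÁNDOR;

    Filing date: 2009-05-29

  • Classification: G06F17/30;

  • Country: EP

  • Date added to database: 2022-08-21 17:56:33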
