INPUT FORMAT FOR ANALYZING BINARY TYPE DATA IN HADOOP MAP REDUCE FOR THE DISTRIBUTED PROCESSING OF NUTCH AND BINARY DATA ANALYZING METHOD USING THE SAME

Abstract

PURPOSE: An input format for analyzing binary data in Hadoop MapReduce, and a binary data analysis method using it, are provided so that fixed-length binary data can be processed in a Hadoop environment without first converting the data format. This reduces the storage space required and speeds up processing.

CONSTITUTION: The record length of the binary data is received. For each data block to be processed among the data blocks stored in HDFS (Hadoop Distributed File System), an InputSplit is defined by placing its boundary with the previous InputSplit at the offset that, among the offsets in the block that are multiples of the record length, lies closest to the beginning of the block. That offset becomes the beginning point of the InputSplit. A record reader then reads the whole area of the InputSplit from the beginning point, one record length at a time.

COPYRIGHT KIPO 2012
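The split-alignment idea in the abstract can be illustrated with a minimal sketch in plain Python. It is not the patented implementation and uses no Hadoop API. The function names `aligned_offset`, `compute_splits`, and `read_records` are hypothetical. The sketch also assumes that "closest" means the nearest multiple of the record length on either side of the block start (ties go to the lower offset), and that a trailing partial record is ignored.

```python
def aligned_offset(block_start, record_len):
    """Multiple of record_len nearest to block_start (assumed reading of 'closest')."""
    lower = (block_start // record_len) * record_len
    upper = lower + record_len
    # Ties resolve to the lower multiple; this choice is an assumption.
    return lower if block_start - lower <= upper - block_start else upper


def compute_splits(file_size, block_size, record_len):
    """Return (start, end) byte ranges whose boundaries fall on record boundaries."""
    if record_len <= 0 or block_size <= 0:
        raise ValueError("record_len and block_size must be positive")
    # Assumption: bytes after the last whole record are not processed.
    usable = (file_size // record_len) * record_len
    starts = []
    for block_start in range(0, file_size, block_size):
        start = min(aligned_offset(block_start, record_len), usable)
        # Skip duplicates, which occur when a record is longer than a block.
        if not starts or start > starts[-1]:
            starts.append(start)
    ends = starts[1:] + [usable]
    return [(s, e) for s, e in zip(starts, ends) if e > s]


def read_records(data, split, record_len):
    """Yield each fixed-length record inside one split, one record length at a time."""
    start, end = split
    for offset in range(start, end, record_len):
        yield data[offset:offset + record_len]


if __name__ == "__main__":
    # 100-byte file, 32-byte blocks, 10-byte records.
    print(compute_splits(100, 32, 10))  # [(0, 30), (30, 60), (60, 100)]
```

Because every split starts and ends on a multiple of the record length, no record straddles two splits, so each one can be read independently without converting the binary data to another format first. In an actual Hadoop job this logic would sit in a custom `InputFormat` and `RecordReader` rather than in standalone functions.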
