INPUT FORMAT FOR ANALYZING BINARY TYPE DATA IN HADOOP MAP REDUCE FOR THE DISTRIBUTED PROCESSING OF NUTCH AND BINARY DATA ANALYZING METHOD USING THE SAME

Abstract

PURPOSE: An input format for analyzing binary data in Hadoop MapReduce, and a binary data analyzing method using it, are provided so that fixed-length binary data can be processed in a Hadoop environment without first converting the data format. This reduces the required storage space and increases processing speed.

CONSTITUTION: The length of a binary data record is received as input. For a data block to be processed among the data blocks stored in HDFS (Hadoop Distributed File System), an InputSplit is defined by taking as its beginning point the offset, among the offsets in the block that are multiples of the record length, that is closest to the block's beginning point; this beginning point is the boundary between the previous InputSplit and the current one. A record reader then reads the entire InputSplit from its beginning point in units of the record length.

COPYRIGHT KIPO 2012
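The split rule in the abstract can be illustrated with a small sketch. It is written in plain Python rather than against the Hadoop Java API, it is not the patent's implementation, and the function names (`aligned_start`, `compute_splits`, `read_records`) are hypothetical. It assumes that "closest" means the first record boundary at or after each block's start (rounding up), and that each split owns every record whose start offset falls between its own beginning point and the next split's beginning point.

```python
from typing import Iterator, List, Tuple


def aligned_start(block_start: int, record_length: int) -> int:
    """Return the first file offset >= block_start that is a multiple of record_length."""
    if record_length <= 0:
        raise ValueError("record_length must be positive")
    # Ceiling division rounds the block start up to the next record boundary
    # (an assumption of this sketch; the abstract only says "closest").
    return -(-block_start // record_length) * record_length


def compute_splits(file_size: int, block_size: int,
                   record_length: int) -> List[Tuple[int, int]]:
    """Return one (start, end) byte range per block, aligned to record boundaries.

    Each split ends exactly where the next one begins, so every record is
    assigned to exactly one split and none is cut in half.
    """
    starts = [aligned_start(b, record_length)
              for b in range(0, file_size, block_size)]
    # Only whole records are processed; a trailing partial record is ignored.
    usable_end = (file_size // record_length) * record_length
    splits = []
    for i, start in enumerate(starts):
        end = starts[i + 1] if i + 1 < len(starts) else usable_end
        end = min(end, usable_end)
        if start < end:  # skip blocks that contain no record start
            splits.append((start, end))
    return splits


def read_records(data: bytes, split: Tuple[int, int],
                 record_length: int) -> Iterator[Tuple[int, bytes]]:
    """Yield (offset, record) pairs covering the whole split, one record at a time."""
    start, end = split
    for offset in range(start, end, record_length):
        yield offset, data[offset:offset + record_length]


if __name__ == "__main__":
    # Toy input: ten 4-byte big-endian integers, with a 10-byte "block" size.
    data = b"".join(i.to_bytes(4, "big") for i in range(10))
    splits = compute_splits(len(data), block_size=10, record_length=4)
    print(splits)  # [(0, 12), (12, 20), (20, 32), (32, 40)]
    for split in splits:
        print([int.from_bytes(rec, "big") for _, rec in read_records(data, split, 4)])
```

Rounding up rather than down is a choice made for this sketch only: either direction works as long as the previous split ends at the same boundary where the next one begins, because that shared boundary is what guarantees each fixed-length record is read exactly once.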

Bibliographic Data

Publication No.: KR20120084100A
Publication date: 2012-07-27
Original document format: PDF
Applicant/Patentee: THE INDUSTRY & ACADEMIC COOPERATION IN CHUNGNAM NATIONAL UNIVERSITY (IAC)
Application No.: KR20110005424
Inventors: LEE YOUNG SEOK; LEE YEON HEE
Filing date: 2011-01-19
Classification: G06F15/16
Country: KR
Database entry time: 2022-08-21 17:09:28
