Method and system for parallelization of ingestion of large data sets
Abstract
Embodiments of the present invention relate to systems and methods for ingesting input data containing a plurality of records into a data lake. In an embodiment, the method comprises splitting the input data into a plurality of input splits, each containing a balanced number of records; reading the records from the plurality of input splits in parallel, regardless of the format and encoding of the input source; converting the input data within the records into at least one key/value pair; transforming the values into a serializable format; sorting the key/value pairs of the transformed values such that the records are sorted in the same order as they were read; writing the transformed values to an output file; and storing the output file to the data lake.
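The steps listed in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes in-memory records, a thread pool as the parallel reader, JSON as the serializable format, the global read position as the key, and a local directory standing in for the data lake. The names `split_balanced`, `process_split`, and `ingest` are hypothetical and do not come from the source.

```python
import json
import tempfile
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path


def split_balanced(records, n_splits):
    """Cut records into n_splits contiguous chunks whose sizes differ by at most one.

    Each split is returned as (start_offset, chunk) so the read order can be rebuilt.
    """
    base, extra = divmod(len(records), n_splits)
    splits, start = [], 0
    for i in range(n_splits):
        size = base + (1 if i < extra else 0)
        splits.append((start, records[start:start + size]))
        start += size
    return splits


def process_split(split):
    """Convert each record to a (key, value) pair.

    Assumption: the key is the record's global read position and the value
    is the record serialized as a JSON string.
    """
    start, chunk = split
    return [(start + i, json.dumps(rec, sort_keys=True)) for i, rec in enumerate(chunk)]


def ingest(records, lake_dir, n_splits=4, name="part-00000.jsonl"):
    """Split, process in parallel, sort by key, write one file, return its path."""
    splits = split_balanced(records, n_splits)
    pairs = []
    with ThreadPoolExecutor(max_workers=n_splits) as pool:
        futures = [pool.submit(process_split, s) for s in splits]
        # Splits may finish in any order, so the pairs arrive unordered.
        for fut in as_completed(futures):
            pairs.extend(fut.result())
    # Sorting on the key restores the order in which records were read.
    pairs.sort(key=lambda kv: kv[0])
    out = Path(lake_dir) / name  # stand-in for the data lake location
    out.parent.mkdir(parents=True, exist_ok=True)
    with out.open("w", encoding="utf-8") as f:
        for _, value in pairs:
            f.write(value + "\n")
    return out


if __name__ == "__main__":
    sample = [{"id": i, "name": f"record-{i}"} for i in range(10)]
    with tempfile.TemporaryDirectory() as lake:
        path = ingest(sample, lake, n_splits=4)
        print(path.read_text(encoding="utf-8"))
```

Keying each pair by read position is one simple way to satisfy the ordering requirement: the parallel workers can complete in any order, and a single sort on the key before writing puts the records back in input order.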