METHOD FOR ESTABLISHING INDEX ON HDFS-BASED SPARK-SQL BIG-DATA PROCESSING SYSTEM
Abstract
Provided is a method for establishing an index on an HDFS-based Spark-SQL big-data processing system. SQL statements are used to add an index to, delete an index from, insert data into, and delete data from the system. When data is queried, the system automatically determines whether the query column has an index; if so, only the file blocks listed in the index are searched, and file blocks that need not be searched are filtered out. Adding index functionality to Spark-SQL can effectively increase query speed. Consider a typical Spark-SQL data table of 1000 GB, stored as 1000 files of 1 GB each. Querying an individual record with the original approach would require scanning all 1000 files; once the index is established, scanning one file suffices, so the amount of data scanned falls by a factor of 1000. Under typical circumstances, and in view of experience with conventional relational databases, a query on a Spark-SQL database with an established index runs 100 to 10,000 times faster, or more, than the same query without an index.
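The file-pruning idea in the abstract can be illustrated with a small sketch. This is not the patented implementation and does not use Spark or HDFS: it models the table as an in-memory dictionary of 1000 "files", and the names `FileBlockIndex`, `build_index`, and `query` are hypothetical, chosen only for illustration. It shows how a value-to-file index lets a point query scan one file instead of all 1000.

```python
from collections import defaultdict


class FileBlockIndex:
    """Toy index (hypothetical): maps a column value to the files containing it."""

    def __init__(self):
        self._value_to_files = defaultdict(set)

    def add(self, value, file_name):
        # Record that file_name holds at least one row with this value.
        self._value_to_files[value].add(file_name)

    def files_for(self, value):
        # Return a copy so callers cannot mutate the index by accident.
        return set(self._value_to_files.get(value, set()))


def build_index(table, column):
    """Scan every file once and index the given column."""
    index = FileBlockIndex()
    for file_name, rows in table.items():
        for row in rows:
            index.add(row[column], file_name)
    return index


def query(table, column, value, index=None):
    """Return (matching rows, number of files scanned).

    Without an index every file is scanned; with an index only the
    files listed for the value are scanned and the rest are skipped.
    """
    candidates = table.keys() if index is None else index.files_for(value)
    scanned = 0
    matches = []
    for file_name in candidates:
        scanned += 1
        matches.extend(r for r in table[file_name] if r[column] == value)
    return matches, scanned


# Assumption for brevity: 1000 "files" with one record each,
# standing in for 1000 files of 1 GB.
table = {f"part-{i:04d}": [{"id": i, "payload": f"row-{i}"}] for i in range(1000)}
index = build_index(table, "id")

rows_full, scanned_full = query(table, "id", 742)
rows_idx, scanned_idx = query(table, "id", 742, index)
print(scanned_full, scanned_idx)
```

In this toy setup both queries return the same record, but the indexed query touches a single file. The wall-clock speedup on a real cluster depends on factors the sketch ignores, such as the cost of reading the index and how the data is distributed across files.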