International Conference on Big Scientific Data Management

Multi-dimensional Index over a Key-Value Store for Semi-structured Data



Abstract

Informal data structures and data volumes in the trillions of records are challenges for databases that store and retrieve semi-structured data. Most researchers address these issues with R-trees, KD-trees, and space-filling curves, but these structures are not suitable for the default and discrete values of semi-structured data, and some even require sampling before storage. We present MD-Index, a scalable multi-dimensional indexing system that supports high-throughput, real-time range queries. MD-Index builds a bitmap index of sliced data over a range-partitioned key-value store. The underlying key-value store provides the system's high throughput, large storage capacity, high availability, and fault tolerance, while the bitmap provides the multi-dimensional index over the data. In addition, MD-Index encodes discrete values as the hash code of a slice, and stores the data and the bitmap of a slice in the same region (a storage unit of the range-partitioned key-value store) to exploit distributed computing and data locality. Our prototype of MD-Index is built on HBase, the standard key-value database. Experimental results show that MD-Index can store and retrieve trillions of semi-structured records and achieve a throughput of two million records per second.
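
The idea of a per-slice bitmap index answering multi-dimensional range queries can be illustrated with a small in-memory sketch. This is not the MD-Index implementation: the class `SliceBitmapIndex`, its methods, and the sample fields `temp` and `depth` are hypothetical, and the sketch leaves out HBase, region co-location, and the hash encoding of discrete values. It only shows, under those assumptions, how one bitmap per (field, value) pair lets a range query be answered by OR-ing bitmaps within a dimension and AND-ing across dimensions, while records that lack a field simply set no bit.

```python
from collections import defaultdict


class SliceBitmapIndex:
    """Toy bitmap index over one slice of records (hypothetical, in-memory)."""

    def __init__(self):
        self.records = []
        # bitmaps[field][value] is an int whose bit i is set when
        # record i of this slice holds that value for that field.
        self.bitmaps = defaultdict(dict)

    def add(self, record):
        """Append a record (a dict of numeric fields) and update the bitmaps."""
        row = len(self.records)
        self.records.append(record)
        for field, value in record.items():
            # A field that is absent from the record sets no bit at all.
            current = self.bitmaps[field].get(value, 0)
            self.bitmaps[field][value] = current | (1 << row)

    def range_query(self, ranges):
        """Return records matching every (low, high) inclusive range.

        `ranges` maps a field name to a (low, high) tuple.
        """
        # Start with all rows selected, then narrow per dimension.
        result = (1 << len(self.records)) - 1
        for field, (low, high) in ranges.items():
            dim = 0
            # OR together the bitmaps of all values inside the range.
            for value, bitmap in self.bitmaps.get(field, {}).items():
                if low <= value <= high:
                    dim |= bitmap
            # AND across dimensions.
            result &= dim
        return [r for i, r in enumerate(self.records) if (result >> i) & 1]


index = SliceBitmapIndex()
index.add({"temp": 20, "depth": 5})
index.add({"temp": 25})               # semi-structured: no "depth" field
index.add({"temp": 30, "depth": 8})

hits = index.range_query({"temp": (20, 26), "depth": (0, 10)})
print(hits)
```

In this sketch the second record is excluded from any query that constrains `depth`, because it contributes no bit to that dimension. A real system would likely use compressed bitmaps rather than Python integers.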
