Real-Time Semantic Search Using Approximate Methodology for Large-Scale Storage Systems

Y. Hua; H. Jiang; D. Feng

Abstract

The explosive growth in data volume and complexity has increased the need for semantic queries. Semantic queries can be interpreted as correlation-aware retrieval that may return approximate results. Existing cloud storage systems largely fail to offer adequate support for semantic queries. Because the value of data depends heavily on how efficiently semantic search can be carried out on it in (near-) real time, large fractions of data lose much or all of their value to staleness. To address this problem, we propose FAST, a near-real-time and cost-effective methodology based on semantic queries. The idea behind FAST is to explore and exploit the semantic correlation within and among datasets via correlation-aware hashing and manageable flat-structured addressing, which significantly reduces processing latency while incurring an acceptably small loss of search accuracy. The near-real-time property of FAST enables rapid identification of correlated files and significantly narrows the scope of data to be processed. FAST supports several types of data analytics, which can be implemented in existing searchable storage systems. In a real-world use case, FAST analyzes 60 million images to identify, in a timely fashion, children reported missing in an extremely crowded environment (e.g., a highly popular scenic spot on a peak tourist day). FAST is further improved with a semantic-aware namespace that provides dynamic and adaptive namespace management for ultra-large storage systems. Extensive experimental results demonstrate the efficiency and efficacy of FAST.
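
The abstract does not spell out how correlation-aware hashing or flat-structured addressing work, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes that each file is already summarized as a numeric feature vector, swaps in generic random-hyperplane locality-sensitive hashing as a stand-in for the paper's hashing scheme, and uses a single-level dictionary as a stand-in for flat addressing. All names (`make_hyperplanes`, `signature`, `FlatIndex`) are hypothetical.

```python
import random
from collections import defaultdict


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random hyperplane normals (Gaussian) in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def signature(vec, hyperplanes):
    """Pack the sign of each projection of vec into one integer signature."""
    bits = 0
    for plane in hyperplanes:
        dot = sum(p * v for p, v in zip(plane, vec))
        bits = (bits << 1) | (1 if dot >= 0.0 else 0)
    return bits


class FlatIndex:
    """Single-level (flat) table from hash signature to file identifiers.

    Assumption: this is a generic LSH bucket table used for illustration,
    not the data structure described in the paper.
    """

    def __init__(self, num_bits, dim, seed=0):
        self.hyperplanes = make_hyperplanes(num_bits, dim, seed)
        self.buckets = defaultdict(list)

    def insert(self, file_id, vec):
        """Store file_id in the bucket addressed by the signature of vec."""
        self.buckets[signature(vec, self.hyperplanes)].append(file_id)

    def query(self, vec):
        """Return candidate files from the one bucket that vec hashes to.

        The result is approximate: similar files that fall on the other
        side of any hyperplane land in a different bucket and are missed.
        """
        return list(self.buckets.get(signature(vec, self.hyperplanes), []))
```

A query touches one bucket rather than the whole collection, which is the sense in which hashing narrows the scope of data to be processed; the price is that vectors pointing in similar directions usually, but not always, share a bucket. Practical LSH deployments typically use several independent tables to reduce such misses.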

Bibliographic Record

Source: IEEE Transactions on Parallel and Distributed Systems, 2016, No. 4, pp. 1212-1225 (14 pages)
Authors: Y. Hua; H. Jiang; D. Feng
Author affiliation: Yu Hua is with the Wuhan National Laboratory for Optoelectronics, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China (email: csyhua@hust.edu.cn).
Original format: PDF
Language: English
Keywords: cloud storage; data analytics; real-time performance; semantic correlation

