The Resource Description Framework (RDF) data model is used on the Web to express billions of structured statements on a wide range of topics, including government, publications, and life sciences. Consequently, processing and storing this data requires high-specification systems in terms of both storage and computational capability. On the other hand, cloud-based big data services such as Google BigQuery can be used to store and query this data without any upfront investment. Google BigQuery pricing is based on the size of the data being stored or queried, but because RDF statements contain long Uniform Resource Identifiers (URIs), the cost of querying and storing RDF big data can increase rapidly. In this paper we present and evaluate a novel and efficient dictionary compression algorithm that is faster, generates small dictionaries that can fit in memory, and achieves a better compression rate than other large-scale RDF dictionary compression approaches. Consequently, our algorithm also reduces the BigQuery storage and query cost.
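To make the idea of dictionary compression concrete, the following is a minimal sketch of generic dictionary encoding for RDF triples, in which each distinct term is replaced by a small integer identifier. It is not the algorithm presented in the paper; the function names `dictionary_encode` and `dictionary_decode` and the `http://example.org/` URIs are assumptions made purely for illustration.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
EncodedTriple = Tuple[int, int, int]


def dictionary_encode(triples: List[Triple]) -> Tuple[Dict[str, int], List[EncodedTriple]]:
    """Replace every RDF term with an integer ID (hypothetical helper).

    Generic illustration only: IDs are assigned in order of first appearance.
    """
    dictionary: Dict[str, int] = {}
    encoded: List[EncodedTriple] = []
    for subject, predicate, obj in triples:
        ids = []
        for term in (subject, predicate, obj):
            # Assign the next free ID the first time a term is seen.
            if term not in dictionary:
                dictionary[term] = len(dictionary)
            ids.append(dictionary[term])
        encoded.append((ids[0], ids[1], ids[2]))
    return dictionary, encoded


def dictionary_decode(dictionary: Dict[str, int], encoded: List[EncodedTriple]) -> List[Triple]:
    """Invert the encoding by mapping each ID back to its original term."""
    reverse = {term_id: term for term, term_id in dictionary.items()}
    return [(reverse[s], reverse[p], reverse[o]) for s, p, o in encoded]


if __name__ == "__main__":
    sample = [
        ("http://example.org/alice", "http://example.org/knows", "http://example.org/bob"),
        ("http://example.org/bob", "http://example.org/knows", "http://example.org/alice"),
    ]
    mapping, compact = dictionary_encode(sample)
    print(compact)
    print(dictionary_decode(mapping, compact) == sample)
```

Because a long URI is stored once in the dictionary and referenced by a short integer everywhere else, the encoded triples are much smaller than the originals, which is what lowers size-based storage and query charges.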