OWLStats: Distributed Computation of OWL Dataset Statistics

Abstract

Ontologies are now used in many application areas, including Artificial Intelligence, Natural Language Processing, Data Integration, and Knowledge Management. Knowing the internal structure, distribution, and coherence of published datasets makes them easier to reuse, interlink, integrate, reason over, and query. As OWL datasets become more prevalent, there is therefore a pressing need to obtain a clear view of them. In this paper, we present OWLStats, a software component for computing statistical information about large-scale OWL datasets in a distributed manner. We present the primary distributed in-memory approach for computing 32 different statistical criteria for OWL datasets using Apache Spark, which can scale horizontally to a cluster of machines. OWLStats has been integrated into the SANSA framework. Preliminary results indicate that OWLStats scales linearly with data size.
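
The abstract does not describe how the statistics are computed, so the following is only a minimal sketch of the general pattern that distributed counting usually follows: each partition of the data is counted independently, and the partial counts are then merged. It uses plain Python rather than Apache Spark or SANSA, and the sample axioms, the partitioning, and the function names `axiom_type`, `count_partition`, and `merge` are all assumptions made for illustration. It is not OWLStats code and does not reproduce any of the 32 criteria.

```python
from collections import Counter
from functools import reduce

# Hypothetical axioms in OWL functional-style syntax, split into two
# partitions to stand in for data spread across cluster nodes.
partitions = [
    ["SubClassOf(:Dog :Animal)", "Declaration(Class(:Dog))", "SubClassOf(:Cat :Animal)"],
    ["Declaration(Class(:Cat))", "DisjointClasses(:Dog :Cat)", "SubClassOf(:Puppy :Dog)"],
]


def axiom_type(axiom: str) -> str:
    """Return the leading keyword of an axiom, e.g. 'SubClassOf'."""
    return axiom.split("(", 1)[0].strip()


def count_partition(axioms):
    """Map step: count axiom types within a single partition."""
    return Counter(axiom_type(a) for a in axioms)


def merge(left, right):
    """Reduce step: combine two partial counts into one."""
    return left + right


# Each partition is counted on its own, then the partial results are merged.
totals = reduce(merge, (count_partition(p) for p in partitions), Counter())
print(dict(totals))
```

Because the merge step is associative and commutative, the partial counts can be combined in any order, which is the property that lets this kind of statistic be spread across machines.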

Bibliographic details

Source
International Joint Conference on Web Intelligence and Intelligent Agent Technology, 2020, pp. 381-386 (6 pages)
Authors
Heba Mohamed; Said Fathalla; Jens Lehmann; Hajira Jabeen;
Original format: PDF
Keywords
Scalability; OWL; Data integration; Pressing; Coherence; Ontologies; Software;

Date indexed: 2022-08-26 13:56:39

Similar literature

1. Statistically and Computationally Efficient Estimating Equations for Large Spatial Datasets [J]. Sun Ying, Stein Michael L. Journal of Computational and Graphical Statistics: A joint publication of American Statistical Association, Institute of Mathematical Statistics, Interface Foundation of North America. 2016, no. 1
2. Erratum to: Covariance tapering for interpolation of large spatial datasets (Journal of Computational and Graphical Statistics, (2006), 15, (502-523)) (Erratum) [J]. Journal of Computational and Graphical Statistics: A joint publication of American Statistical Association, Institute of Mathematical Statistics, Interface Foundation of North America. 2012, no. 3
3. Erratum and Addendum to: "Covariance Tapering for Interpolation of Large Spatial Datasets" published in the Journal of Computational and Graphical Statistics, 15, 502-523 [J]. Journal of Computational and Graphical Statistics. 2012, no. 3
4. DistLODStats: Distributed Computation of RDF Dataset Statistics [C]. Gezim Sejdiu, Ivan Ermilov, Jens Lehmann. International Semantic Web Conference. 2018
5. Combinatorial Optimization on Massive Datasets: Streaming, Distributed, and Massively Parallel Computation [D]. Assadi, Sepehr. 2018
6. Secure and scalable deduplication of horizontally partitioned health data for privacy-preserving distributed statistical computation [O]. Kassaye Yitbarek Yigzaw, Antonis Michalas, Johan Gustav Bellika. 2017
7. Communication-Efficient Computation on Distributed Noisy Datasets [O]. Qin Zhang. 2015
