International Semantic Web Conference

Optimize First, Buy Later: Analyzing Metrics to Ramp-Up Very Large Knowledge Bases



Abstract

As knowledge bases move into the landscape of larger ontologies and have terabytes of related data, we must work on optimizing the performance of our tools. We are easily tempted to buy bigger machines or to fill rooms with armies of little ones to address the scalability problem. Yet, careful analysis and evaluation of the characteristics of our data - using metrics - often leads to dramatic improvements in performance. Firstly, are current scalable systems scalable enough? We found that for large or deep ontologies (some as large as 500,000 classes) it is hard to say because benchmarks obscure the load-time costs for materialization. Therefore, to expose those costs, we have synthesized a set of more representative ontologies. Secondly, in designing for scalability, how do we manage knowledge over time? By optimizing for data distribution and ontology evolution, we have reduced the population time, including materialization, for the NCBO Resource Index, a knowledge base of 16.4 billion annotations linking 2.4 million terms from 200 ontologies to 3.5 million data elements, from one week to less than one hour for one of the large datasets on the same machine.

