Scalable Column Concept Determination for Web Tables Using Large Knowledge Bases


Abstract

Tabular data on the Web has become a rich source of structured data that ordinary users can explore. Because of this potential, Web tables have recently attracted a number of studies that aim to understand their semantics and to provide effective search and exploration mechanisms over them. An important part of table understanding and search is column concept determination, i.e., identifying the most appropriate concepts associated with the columns of the tables. The problem becomes especially challenging with the availability of increasingly rich knowledge bases that contain hundreds of millions of entities. In this paper, we focus on an important instantiation of the column concept determination problem, in which the concepts of a column are determined by fuzzy matching its cell values against the entities within a large knowledge base. We provide an efficient MapReduce-based solution that scales with both the number of tables and the size of the knowledge base, and we propose two novel techniques: knowledge concept aggregation and knowledge entity partition. We prove that finding the optimal aggregation strategy and finding the optimal partition strategy are both NP-hard problems, and we propose efficient heuristic techniques that leverage the hierarchy of the knowledge base. Experimental results on real-world datasets show that our method achieves high annotation quality and performance, and scales well.
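The problem instantiation in the abstract (determining a column's concepts by fuzzy matching its cell values to knowledge base entities) can be illustrated with a small single-machine sketch. This is not the paper's MapReduce method, and it does not implement concept aggregation or entity partition. The token-level Jaccard similarity, the 0.5 threshold, the voting rule, and the toy `KB` and `COLUMN` data are all assumptions made for illustration.

```python
from collections import Counter


def tokens(text):
    """Lowercase whitespace tokenization (an assumed, simplistic choice)."""
    return set(text.lower().split())


def jaccard(a, b):
    """Token-set Jaccard similarity, used here as a stand-in fuzzy matcher."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def column_concepts(cells, kb, threshold=0.5):
    """Rank concepts by how many cells fuzzily match an entity of that concept.

    `kb` maps an entity name to the set of concepts it belongs to.
    Each cell votes at most once per concept.
    """
    votes = Counter()
    for cell in cells:
        matched = set()
        for entity, concepts in kb.items():
            if jaccard(cell, entity) >= threshold:
                matched.update(concepts)
        votes.update(matched)
    return votes.most_common()


# Hypothetical toy knowledge base and table column.
KB = {
    "barack obama": {"politician", "person"},
    "angela merkel": {"politician", "person"},
    "lionel messi": {"athlete", "person"},
}
COLUMN = ["Barack Obama", "Merkel", "Angela Merkel", "Lionel Messi", "Unknown Name"]

RANKED = column_concepts(COLUMN, KB)
print(RANKED)
```

The nested loop compares every cell with every entity, so its cost grows with the product of the number of cells and the number of entities. That cost is what makes the problem hard at the scale the abstract describes, with knowledge bases of hundreds of millions of entities.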


Bibliographic details

Source: International Conference on Very Large Data Bases, 2013, 12 pages
Authors: Dong Deng; Yu Jiang; Guoliang Li; Jian Li; Cong Yu
Original format: PDF
Chinese Library Classification: TP311.13

