Making holistic schema matching robust


Abstract

The Web has been rapidly "deepened" by myriad searchable online databases, whose data are hidden behind query interfaces. As an essential task toward integrating these massive "deep Web" sources, large-scale schema matching (i.e., discovering semantic correspondences of attributes across many query interfaces) has been actively studied recently. In particular, many works address this problem by "holistically" matching many schemas at the same time, and are thus "mining" approaches in nature. However, while holistic schema matching builds its promise on the large quantity of input schemas, it also suffers from a robustness problem caused by noisy input data. Such noise almost inevitably arises in the automatic extraction of schema data, which is mandatory in large-scale integration. For holistic matching to be viable, it is thus essential to make it robust against noisy schemas. To tackle this challenge, we propose a data-ensemble framework with sampling and voting techniques, inspired by bagging predictors. Specifically, our approach creates an ensemble of matchers by randomizing the input schema data into many independently downsampled trials, executing the same matcher on each trial, and then aggregating their ranked results by majority voting. As a principled basis, we provide an analytic justification of the effectiveness of this data-ensemble framework. Further, our experiments on real Web data show empirically that the "ensemblization" indeed significantly boosts matching accuracy under noisy schema input, and thus maintains the desired robustness of a holistic matcher.
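
The sample-and-vote procedure described in the abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: `toy_matcher` is a hypothetical stand-in base matcher that pairs attributes which never co-occur in a schema, `ensemble_match` and its parameters `num_trials` and `sample_size` are invented names, and the voting step is simplified to keeping matches reported by more than half of the trials rather than aggregating full rankings.

```python
import random
from collections import Counter
from itertools import combinations


def toy_matcher(schemas):
    """Hypothetical base matcher (not the paper's): attribute pairs that are
    each observed but never appear together in one schema are reported as
    matches, ranked by how often the two attributes occur."""
    freq = Counter(attr for schema in schemas for attr in set(schema))
    cooccur = Counter(
        pair for schema in schemas for pair in combinations(sorted(set(schema)), 2)
    )
    candidates = [
        (a, b) for a, b in combinations(sorted(freq), 2) if cooccur[(a, b)] == 0
    ]
    return sorted(candidates, key=lambda p: (-(freq[p[0]] + freq[p[1]]), p))


def ensemble_match(schemas, matcher, num_trials, sample_size, seed=0):
    """Run the same matcher on many independent downsampled trials and keep
    the matches that a strict majority of trials agree on."""
    rng = random.Random(seed)
    schemas = list(schemas)
    votes = Counter()
    for _ in range(num_trials):
        trial = rng.sample(schemas, sample_size)  # downsample without replacement
        votes.update(set(matcher(trial)))         # one vote per match per trial
    majority = [m for m, v in votes.items() if v > num_trials / 2]
    return sorted(majority, key=lambda m: (-votes[m], m))


# Forty clean schemas plus one schema where (assumed) extraction noise
# places the two synonyms "author" and "writer" side by side.
clean = [{"author", "title"}] * 20 + [{"writer", "title"}] * 20
noisy = clean + [{"author", "writer", "title"}]

single_run = toy_matcher(noisy)  # [] : one noisy schema hides the match
ensemble_run = ensemble_match(noisy, toy_matcher, num_trials=101, sample_size=5)
print(single_run)
print(ensemble_run)
```

In this toy setting a single noisy schema is enough to suppress the author/writer match when the matcher sees all the data at once, while most small random samples exclude the noisy schema, so the majority vote can still recover the match. The small `sample_size` is a deliberate choice for the illustration: the smaller the sample, the more likely a trial avoids the noise, at the cost of each trial seeing less evidence.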


Bibliographic details

Source
ACM SIGKDD international conference on Knowledge discovery in data mining | 2005 | pp. 429-438 | 10 pages

Authors
Bin He; Kevin Chen-Chuan Chang

Original format: PDF

Chinese Library Classification: computing technology, computer technology

Keywords
schema matching
