
Ensemble-Based Relationship Discovery in Relational Databases

Abstract

We investigated how several data relationship discovery algorithms can be combined to improve performance. We examined eight relationship discovery algorithms, including Cosine similarity, Soundex similarity, Name similarity, and Value range similarity, each of which identifies potential links between database tables in a different way, using a different category of database information. We proposed two ensemble methods, a voting system and hierarchical clustering, to reduce the generalization error of the individual algorithms. The voting scheme uses a given weighting metric to combine the predictions of each algorithm. Hierarchical clustering groups predictions into clusters based on their similarities and then combines one member from each cluster. We ran experiments to validate the performance of each algorithm and to compare it with our ensemble methods and with state-of-the-art algorithms (FaskFK, Randomness and HoPF), using the Precision, Recall and F-Measure evaluation metrics on the TPCH and AdvWork datasets. The results show that the performance of each individual algorithm is limited, indicating the importance of combining them to consolidate their strengths.
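The voting scheme and the evaluation metrics mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the weighting metric or the decision rule, so the weights, the 0.6 threshold, the function names, and the candidate links below are all assumptions chosen for illustration, not the authors' method.

```python
from collections import defaultdict


def weighted_vote(predictions, weights, threshold=0.5):
    """Combine candidate links proposed by several detectors.

    predictions: dict mapping detector name -> set of candidate links
    weights:     dict mapping detector name -> non-negative weight (hypothetical)
    Returns the links whose normalized weighted support is >= threshold.
    """
    total = sum(weights[name] for name in predictions)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    support = defaultdict(float)
    for name, links in predictions.items():
        for link in links:
            support[link] += weights[name]
    return {link for link, s in support.items() if s / total >= threshold}


def precision_recall_f1(predicted, truth):
    """Standard Precision, Recall and F-Measure over sets of links."""
    tp = len(predicted & truth)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(truth) if truth else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Illustrative candidate links written as (foreign-key column, referenced column).
A = ("orders.o_custkey", "customer.c_custkey")
B = ("lineitem.l_orderkey", "orders.o_orderkey")
C = ("orders.o_orderkey", "customer.c_custkey")  # a deliberately wrong candidate

# Made-up detector outputs and weights; not results from the paper.
predictions = {
    "name_similarity": {A, B},
    "cosine_similarity": {A, C},
    "value_range_similarity": {B, C},
}
weights = {"name_similarity": 5, "cosine_similarity": 3, "value_range_similarity": 2}

combined = weighted_vote(predictions, weights, threshold=0.6)
print(sorted(combined))
print(precision_recall_f1(combined, truth={A, B}))
```

In this toy setup no single detector returns exactly the true links, but the weighted vote keeps A and B and drops C, which is the kind of consolidation the abstract argues for.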
