Published in: International Workshop on Knowledge Discovery in Inductive Databases

Models and Indices for Integrating Unstructured Data with a Relational Database

Abstract

Database systems are islands of structure in a sea of unstructured data sources. Several real-world applications now need to create bridges for smooth integration of semi-structured sources with existing structured databases for seamless querying. This integration requires extracting structured column values from the unstructured source and mapping them to known database entities. Existing methods of data integration do not effectively exploit the wealth of information available in multi-relational entities. We present statistical models for co-reference resolution and information extraction in a database setting. We then go over the performance challenges of training and applying these models efficiently over very large databases. This requires us to break open a black box statistical model and extract predicates over indexable attributes of the database. We show how to extract such predicates for several classification models, including naive Bayes classifiers and support vector machines. We extend these indexing methods for supporting similarity predicates needed during data integration.
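To make the idea of extracting index-friendly predicates from a classifier concrete, here is a minimal sketch for a two-class naive Bayes model over categorical attributes. It is an illustration under stated assumptions, not the authors' algorithm: the attribute names, probabilities, and function names are all hypothetical. For each attribute it keeps only the values that could still lead to a positive prediction if every other attribute took its most favourable value, which gives a necessary (not sufficient) filter condition.

```python
import math

def log_odds_tables(cond_pos, cond_neg):
    """Per-attribute log-odds: w[attr][v] = log P(v | pos) - log P(v | neg)."""
    return {
        attr: {v: math.log(cond_pos[attr][v]) - math.log(cond_neg[attr][v])
               for v in cond_pos[attr]}
        for attr in cond_pos
    }

def candidate_values(weights, bias):
    """For each attribute, keep the values that can co-occur with a positive
    prediction when all other attributes take their best-case log-odds.
    The result is a superset filter: rows outside it are never positive."""
    best = {attr: max(table.values()) for attr, table in weights.items()}
    total_best = bias + sum(best.values())
    return {
        attr: {v for v, w in table.items()
               if total_best - best[attr] + w >= 0}
        for attr, table in weights.items()
    }

def predict_positive(weights, bias, row):
    """Full naive Bayes decision: positive when the summed log-odds is >= 0."""
    return bias + sum(weights[a][row[a]] for a in weights) >= 0

# Hypothetical conditional probabilities for two attributes (illustration only).
cond_pos = {"city": {"Pune": 0.7, "Delhi": 0.2, "Agra": 0.1},
            "type": {"conf": 0.6, "journal": 0.4}}
cond_neg = {"city": {"Pune": 0.1, "Delhi": 0.3, "Agra": 0.6},
            "type": {"conf": 0.5, "journal": 0.5}}
bias = math.log(0.5) - math.log(0.5)  # equal class priors (assumed)

weights = log_odds_tables(cond_pos, cond_neg)
preds = candidate_values(weights, bias)
```

With these made-up numbers, only one `city` value survives, so a query could first use an index to fetch rows matching that value and then run `predict_positive` on the survivors alone. The filter is deliberately loose: it never discards a row the classifier would accept, but it may admit rows that the full model then rejects.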