Leveraging external user-generated information for large-scale data integration.

Abstract

The proliferation of data sources in both the private and public domains (e.g., in enterprise environments and on the World Wide Web) underscores the need for data integration systems. The purpose of a data integration system is to let users access data residing in multiple heterogeneous sources through a uniform interface. Building such systems manually is not viable, especially for large-scale and complex applications.

This dissertation studies how to automate the construction of data integration systems. In particular, it addresses three key challenges that lie at the heart of any such system.

The first challenge is the construction of wrappers for unstructured sources. A source wrapper ensures that the other parts of the system perceive the data in the underlying source as structured data. We focus on sources containing data formatted as lists, and propose a new solution for extracting relational tables from them. The proposed solution is completely unsupervised and domain-independent. It leverages several sources of information, including a corpus of tens of millions of relational tables published by users on the Web.

The second and third challenges concern establishing semantic mappings across data sources. We first propose a new solution for discovering correspondences between the elements of two schemas. Then, based on these simple correspondences, we propose another solution that discovers more complex declarative mapping rules, which can actually be used to transform data and queries across the two schemas. The key underpinning of both solutions is that, unlike previous approaches, they exploit usage information extracted from database query logs. This work is the first to introduce the usage-based approach for establishing mappings across data sources.

To evaluate our approaches, we conducted experiments on realistic data sets: real web lists for the wrapper construction work, and schemas and query logs from the retail and life sciences domains for the work on semantic mappings. The experimental results verified the effectiveness and applicability of the proposed approaches.
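
As a rough illustration of the usage-based idea only (the abstract does not describe the dissertation's actual algorithm, so this is not it), the sketch below assumes each query log has already been parsed into (attribute, clause role) pairs. It builds a usage profile per attribute and pairs attributes across two schemas by cosine similarity. The function names, the role labels, and the sample logs are all hypothetical.

```python
from collections import Counter
from math import sqrt

# Hypothetical clause roles an attribute can play in a logged query.
ROLES = ("select", "filter", "join", "group")


def usage_profiles(log):
    """Map each attribute to the fraction of its uses that fall in each role.

    `log` is an iterable of (attribute, role) pairs; this pre-parsed shape
    is an assumption made for illustration.
    """
    counts = {}
    for attr, role in log:
        counts.setdefault(attr, Counter())[role] += 1
    profiles = {}
    for attr, c in counts.items():
        total = sum(c.values())
        profiles[attr] = [c[r] / total for r in ROLES]
    return profiles


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def match_attributes(log_a, log_b):
    """For each attribute used in log A, pick the attribute in log B
    whose usage profile is most similar."""
    pa, pb = usage_profiles(log_a), usage_profiles(log_b)
    return {a: max(pb, key=lambda b: cosine(pa[a], pb[b])) for a in pa}


# Made-up logs for two schemas with differently named attributes.
log_a = [("cust_id", "join"), ("cust_id", "join"), ("cust_id", "filter"),
         ("amount", "select"), ("amount", "select"), ("amount", "filter")]
log_b = [("client_no", "join"), ("client_no", "join"), ("client_no", "join"),
         ("client_no", "filter"),
         ("total", "select"), ("total", "filter"), ("total", "select")]

print(match_attributes(log_a, log_b))
```

In this toy setup the attribute names share no text, so any correspondence found comes purely from how the attributes are used in queries, which is the intuition the abstract attributes to the usage-based approach.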


Bibliographic record

Author: Elmeleegy, Hazem
Author affiliation: Purdue University
Degree-granting institution: Purdue University
Subject: Computer Science
Degree: Ph.D.
Year: 2010
Pages: 154
Original format: PDF
Language of text: English
