Knowledge accumulation and resolution of data inconsistencies during the integration of microbial information sources

Dawyndt P.; Vancanneyt M.; De Meyer H.; Swings J.

Abstract

The Internet has emerged as an ever-growing environment of multiple heterogeneous and autonomous data sources that contain relevant but overlapping information on microorganisms. Microbiologists might therefore benefit considerably from the design of intelligent software agents that assist in navigating this information-rich environment, together with the development of data mining tools that can aid in the discovery of new information. These applications depend heavily on well-conditioned data samples that are correlated with multiple information sources; hence, accurate database merging operations are desirable. Information systems designed for joining the related knowledge provided by different microbial data sources are hampered by the labeling mechanism for referencing microbial strains and cultures: the labels suffer from syntactical variation in practical usage, and synonymy and homonymy are also known to exist among them. The situation is further complicated by the observation that the label equivalence knowledge is itself recorded in fragments across several data sources, which can be suspected of providing information that might be both incomplete and incorrect. This paper presents how extraction and integration of label equivalence information from several distributed data sources has led to the construction of a so-called integrated strain database, which helps to resolve most of the above problems. Given that information retrieved from autonomous resources might be overlapping, incomplete, and incorrect, much effort was spent on the completion of missing information, the discovery of new associations between information objects, and the development and application of tools for error detection and correction. Through a thorough evaluation of the different levels of incompleteness and incorrectness encountered within the incorporated data sources, we demonstrate the added value of the integrated strain database as a necessary service provider for the seamless integration of microbial information sources.
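To make the label problem in the abstract concrete, the following is a minimal Python sketch of how syntactic label variants might be normalized and how fragmentary equivalence pairs from several sources might be merged into groups. It is not the authors' method: the normalization rule, the `LabelEquivalence` class, the collection acronyms (AAA, BBB, CCC), and all strain numbers are hypothetical and chosen only for illustration.

```python
import re
from collections import defaultdict


def normalize_label(label: str) -> str:
    """Collapse syntactic variants ('aaa-123', 'AAA123', ' AAA  123 ') to one form.

    Assumed rule (hypothetical): upper-case, drop separators, then put a single
    space between the leading acronym and the rest of the designation.
    """
    compact = re.sub(r"[\s\-_.:]+", "", label.upper())
    match = re.fullmatch(r"([A-Z]+)(.+)", compact)
    return f"{match.group(1)} {match.group(2)}" if match else compact


class LabelEquivalence:
    """Union-find over normalized labels; each set is one presumed strain."""

    def __init__(self):
        self.parent = {}

    def find(self, label: str) -> str:
        label = normalize_label(label)
        self.parent.setdefault(label, label)
        root = label
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every visited label directly at the root.
        while self.parent[label] != root:
            self.parent[label], label = root, self.parent[label]
        return root

    def union(self, a: str, b: str) -> None:
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

    def groups(self):
        clusters = defaultdict(set)
        for label in list(self.parent):
            clusters[self.find(label)].add(label)
        return [sorted(members) for members in clusters.values()]


# Invented equivalence pairs, as if reported by two independent sources.
source_1 = [("AAA 123", "bbb-456")]
source_2 = [("BBB456", "CCC 789")]

eq = LabelEquivalence()
for a, b in source_1 + source_2:
    eq.union(a, b)

print(eq.groups())
```

In this sketch neither source states that AAA 123 and CCC 789 are equivalent; the association follows only once the fragments are combined. Note that a plain union-find handles synonymy but not homonymy or incorrect pairs: a single wrong equivalence would silently merge two distinct strains, which is why error detection and correction of the kind the abstract describes would still be needed.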

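Bibliographic record

Source
IEEE Transactions on Knowledge and Data Engineering, 2005, No. 8, pp. 1111-1126 (16 pages)
Authors
Dawyndt P.; Vancanneyt M.; De Meyer H.; Swings J.
Original format: PDF
Text language: English
Chinese Library Classification: computing technology, computer technology
Keywords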
Internet; biology computing; data mining; distributed databases; error correction; error detection; information retrieval; microorganisms; software agents; Internet; data inconsistency; data mining tools; database merging operation; distributed data sources; error cor;

Similar literature

1. Fusionplex: resolution of data inconsistencies in the integration of heterogeneous information sources [J]. Amihai Motro, Philipp Anokhin. Information Fusion, 2006, No. 2
2. Extracting consistent knowledge from highly inconsistent cancer gene data sources [J]. Xue Gong, Ruihong Wu, Yuannv Zhang. BMC Bioinformatics, 2010, No. 1
3. Acquiring knowledge from inconsistent data sources through weighting [J]. Shichao Zhang, Qingfeng Chen, Qiang Yang. Data & Knowledge Engineering, 2010, No. 8
4. Use of Meta-data for Value-level Inconsistency Detection and Resolution During Data Integration [C]. Philipp Anokhin, Amihai Motro. World Multiconference on Systemics, Cybernetics and Informatics (SCI 2001), v.14: Computer Science and Engineering, pt.2; July 22-25, 2001; Orlando, FL, US. 2001
5. Data inconsistency detection and resolution in the integration of heterogeneous information sources [D]. Anokhin, Philipp. 2001
6. Extracting consistent knowledge from highly inconsistent cancer gene data sources [O]. Xue Gong, Ruihong Wu, Yuannv Zhang. 2010
7. Extracting consistent knowledge from highly inconsistent cancer gene data sources [O]. Wang Jing, Zhang Lin, Gu Yunyan. 2010
8. Context Interchange: Using Knowledge About Data to Integrate Disparate Sources [R]. Madnick, S., Siegel, M. 1999
