Journal: IEEE Transactions on Knowledge and Data Engineering

Knowledge accumulation and resolution of data inconsistencies during the integration of microbial information sources


Abstract

The Internet has emerged as an ever-growing environment of multiple heterogeneous and autonomous data sources that contain relevant but overlapping information on microorganisms. Microbiologists could therefore benefit considerably from intelligent software agents that assist in navigating this information-rich environment, and from data mining tools that aid in the discovery of new information. These applications depend heavily on well-conditioned data samples that are correlated across multiple information sources; hence, accurate database merging operations are desirable. Information systems designed to join the related knowledge provided by different microbial data sources are hampered by the labeling mechanism used to reference microbial strains and cultures: labels show syntactic variation in practical usage, and synonymy and homonymy are also known to exist among them. The situation is further complicated by the fact that label equivalence knowledge is itself recorded in fragments across several data sources, each of which may provide incomplete or incorrect information. This paper presents how the extraction and integration of label equivalence information from several distributed data sources has led to the construction of a so-called integrated strain database, which helps to resolve most of the above problems. Because information retrieved from autonomous resources may be overlapping, incomplete, and incorrect, much effort went into completing missing information, discovering new associations between information objects, and developing and applying tools for error detection and correction. Through a thorough evaluation of the different levels of incompleteness and incorrectness encountered within the incorporated data sources, we demonstrate the added value of the integrated strain database as a necessary service provider for the seamless integration of microbial information sources.
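
The abstract does not describe the paper's actual algorithms, so the following is only a minimal sketch of one common way to merge label equivalence information from several sources. It assumes that labels consist of a collection acronym followed by a number, and that each source contributes pairs of labels it considers equivalent. The function names, the `LabelClasses` class, the normalization rule, and the sample labels (`CCA 101`, `CCB 7`, and so on) are all hypothetical and not taken from the paper. The sketch uses a union-find structure to group labels into equivalence classes.

```python
import re
from collections import defaultdict


def normalize_label(label: str) -> str:
    """Map syntactic variants ('cca-101', 'CCA101') to one form ('CCA 101').

    Assumption: a label is a collection acronym followed by a number.
    """
    compact = re.sub(r"[\s\-_./:]+", "", label).upper()
    match = re.fullmatch(r"([A-Z]+)(\d.*)", compact)
    if match is None:
        return compact  # unknown shape: keep the compacted form
    return f"{match.group(1)} {match.group(2)}"


class LabelClasses:
    """Union-find over labels; each set is one presumed strain."""

    def __init__(self):
        self.parent = {}

    def find(self, label):
        self.parent.setdefault(label, label)
        root = label
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every visited label at the root.
        while self.parent[label] != root:
            self.parent[label], label = root, self.parent[label]
        return root

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

    def classes(self):
        groups = defaultdict(set)
        for label in list(self.parent):
            groups[self.find(label)].add(label)
        return sorted(sorted(group) for group in groups.values())


def integrate(sources):
    """Merge equivalence pairs from all sources into label classes."""
    merged = LabelClasses()
    for pairs in sources.values():
        for left, right in pairs:
            merged.union(normalize_label(left), normalize_label(right))
    return merged.classes()


# Hypothetical input: each source records only a fragment of the knowledge.
sources = {
    "catalogue_a": [("CCA 101", "ccb-7")],
    "catalogue_b": [("CCB7", "CCC 55")],
    "catalogue_c": [("CCD 9", "CCD 9")],
}
print(integrate(sources))
```

In this sketch, the link between `CCA 101` and `CCC 55` is not stated by any single source and only appears once the fragments are combined. The sketch covers syntactic variation and synonymy only. Homonymy and incorrect equivalence claims would need additional checks, since a single wrong pair merges two otherwise distinct classes.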
