BCS-IRSG Symposium on Future Directions in Information Access

Database enrichment environment to identify duplicate tuples



Abstract

One significant problem inherent to current large databases is the incidence of duplicate tuples. This problem refers to records that, in most cases, are represented differently in a database but refer to the same real-world entity, which makes identifying those tuples a hard task. Considering that each language has its peculiarities, it is believed that text operation techniques from the area of Information Retrieval can enrich the content of the records for a specific language and thus maximize the number of identified duplicate tuples and/or improve the confidence level of their classification relative to current tools. The main contribution of this paper is a language-independent environment able to approximate the spelling of the records in a database and thus identify duplicate tuples more efficiently than the isolated application of traditional methods. Beyond improving database quality, this tool can also improve the process of Knowledge Discovery in Databases (KDD).
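The abstract does not describe the environment's actual text operations, so the following is only a minimal sketch of the general idea: normalize the spelling of each record first, then compare. The stopword list, the `normalize`, `similarity`, and `is_duplicate` names, the use of `difflib.SequenceMatcher` as the matcher, and the 0.9 threshold are all assumptions chosen for illustration, not the paper's method.

```python
import unicodedata
from difflib import SequenceMatcher

# Hypothetical stopword list; a real one would be chosen per language.
STOPWORDS = {"de", "da", "do", "of", "the"}


def normalize(text: str) -> str:
    """Apply simple text operations: case folding, accent stripping,
    punctuation removal, and stopword removal."""
    # Decompose accented characters so the accent marks can be dropped.
    decomposed = unicodedata.normalize("NFKD", text.casefold())
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Replace anything that is not a letter or digit with a space.
    cleaned = "".join(ch if ch.isalnum() else " " for ch in no_accents)
    tokens = [tok for tok in cleaned.split() if tok not in STOPWORDS]
    return " ".join(tokens)


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two field values after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def is_duplicate(a: str, b: str, threshold: float = 0.9) -> bool:
    """Flag two values as likely duplicates (threshold is an assumed value)."""
    return similarity(a, b) >= threshold


if __name__ == "__main__":
    print(is_duplicate("José da Silva", "JOSE SILVA."))
    print(is_duplicate("José da Silva", "Maria Souza"))
```

In this sketch the normalization step is what brings differently spelled records closer together before any traditional string matcher is applied; swapping in a different stopword list or adding stemming would adapt it to another language.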
