Database enrichment environment to identify duplicate tuples

Abstract

A significant problem inherent to current large databases is the incidence of duplicate tuples: records that, in most cases, are represented differently in the database but refer to the same real-world entity, which makes them hard to identify. Because each language has its own peculiarities, it is believed that text operation techniques from Information Retrieval can enrich the content of records for a specific language and thus maximize the number of duplicate tuples identified and/or improve the confidence level of their classification relative to current tools. The main contribution of this paper is a language-independent environment that approximates the spelling of the records in a database and thus identifies duplicate tuples more efficiently than traditional methods applied in isolation. Beyond improving database quality, this tool can also improve the process of Knowledge Discovery in Databases (KDD).
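
The abstract does not say which text operations the environment applies, so the following is only a minimal sketch of the general idea: normalize the spelling of two records before comparing them with a conventional string similarity measure. The stopword list, the `normalize` and `is_duplicate` helpers, and the 0.9 threshold are illustrative assumptions, not details taken from the paper.

```python
import unicodedata
from difflib import SequenceMatcher

# Hypothetical stopword list; a real one would be chosen per language.
STOPWORDS = {"de", "da", "do", "the", "of"}


def normalize(text: str) -> str:
    """Lowercase, strip accents and punctuation, and drop stopwords."""
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text.lower())
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Replace anything that is not a letter or digit with a space.
    cleaned = "".join(ch if ch.isalnum() else " " for ch in no_accents)
    return " ".join(tok for tok in cleaned.split() if tok not in STOPWORDS)


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] from the standard library's SequenceMatcher."""
    return SequenceMatcher(None, a, b).ratio()


def is_duplicate(a: str, b: str, threshold: float = 0.9) -> bool:
    """Compare the normalized forms; the threshold is an arbitrary assumption."""
    return similarity(normalize(a), normalize(b)) >= threshold
```

Under these assumptions, `is_duplicate("Rogéria C. Gratão de Souza", "ROGERIA C GRATAO DE SOUZA")` holds because both strings normalize to the same form, whereas comparing the raw strings directly gives a lower similarity score.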

Bibliographic record

Source
BCS-IRSG Symposium on Future Directions in Information Access, 2011, 2 pages
Authors
Juliano Augusto Carreira; Carlos Roberto Valêncio; Rogéria C. Gratão de Souza
Original format: PDF
Chinese Library Classification: G20-53
Keywords
Data Cleansing; Information Retrieval; Duplicate Tuples; Knowledge Discovery in Databases
