Scaling up the ALIAS Duplicate Elimination System: A Demonstration


Abstract

Duplicate elimination is an important stage in integrating data from multiple sources. The challenges involved are finding a robust deduplication function that can identify when two records are duplicates, and efficiently applying that function to very large lists of records. In ALIAS, the task of designing a deduplication function is eased by learning the function from examples of duplicates and non-duplicates and by using active learning to spot such examples effectively [1]. Here we investigate the issues involved in efficiently applying the learnt deduplication system to large lists of records. We demonstrate the working of the ALIAS evaluation engine and highlight the optimizations it uses to significantly cut down the number of record pairs that need to be explicitly materialized.
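To make the idea of avoiding full pair materialization concrete, here is a minimal sketch of one generic approach: an inverted index on tokens, so that only record pairs sharing at least one token are ever compared. This is an illustration under stated assumptions, not the ALIAS evaluation engine. The abstract does not describe its optimizations in detail, and the function names, the Jaccard threshold, and the sample records are all hypothetical. The hand-written threshold test stands in for a learnt deduplication function.

```python
from collections import defaultdict
from itertools import combinations


def tokens(record):
    """Return the set of lowercased whitespace-separated tokens in a record."""
    return set(record.lower().split())


def jaccard(a, b):
    """Jaccard similarity of two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def candidate_pairs(records):
    """Yield index pairs (i, j), i < j, whose records share at least one token.

    Pairs with no token in common are never generated, so they are never
    materialized or scored.
    """
    index = defaultdict(list)
    for i, rec in enumerate(records):
        for tok in tokens(rec):
            index[tok].append(i)
    seen = set()
    for ids in index.values():
        for pair in combinations(ids, 2):
            if pair not in seen:
                seen.add(pair)
                yield pair


def find_duplicates(records, threshold=0.5):
    """Score only the candidate pairs; the threshold is an assumed stand-in
    for a learnt deduplication function."""
    toks = [tokens(r) for r in records]
    return sorted(
        (i, j)
        for i, j in candidate_pairs(records)
        if jaccard(toks[i], toks[j]) >= threshold
    )


# Hypothetical sample records, for illustration only.
records = [
    "Jane Smith 12 Main Street",
    "J. Smith 12 Main Street",
    "Robert Jones 4 Oak Avenue",
    "Maria Garcia 9 Elm Road",
]

if __name__ == "__main__":
    print(find_duplicates(records))
```

With four records there are six possible pairs, but only the first two records share any token, so a single pair is scored. A token-level index like this becomes less selective when very common tokens appear in many records; practical systems usually add further filtering on top.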


Bibliographic details

Source
International Conference on Data Engineering | 2003 | 3 pages
Authors
Sunita Sarawagi; Alok Kirpal
Original format: PDF
CLC classification: TP274-53

