Human-Centric Debugging of Entity Matching.

Abstract

Entity matching (EM) is the problem of finding data records that refer to the same real-world entity. For example, the two records (Matthew Richardson, 206-453-1978) and (Matt W. Richardson, 453 1978) may refer to the same person. EM is an important data integration problem with applications in e-commerce, healthcare, and national security. Recent work on entity matching has focused on using machine learning and/or crowdsourcing to improve accuracy and/or scale current matching solutions, even though the task is typically done with a human analyst in the loop. In this thesis we therefore propose solutions that acknowledge that humans are in the loop when completing an entity matching task. We focus on debugging of entity matching, an iterative process by which an analyst improves matching quality. Hence the title, "Human-Centric Debugging of Entity Matching".

We build an end-to-end matching system and experiment with it in an e-commerce setting as well as with students in a graduate data modeling course at UW-Madison. We also develop an abstract model of the entity matching problem to understand what makes such a problem hard for an analyst. The insights from this work lead to the two contributions in the rest of the thesis. First, we focus on debugging rule-based matchers and attempt to make it an interactive process in which an analyst can quickly iterate and find a high-quality matcher. We show that optimally ordering the rules and incrementally running the matcher on top of previous matching output decrease runtime significantly. Second, we focus on debugging entity matching data sets. We develop a framework to help an analyst quickly find and resolve inconsistencies in a data set. We experiment with seven real-world data sets and demonstrate the effectiveness of our framework in finding inconsistencies.
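To make the idea of incrementally re-running a rule-based matcher concrete, here is a minimal Python sketch. It is not the thesis's system: it assumes a matcher is a disjunction of rules (a pair matches if any rule fires), and the rule functions, the field names `name` and `phone`, and the third record are hypothetical choices made only for illustration. Under that assumption, adding a rule can only turn non-matches into matches, so only previously unmatched pairs need to be re-checked, and only against the new rule.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]
Rule = Callable[[Record, Record], bool]
Pair = Tuple[Record, Record]


def digits(s: str) -> str:
    """Keep only the digit characters of a string."""
    return "".join(ch for ch in s if ch.isdigit())


def exact_name_rule(a: Record, b: Record) -> bool:
    """Hypothetical rule: names are identical, ignoring case."""
    return a["name"].lower() == b["name"].lower()


def phone_suffix_rule(a: Record, b: Record) -> bool:
    """Hypothetical rule: the shorter phone number (at least 7 digits)
    is a suffix of the longer one, e.g. a number missing its area code."""
    short, long_ = sorted((digits(a["phone"]), digits(b["phone"])), key=len)
    return len(short) >= 7 and long_.endswith(short)


def run_matcher(pairs: List[Pair], rules: List[Rule]) -> Dict[int, bool]:
    """Full run: a pair matches if any rule fires.
    any() stops at the first firing rule, so rule order affects runtime."""
    return {i: any(rule(a, b) for rule in rules)
            for i, (a, b) in enumerate(pairs)}


def add_rule_incrementally(pairs: List[Pair],
                           previous: Dict[int, bool],
                           new_rule: Rule) -> Dict[int, bool]:
    """Incremental run after adding one rule: reuse earlier matches and
    evaluate only the new rule on pairs that did not match before."""
    updated = dict(previous)
    for i, matched in previous.items():
        if not matched:
            a, b = pairs[i]
            updated[i] = new_rule(a, b)
    return updated


# The first two records come from the abstract; the third is made up.
r1 = {"name": "Matthew Richardson", "phone": "206-453-1978"}
r2 = {"name": "Matt W. Richardson", "phone": "453 1978"}
r3 = {"name": "Mary Richards", "phone": "608-555-0100"}
pairs = [(r1, r2), (r1, r3)]

first = run_matcher(pairs, [exact_name_rule])
second = add_rule_incrementally(pairs, first, phone_suffix_rule)
```

In this sketch the incremental result agrees with a full re-run over both rules, while skipping any pair that had already matched. Choosing the rule order itself (which the abstract describes as optimal ordering) is not modeled here.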

Bibliographic record

Author: Panahi, Fatemah.
Author affiliation: The University of Wisconsin - Madison.
Degree-granting institution: The University of Wisconsin - Madison.
Subject: Computer science.
Degree: Ph.D.
Year: 2017
Pages: 151 p.
Original format: PDF
Language: English

