International Conference on Very Large Data Bases

Transform-Data-by-Example (TDE): An Extensible Search Engine for Data Transformations

Abstract

Today, business analysts and data scientists increasingly need to clean, standardize, and transform diverse data sets, such as names, addresses, date-times, and phone numbers, before they can perform analysis. This process of data transformation is an important part of data preparation, and is known to be difficult and time-consuming for end-users. Traditionally, developers have dealt with these longstanding transformation problems using custom code libraries. They have built a vast variety of custom logic for name parsing, address standardization, and so on, and shared their source code in places like GitHub. Data transformation would be much easier for end-users if they could discover and reuse such existing transformation logic. We developed Transform-Data-by-Example (TDE), which works like a search engine for data transformations. TDE "indexes" a vast variety of transformation logic in source code, DLLs, web services, and mapping tables, so that users only need to provide a few input/output examples to demonstrate a desired transformation, and TDE can interactively find relevant functions to synthesize new programs consistent with all examples. Using an index of 50K functions crawled from GitHub and Stack Overflow, TDE can already handle many common transformations not currently supported by existing systems. On a benchmark with over 200 transformation tasks, TDE generates correct transformations for 72% of the tasks, which is considerably better than the other systems evaluated. A beta version of TDE for Microsoft Excel is available via the Office Store. Part of the TDE technology also ships in Microsoft Power BI.
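The search-by-example idea described in the abstract can be illustrated with a toy sketch. This is not TDE's implementation: the three functions, the `INDEX` dictionary, and the `search` helper are hypothetical stand-ins for the indexed transformation logic, and the sketch only checks single functions against the examples, leaving out ranking and the synthesis of multi-function programs.

```python
from datetime import datetime
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, str]  # (input string, desired output string)


# Hypothetical "indexed" functions, standing in for crawled library code.
def swap_name(s: str) -> str:
    """Turn 'Last, First' into 'First Last'."""
    last, first = [part.strip() for part in s.split(",", 1)]
    return f"{first} {last}"


def iso_date(s: str) -> str:
    """Turn 'MM/DD/YYYY' into 'YYYY-MM-DD'."""
    return datetime.strptime(s, "%m/%d/%Y").strftime("%Y-%m-%d")


def digits_only(s: str) -> str:
    """Keep only the digit characters of a string."""
    return "".join(ch for ch in s if ch.isdigit())


# Assumed toy index: function name -> callable.
INDEX: Dict[str, Callable[[str], str]] = {
    "swap_name": swap_name,
    "iso_date": iso_date,
    "digits_only": digits_only,
}


def consistent(fn: Callable[[str], str], examples: List[Example]) -> bool:
    """Return True if fn reproduces every example output without failing."""
    for inp, out in examples:
        try:
            if fn(inp) != out:
                return False
        except Exception:
            # A function that cannot parse the input is simply not a match.
            return False
    return True


def search(examples: List[Example],
           index: Dict[str, Callable[[str], str]] = INDEX) -> List[str]:
    """Return the names of indexed functions consistent with all examples."""
    return [name for name, fn in index.items() if consistent(fn, examples)]


if __name__ == "__main__":
    print(search([("Doe, John", "John Doe"), ("Smith, Anna", "Anna Smith")]))
    print(search([("03/15/2018", "2018-03-15")]))
```

Here the user never names a function; two or three input/output pairs are enough to narrow the toy index to a single candidate. A real system would also need to rank candidates and compose several functions when no single one fits.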