
The Flamingo Software Package on Approximate String Queries

Abstract

An important operation in data cleaning is similarity search on textual strings. A simple example is 'finding actor names similar to schwarzeneger,' since few people know the exact spelling of the name of our former governor in California. Supporting this operation efficiently on large amounts of data is challenging. Despite its importance, the problem did not receive enough attention in the research community a decade ago. In this talk, I will give an overview of recent results on this problem and describe the development history of the Flamingo package, an open-source software package that supports efficient approximate string queries. I will also describe my outreach activities to apply our data-cleaning research results in real applications, which led to a startup called Bimaple that specializes in powerful instant search on large data sets.
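To make the example query concrete, here is a minimal sketch of an approximate string query using edit distance (Levenshtein distance) as the similarity measure. It is a brute-force scan written for illustration only: it is not the Flamingo package's API and makes no claim about the techniques Flamingo uses. The function names, the sample list of names, and the distance threshold are all assumptions.

```python
from typing import Iterable, List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with the classic two-row dynamic program."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def similar_strings(query: str, candidates: Iterable[str], max_distance: int) -> List[str]:
    """Return the candidates within max_distance edits of query (case-insensitive)."""
    q = query.lower()
    return [s for s in candidates if edit_distance(q, s.lower()) <= max_distance]


# Hypothetical data: a tiny list of names and an assumed threshold of 2 edits.
names = ["Schwarzenegger", "Schwarzkopf", "Stallone"]
matches = similar_strings("schwarzeneger", names, max_distance=2)
print(matches)
```

A scan like this compares the query against every string, so its cost grows linearly with the size of the collection, which illustrates why supporting the operation efficiently on large data sets is the hard part.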
