International Conference on Current Trends in Theory and Practice of Computer Science

Semi-automatic Column Type Inference for CSV Table Understanding

Abstract

Spreadsheets are often used as a simple way to represent tabular data. However, since they impose no restrictions on table structure or content, processing them automatically and integrating them with other information sources are particularly hard problems. Many table understanding approaches have been proposed for extracting data from tables and transforming it into meaningful information, but they require the table contents to exhibit some regularity. Starting from CSV spreadsheets that contain values of different types as well as errors, in this paper we introduce an approach for inferring the types of the columns of CSV tables by means of multi-label classification. With our approach, each column of a table can be associated with a simple datatype (such as integer, float, or text), a domain-specific one (such as the name of a municipality or an address), or a "union" of types (which takes into account the frequency of the corresponding values). Since the automatically inferred types might not be accurate, graphical interfaces have been developed to support the user in fixing mistakes. Finally, experimental results are reported on real spreadsheets obtained from a debt collection agency.
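The abstract does not describe the classifier itself, so the following is only a rough illustration of the kind of output it describes: a column labelled with a simple type, a domain-specific type, or a frequency-weighted "union". This sketch does not reproduce the paper's multi-label classification; it substitutes hand-written rules (regular expressions for integers and floats, plus a lookup table standing in for domain knowledge). The names `value_type` and `infer_column_type`, the `gazetteers` parameter and the 0.9 `min_share` threshold are hypothetical choices made for this example.

```python
import re
from collections import Counter

# Simple-datatype patterns. Assumption: either '.' or ',' may be the decimal separator.
INT_RE = re.compile(r"[+-]?\d+")
FLOAT_RE = re.compile(r"[+-]?(?:\d+[.,]\d*|[.,]\d+)")


def value_type(value, gazetteers):
    """Label a single cell; returns None for empty cells so they can be ignored.

    gazetteers is a hypothetical mapping {domain_type: set of lowercase known values};
    a match there takes priority over the simple datatypes.
    """
    v = value.strip()
    if not v:
        return None
    for domain_type, known_values in gazetteers.items():
        if v.lower() in known_values:
            return domain_type
    if INT_RE.fullmatch(v):
        return "integer"
    if FLOAT_RE.fullmatch(v):
        return "float"
    return "text"


def infer_column_type(values, gazetteers=None, min_share=0.9):
    """Return (label, frequencies) for one column.

    label is the dominant type if it covers at least min_share of the non-empty
    cells, otherwise "union"; frequencies maps every observed type to its share.
    """
    gazetteers = gazetteers or {}
    labels = (value_type(v, gazetteers) for v in values)
    counts = Counter(t for t in labels if t is not None)
    total = sum(counts.values())
    if total == 0:
        return "empty", {}
    freqs = {t: n / total for t, n in counts.most_common()}
    top_type, top_count = counts.most_common(1)[0]
    if top_count / total >= min_share:
        return top_type, freqs
    return "union", freqs


if __name__ == "__main__":
    cities = {"municipality": {"genova", "milano", "torino"}}
    print(infer_column_type(["12", "7", "", "3"]))
    print(infer_column_type(["12", "7,5", "n/a", "3"]))
    print(infer_column_type(["Genova", "Milano", "Torino"], cities))
```

With these rules, a column such as `["12", "7,5", "n/a", "3"]` is reported as a union of integer (50%), float (25%) and text (25%) rather than being forced into one type. The threshold is a design knob: lowering it absorbs occasional erroneous cells into the dominant type, while raising it surfaces more mixed columns for a user to review and correct.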