Venue: International Conference on Information Quality

PRACTICAL REGULAR EXPRESSION MINING AND ITS INFORMATION QUALITY APPLICATIONS

Abstract

Regular expressions are convenient devices for representing common patterns in collections of text strings, and they can be used as filters that ensure information quality in textual data. An algorithm is described that induces a representative regular expression from a set of text strings, some of which may contain errors. Such an algorithm is useful for estimating information quality and for automated cleansing of legacy data or of data obtained by automated sensing (e.g., OCR). A number of practical heuristics that improve the algorithm's real-life performance are introduced, and a framework employing the algorithm is outlined.
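
The abstract does not spell out the induction algorithm, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a simple generalization scheme (digits, letters, and literal punctuation, with run lengths kept exact) and takes the most frequent generalized pattern as the representative one. The function names `char_class`, `shape`, and `induce`, and the sample data, are hypothetical.

```python
import re
from collections import Counter
from itertools import groupby


def char_class(ch: str) -> str:
    """Map one character to a coarse regex token (assumed scheme, not from the paper)."""
    if ch.isascii() and ch.isdigit():
        return "[0-9]"
    if ch.isascii() and ch.isalpha():
        return "[A-Za-z]"
    return re.escape(ch)  # punctuation and everything else stay literal


def shape(s: str) -> str:
    """Generalize one string into a regex by run-length encoding its tokens."""
    parts = []
    for token, run in groupby(char_class(ch) for ch in s):
        n = len(list(run))
        parts.append(token if n == 1 else f"{token}{{{n}}}")
    return "".join(parts)


def induce(strings):
    """Return (pattern, share of strings matching it, strings that do not match)."""
    if not strings:
        raise ValueError("need at least one string")
    # The most common shape is taken as representative, so a minority of
    # erroneous strings does not change the induced pattern.
    pattern, _ = Counter(shape(s) for s in strings).most_common(1)[0]
    compiled = re.compile(pattern)
    outliers = [s for s in strings if not compiled.fullmatch(s)]
    return pattern, 1 - len(outliers) / len(strings), outliers


# Hypothetical sample: the last entry mimics an OCR error (letter O for digit 0).
sample = ["123-4567", "555-0199", "800-1234", "8OO-1234"]
pattern, quality, outliers = induce(sample)
print(pattern, quality, outliers)
```

In this sketch the matching share serves as a crude information-quality estimate, and the outlier list marks candidates for cleansing. A practical algorithm would need further heuristics, for example to allow variable-length runs, which this sketch does not attempt.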