System and method for identifying structured data items lacking requisite information for rule-based duplicate detection

Abstract

Embodiments of a system and method for identifying structured data items lacking requisite information for rule-based duplicate detection are described. Embodiments may include generating a deficiency score for each of multiple structured data items including applying a set of rules based on duplicate detection techniques to each given structured data item in order to perform a comparison of the given structured data item to itself. The deficiency score of the given structured data item may be based on a result of the comparison. Embodiments may also include, based on the deficiency scores of the structured data items, identifying one or more deficient structured data items having less than a requisite quantity of information for performing duplicate detection on structured data items. Embodiments may also include identifying one or more key attributes missing from some of the one or more deficient structured data items and requesting those key attributes.
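
The self-comparison idea can be illustrated with a minimal sketch. The abstract does not give a scoring formula, so everything concrete here is an assumption made for illustration: the `Rule` class, the rule weights, the attribute names, the 0.5 threshold, and the choice to define the deficiency score as the weighted share of rules that fail to fire when an item is compared to itself. A rule fires only when every attribute it needs is present, so an item that cannot match itself lacks the information those rules require.

```python
from dataclasses import dataclass
from typing import Optional

# A structured data item is modelled as a flat attribute map (an assumption).
Item = dict[str, Optional[str]]


def _norm(value: Optional[str]) -> str:
    """Normalize a value for comparison; missing or blank values become ''."""
    return value.strip().lower() if value else ""


@dataclass(frozen=True)
class Rule:
    """Hypothetical duplicate-detection rule: all listed attributes must agree."""
    name: str
    attributes: tuple[str, ...]
    weight: float

    def matches(self, a: Item, b: Item) -> bool:
        # The rule fires only if each attribute is non-empty and equal in both items.
        return all(
            _norm(a.get(k)) != "" and _norm(a.get(k)) == _norm(b.get(k))
            for k in self.attributes
        )


def deficiency_score(item: Item, rules: list[Rule]) -> float:
    """Compare the item to itself; return the weighted share of rules that fail."""
    total = sum(r.weight for r in rules)
    if total <= 0:
        raise ValueError("rules must carry positive total weight")
    fired = sum(r.weight for r in rules if r.matches(item, item))
    return 1.0 - fired / total


def missing_key_attributes(item: Item, rules: list[Rule]) -> list[str]:
    """List attributes used by the rules that the item lacks, in rule order."""
    missing: list[str] = []
    for rule in rules:
        for key in rule.attributes:
            if _norm(item.get(key)) == "" and key not in missing:
                missing.append(key)
    return missing


def find_deficient(items: list[Item], rules: list[Rule], threshold: float = 0.5):
    """Return (index, score, missing attributes) for items at or above threshold."""
    result = []
    for index, item in enumerate(items):
        score = deficiency_score(item, rules)
        if score >= threshold:
            result.append((index, score, missing_key_attributes(item, rules)))
    return result


if __name__ == "__main__":
    rules = [
        Rule("upc", ("upc",), 3.0),
        Rule("brand_model", ("brand", "model"), 2.0),
        Rule("title", ("title",), 1.0),
    ]
    items = [
        {"upc": "012345678905", "brand": "Acme", "model": "X1", "title": "Acme X1 Drill"},
        {"brand": "Acme", "title": "Drill"},
    ]
    for index, score, missing in find_deficient(items, rules):
        print(index, round(score, 3), missing)
```

In this sketch, the attributes returned by `missing_key_attributes` are the ones that would be requested for a deficient item. A real system would likely use fuzzier rules than exact equality, but the self-comparison principle stays the same.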
