
Linking Data Elements Based on Similarity Data Values and Semantic Annotations


Abstract

Data elements drawn from data sources, each having a data value set, are linked as follows. Hash functions are used to determine a dimensionally reduced instance signature for each data element, based on all data values associated with that data element. This yields a plurality of dimensionally reduced instance signatures of equivalent fixed size, such that similarities among the data values in the data value sets across all data elements are maintained among the instance signatures. Candidate pairs of data elements to link are identified by using the instance signatures in locality-sensitive hash functions, and a similarity index is generated for each candidate pair using a predetermined measure of similarity. Candidate pairs of data elements whose similarity index is above a given threshold are linked.
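The pipeline described in the abstract can be sketched roughly as below. This is only an illustration under stated assumptions, not the patented method: the abstract does not name the hash family or the similarity measure, so the sketch assumes MinHash signatures, LSH banding, and Jaccard similarity. All function names, the signature length, the band count, and the threshold are hypothetical, and each data element is assumed to have a non-empty value set.

```python
import hashlib
from collections import defaultdict
from itertools import combinations

NUM_HASHES = 64  # fixed signature size (assumed value)
BANDS = 16       # LSH bands, each NUM_HASHES // BANDS rows (assumed value)


def _hash(seed: int, value: str) -> int:
    """Deterministic 64-bit hash of a value; the seed selects the hash function."""
    digest = hashlib.blake2b(
        value.encode("utf-8"), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(values):
    """Reduce a non-empty value set to a fixed-size MinHash signature."""
    distinct = set(values)
    return tuple(min(_hash(seed, v) for v in distinct) for seed in range(NUM_HASHES))


def candidate_pairs(signatures):
    """Return pairs of elements whose signatures collide in at least one band."""
    rows = NUM_HASHES // BANDS
    pairs = set()
    for band in range(BANDS):
        buckets = defaultdict(list)
        for name, sig in signatures.items():
            buckets[sig[band * rows:(band + 1) * rows]].append(name)
        for names in buckets.values():
            pairs.update(combinations(sorted(names), 2))
    return pairs


def jaccard(a, b):
    """Jaccard similarity, standing in for the 'predetermined measure'."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


def link_elements(elements, threshold=0.5):
    """Link candidate pairs whose similarity index is above the threshold."""
    signatures = {name: minhash_signature(vals) for name, vals in elements.items()}
    links = []
    for left, right in sorted(candidate_pairs(signatures)):
        score = jaccard(elements[left], elements[right])
        if score > threshold:
            links.append((left, right, score))
    return links


if __name__ == "__main__":
    # Hypothetical data elements (column names mapped to their value sets).
    example = {
        "customers.country": {"US", "DE", "FR", "JP", "BR"},
        "orders.ship_country": {"US", "DE", "FR", "JP", "BR"},
        "products.color": {"red", "green", "blue"},
    }
    for left, right, score in link_elements(example):
        print(left, "<->", right, round(score, 2))
```

Banding means the exact similarity is computed only for pairs that share a bucket, which avoids comparing every data element against every other one. More bands with fewer rows each admit more candidates, at the cost of more exact comparisons.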
