Indian Journal of Science and Technology

Probabilistic multiple correlation based term weighting scheme for measuring similarity of unstructured text records


Abstract

Background/Objectives: This study defines a term weighting scheme, derived from probabilistic multiple correlation, for measuring the similarity between unstructured text records. Methods: Intra-correlation is the correlation of terms within the same record, whereas inter-correlation is the correlation of terms that occur in different records. Probabilistic multiple correlation-based term weighting calculates the weight, or relevance, of a term by considering its intra-correlation with one or more other terms simultaneously. The term weights are then used to measure the inter-correlation of terms and, from that, the similarity between two text records. Findings: The experiments are run on unstructured text records that are incomplete and contain abbreviations. Precision, recall and F-score improve significantly with the probabilistic multiple correlation-based term weighting scheme compared with the probabilistic simple correlation weighting scheme. Applications: The probabilistic multiple correlation-based term weighting scheme can improve overall accuracy when matching unstructured text records that contain abbreviations and incomplete data.
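The abstract does not give the scheme's formulas, so the following is only a rough illustration of the general idea of correlating a term with several other terms at once, not the authors' method. It assumes binary term-occurrence vectors over a small set of made-up records and uses the standard two-predictor multiple correlation coefficient built from Pearson correlations; the function names (`pearson`, `multiple_correlation`, `occurrence`) and the toy records are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant vector carries no correlation information
    return cov / sqrt(vx * vy)


def multiple_correlation(y, x1, x2):
    """Standard multiple correlation R of y with x1 and x2 taken together.

    R^2 = (r_y1^2 + r_y2^2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12^2)
    """
    r_y1 = pearson(y, x1)
    r_y2 = pearson(y, x2)
    r_12 = pearson(x1, x2)
    denom = 1.0 - r_12 ** 2
    if denom <= 1e-12:
        # x1 and x2 are collinear, so fall back to the simple correlation
        return abs(r_y1)
    r_sq = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / denom
    return sqrt(min(max(r_sq, 0.0), 1.0))  # clamp against rounding error


def occurrence(term, records):
    """Binary vector: 1 if the term appears in a record, else 0 (assumption)."""
    return [1 if term in rec.split() else 0 for rec in records]


# Hypothetical toy records with abbreviations and missing tokens.
records = [
    "intl business machines corp",
    "intl business corp",
    "business machines",
    "apple inc",
    "apple computer inc",
]

business = occurrence("business", records)
intl = occurrence("intl", records)
machines = occurrence("machines", records)

simple_weight = abs(pearson(business, intl))
multiple_weight = multiple_correlation(business, intl, machines)
print(f"simple: {simple_weight:.3f}, multiple: {multiple_weight:.3f}")
```

In this sketch the multiple correlation is never smaller than either simple correlation, which gives some intuition for why considering several co-occurring terms at once might yield a more informative weight than a single pairwise correlation.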