Entropy Based Measurement of Text Dissimilarity for Duplicate – Detection

Venkatesh Kumar; G. Rajendran

Abstract

Identifying approximate similarity between pairs of strings is an essential step in data cleansing and data integration. Most existing approaches rely on generic or manually tuned distance metrics to estimate the similarity of potential duplicates, but they do not produce a similarity percentage for a pair of strings. In this paper we propose a method that uses entropy and information gain (IG) to measure the dissimilarity between a pair of strings, with the aim of increasing data accuracy.
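The abstract does not spell out how entropy and information gain are combined, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes each string is treated as a bag of lowercase characters, and it takes the information gain to be the drop in entropy obtained by splitting the pooled characters back into their two source strings. The function names `entropy` and `ig_dissimilarity`, the character-level tokenization, and the normalization to the range [0, 1] are all assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(counts: Counter) -> float:
    """Shannon entropy (in bits) of the empirical distribution in `counts`."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def ig_dissimilarity(a: str, b: str) -> float:
    """Normalized information gain between two strings (hypothetical measure).

    0.0 means the character distributions are identical,
    1.0 means the strings share no characters at all.
    """
    # Assumption: characters are the tokens, compared case-insensitively.
    ca, cb = Counter(a.lower()), Counter(b.lower())
    na, nb = sum(ca.values()), sum(cb.values())
    n = na + nb
    if n == 0:
        return 0.0  # two empty strings: treated as identical
    if na == 0 or nb == 0:
        return 1.0  # assumption: empty vs non-empty is maximally dissimilar

    # Information gain = H(pooled) - weighted average of per-string entropies.
    pooled = ca + cb
    gain = entropy(pooled) - (na / n) * entropy(ca) - (nb / n) * entropy(cb)

    # The gain cannot exceed the entropy of the "which string?" label,
    # so dividing by it maps the score into [0, 1].
    label_entropy = entropy(Counter({"a": na, "b": nb}))
    return max(0.0, min(1.0, gain / label_entropy))


if __name__ == "__main__":
    for left, right in [("John Smith", "Jon Smith"), ("John Smith", "Mary Jones")]:
        score = ig_dissimilarity(left, right)
        print(f"{left!r} vs {right!r}: dissimilarity = {score:.1%}")
```

Under these assumptions, near-duplicate strings score close to 0% and unrelated strings score higher, which gives the kind of percentage the abstract says existing metrics lack. Because the sketch ignores character order, anagrams score 0%; a variant over character n-grams or word tokens would be needed where order matters.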


Bibliographic details

Source: Modern Applied Science, 2010, Issue 9, 10 pages
Authors: Venkatesh Kumar; G. Rajendran
Original format: PDF
Chinese Library Classification: Traffic engineering and highway transport technology management
