Journal of Medical Imaging and Health Informatics

A Unified Approach to Detect the Record Duplication Using BAT Algorithm and Fuzzy Classifier for Health Informatics

Abstract

Data cleaning techniques are generally employed in data engineering applications to improve data quality. For medical data sets, data cleansing identifies and then removes duplicate records, improving the quality of the medical data and, in turn, supporting better disease diagnosis and health wellbeing. Different researchers have already made significant efforts to identify duplicate records in medical data using soft computing techniques; recently, for example, a genetic programming approach to record de-duplication was proposed that combines several different pieces of evidence extracted from the data content. In this paper, we develop a duplicate-detection technique that uses soft computing methods to improve the assessment of medical data, because the persistence of multiple or redundant records is a major problem in medical data and other health information. The proposed method is based on the bat algorithm and a fuzzy classifier. Selecting only some pieces of evidence reduces the time consumed in testing, since a small amount of information is enough to perform duplicate detection with the defined techniques. The proposed technique consists of three steps: extraction of pieces of information from the input data, selection of the best set of pieces through the BAT algorithm, and detection of duplicates using the fuzzy classifier. First, the input data are processed by a hashing-based algorithm and by the Levenshtein distance to obtain different sets of features. Next, the bat algorithm, a recent optimization algorithm, selects the best set of features from these initial sets. Finally, duplicates are detected from the optimal feature set using the fuzzy classifier.
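
The three-step pipeline described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the hash function, the exact features, the bat-algorithm variant, or the fuzzy rules. The sketch therefore assumes MD5 equality of normalized field values as the hashing-based evidence, a length-normalized Levenshtein similarity per field, a binary bat algorithm with a V-shaped |tanh| transfer function for feature selection, a fitness equal to training accuracy minus a small feature-count penalty, and a hypothetical two-rule fuzzy classifier with linear ramp memberships. All function names (`pair_features`, `fuzzy_predict`, `fitness`, `bat_select`), parameter values, and toy records are hypothetical.

```python
import hashlib
import math
import random

# --- Step 1 (assumed): evidence = MD5 hash match + normalized Levenshtein similarity ---
def levenshtein(a: str, b: str) -> int:
    """Edit distance via dynamic programming (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute
        prev = cur
    return prev[-1]

def pair_features(r1, r2):
    """Two features per field: exact hash match (0/1) and Levenshtein similarity in [0, 1]."""
    feats = []
    for f1, f2 in zip(r1, r2):
        a, b = f1.strip().lower(), f2.strip().lower()
        feats.append(float(hashlib.md5(a.encode()).digest() == hashlib.md5(b.encode()).digest()))
        feats.append(1.0 - levenshtein(a, b) / (max(len(a), len(b)) or 1))
    return feats

# --- Step 3 (hypothetical): two-rule fuzzy classifier with ramp memberships ---
def fuzzy_predict(x, mask):
    """Rules: IF similarity is HIGH THEN duplicate; IF LOW THEN distinct (mean activation)."""
    chosen = [v for v, keep in zip(x, mask) if keep]
    if not chosen:
        return False
    high = sum(min(1.0, max(0.0, (v - 0.5) / 0.4)) for v in chosen) / len(chosen)
    low = sum(min(1.0, max(0.0, (0.9 - v) / 0.4)) for v in chosen) / len(chosen)
    return high >= low

def fitness(mask, X, y):
    """Training accuracy minus a small penalty per selected feature."""
    acc = sum(fuzzy_predict(x, mask) == t for x, t in zip(X, y)) / len(y)
    return acc - 0.01 * sum(mask) / len(mask)

# --- Step 2 (assumed variant): binary bat algorithm for feature-subset selection ---
def bat_select(X, y, n_bats=10, iters=30, fmin=0.0, fmax=2.0, alpha=0.9, gamma=0.9, seed=0):
    rng = random.Random(seed)
    d = len(X[0])
    pos = [[rng.randint(0, 1) for _ in range(d)] for _ in range(n_bats)]
    vel = [[0.0] * d for _ in range(n_bats)]
    loud = [1.0] * n_bats                  # loudness A_i, decreases on acceptance
    r0 = 0.5                               # initial pulse emission rate
    rate = [r0] * n_bats
    fit = [fitness(p, X, y) for p in pos]
    b = max(range(n_bats), key=fit.__getitem__)
    best, best_fit = pos[b][:], fit[b]
    for t in range(1, iters + 1):
        for i in range(n_bats):
            freq = fmin + (fmax - fmin) * rng.random()
            cand = []
            for k in range(d):
                vel[i][k] += (pos[i][k] - best[k]) * freq
                # V-shaped transfer |tanh(v)|: probability of flipping bit k
                flip = rng.random() < abs(math.tanh(vel[i][k]))
                cand.append(1 - pos[i][k] if flip else pos[i][k])
            if rng.random() > rate[i]:
                cand = best[:]             # local search: flip one bit of the best solution
                k = rng.randrange(d)
                cand[k] = 1 - cand[k]
            f_new = fitness(cand, X, y)
            if f_new >= fit[i] and rng.random() < loud[i]:
                pos[i], fit[i] = cand, f_new
                loud[i] *= alpha
                rate[i] = r0 * (1 - math.exp(-gamma * t))
            if f_new >= best_fit:          # track the global best subset
                best, best_fit = cand[:], f_new
    return best, best_fit

# Toy records (illustrative only): (name, date of birth, city)
records = [("John Smith", "1980-02-11", "Boston"), ("Jon Smith", "1980-02-11", "Boston"),
           ("Mary Jones", "1975-07-30", "Denver"), ("Marie Jones", "1975-07-30", "Denver"),
           ("Alan Brown", "1990-12-01", "Austin"), ("Lisa White", "1988-03-14", "Seattle")]
labelled = [((0, 1), True), ((2, 3), True), ((0, 2), False), ((0, 4), False),
            ((1, 5), False), ((2, 4), False), ((3, 5), False), ((4, 5), False)]
X = [pair_features(records[i], records[j]) for (i, j), _ in labelled]
y = [dup for _, dup in labelled]
mask, score = bat_select(X, y)
print("selected feature mask:", mask, "fitness:", round(score, 3))
```

Each bat encodes a candidate feature subset as a bit vector, and the returned mask marks which hash and similarity features the fuzzy classifier uses. In this sketch, a smaller mask means fewer features to compute and combine per record pair at test time, which is the kind of time saving the abstract attributes to selecting pieces of evidence.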
Experiments are conducted on four different datasets, two of which come from health informatics, to evaluate performance in medical applications, and the proposed algorithm is compared with the existing algorithm in terms of detection accuracy. The experiments show that the proposed approach achieves an accuracy of 0.906, compared with 0.79 for the existing genetic-algorithm-based approach.
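
Detection accuracy, the comparison metric above, can be computed as the fraction of record pairs whose duplicate/non-duplicate label is predicted correctly. The sketch below assumes this pair-level definition, which the abstract does not spell out; `detection_accuracy` is a hypothetical helper and the labels are toy values, not the paper's data.

```python
from typing import Sequence

def detection_accuracy(predicted: Sequence[bool], actual: Sequence[bool]) -> float:
    """Fraction of record pairs whose duplicate/non-duplicate label is predicted correctly."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two equal-length, non-empty label sequences")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

# Toy labels for illustration only (not the paper's data): True = duplicate pair.
truth = [True, False, False, True, False]
pred = [True, False, True, True, False]
print(f"accuracy = {detection_accuracy(pred, truth):.3f}")  # accuracy = 0.800
```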