Journal: Medical informatics and the Internet in medicine

Appropriate medical data categorization for data mining classification techniques.

Abstract

Some data mining (DM) methods and software tools require normalized data, others rely on categorized data, and some can accommodate multiple data scales. Because each DM technique rests on its own theoretical background, applying several methods to the same data is expected to yield different results. The purpose of this study is to identify the data format best suited to each DM classification technique, so that the techniques can be applied more widely and produce trustworthy results efficiently. Given the nature of medical data, categorical variables are sometimes useful for decision making and can make it easier to derive knowledge. In this study, three mathematical data categorization methods (Fusinter, the minimum description length principle [MDLPC], and Chi-merge) were applied to prepare data for five data mining classification techniques (statistical discriminant analysis, supervised classification with neural networks, decision trees, genetic supervised clustering, and Bayesian classification [probabilistic neural networks; PNN]), using a heart disease database containing four types of data (continuous, binary, nominal, and ordinal). Compared with the original or normalized data, data categorized by the MDLPC method performed better with most of the DM classification techniques used in this study. Categorical data suits most DM classification techniques (e.g. classifying disease and non-disease groups) and is relatively easy to use for extracting medical knowledge.
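The abstract does not describe how the authors implemented MDLPC, so the following is only a minimal sketch of supervised entropy-based discretization with the minimum description length stopping criterion of Fayyad and Irani (1993), the criterion MDLPC is usually taken to refer to. It assumes one continuous attribute and one class label per record. The function names (`entropy`, `mdlp_cut_points`, `discretize`) are hypothetical illustrations, not the authors' code or any library API. For simplicity it evaluates every midpoint between adjacent distinct values rather than only class-boundary points, which gives the same best cut but is slower. Fusinter and Chi-merge are not shown.

```python
import math
from collections import Counter
from typing import List, Sequence


def entropy(labels: Sequence) -> float:
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def _split(xs: List[float], ys: List, cuts: List[float]) -> None:
    """Recursively pick the best binary cut and keep it only if the
    MDL criterion accepts it (hypothetical helper)."""
    n = len(ys)
    if n < 2:
        return
    ent_s = entropy(ys)
    best = None  # (gain, split index, left entropy, right entropy)
    for i in range(1, n):
        if xs[i] == xs[i - 1]:
            continue  # only cut between distinct attribute values
        ent_l, ent_r = entropy(ys[:i]), entropy(ys[i:])
        gain = ent_s - (i / n) * ent_l - ((n - i) / n) * ent_r
        if best is None or gain > best[0]:
            best = (gain, i, ent_l, ent_r)
    if best is None:
        return  # all values identical, nothing to cut
    gain, i, ent_l, ent_r = best
    k = len(set(ys))
    k1, k2 = len(set(ys[:i])), len(set(ys[i:]))
    # Fayyad-Irani penalty term for encoding the extra partition
    delta = math.log2(3 ** k - 2) - (k * ent_s - k1 * ent_l - k2 * ent_r)
    threshold = (math.log2(n - 1) + delta) / n
    if gain <= threshold:
        return  # MDL criterion rejects this cut, stop splitting
    cuts.append((xs[i - 1] + xs[i]) / 2)
    _split(xs[:i], ys[:i], cuts)
    _split(xs[i:], ys[i:], cuts)


def mdlp_cut_points(values: Sequence[float], labels: Sequence) -> List[float]:
    """Sorted cut points for one continuous attribute (hypothetical helper)."""
    pairs = sorted(zip(values, labels))
    xs = [v for v, _ in pairs]
    ys = [y for _, y in pairs]
    cuts: List[float] = []
    _split(xs, ys, cuts)
    return sorted(cuts)


def discretize(values: Sequence[float], cuts: Sequence[float]) -> List[int]:
    """Map each value to the index of the interval it falls into."""
    return [sum(v > c for c in cuts) for v in values]
```

Applied column by column (for example `cuts = mdlp_cut_points(ages, has_disease)`, where both variable names are hypothetical), `discretize` turns each continuous attribute into interval codes that classifiers expecting categorical inputs can consume.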