A Key Attribute Extraction Method Based on Fuzzy Option Relations

熊熙; 乔少杰; 韩楠; 元昌安; 张海清; 李斌勇

Abstract

Fuzzy analysis methods are widely used in medicine, including the auxiliary diagnosis of mental disorders. Attribute reduction methods play an important role in filtering redundant information and extracting essential information, which makes the clinical decision-making process more accurate and efficient. The valuable information these methods extract can reveal underlying medical knowledge from a new perspective. Many untrained participants find it difficult to identify the fuzzy boundaries between the options of a psychometric scale, that is, to distinguish options that share the same meaning but differ in degree. The intrinsic fuzziness of clinical psychology and of psychometric data introduces noise. If the attributes of psychometric data are viewed as the condition attributes of an information system, key attributes can be obtained by attribute discriminant (dimensionality reduction) methods, which simplifies the clinical screening of suspected patients. In practice, attention can be focused on the extracted key attributes, or on attributes with high weights, so that patients with abnormal key attributes are identified quickly and given priority treatment.

This paper proposes a Fuzzy-Option based Attribute Discriminant method, called FOAD, which consists of three main phases: data collection, selection and reduction of fuzzy options, and sorting and extraction of key attributes. In psychometric data, each participant sample contains several physical-symptom attributes, and one degree option is selected for each attribute. When choosing which fuzzy options to remove, both the number of samples that selected an option and the meaning (degree) of the option must be considered. The fuzzy option reduction algorithm, the core of the approach, merges fuzzy options into other reserved options to reduce the fuzziness of the psychometric data.

Two real clinical datasets are used to evaluate FOAD. Key attributes are first obtained from the datasets by several categories of attribute discriminant algorithms. The samples are then classified by logistic regression, with the key attributes as condition attributes and the diagnosis results as classification labels. The experimental results show that FOAD improves prediction accuracy by 3.3% to 14.1% without increasing the computational complexity. Although option reduction loses some information, merging fuzzy options makes the option distribution clearer. Linear Discriminant Analysis (LDA) under FOAD is sensitive to its parameters, especially the number of reserved attributes: its accuracy improvement rises from 6.7% when the fewest attributes are reserved to 14.1% when the most are reserved. Principal Component Analysis (PCA) chooses the projection direction that maximizes the variance of the data and so retains the most information, but it classifies poorly. FOAD therefore can hardly improve the prediction accuracy of PCA, and in a few cases it even lowers it. Moreover, LDA based on FOAD achieves higher prediction accuracy than other fuzzy attribute discriminant methods.

Psychometric diagnosis data are markedly fuzzy, and conventional statistical analysis methods often fail to produce the required results. Special preprocessing methods, such as recent fuzzy set and rough set techniques, can eliminate this noise and improve clinical diagnosis.
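
The abstract does not give the details of the fuzzy option reduction step, so the following is only a minimal sketch of the general idea of merging rarely chosen options of one ordinal attribute into reserved neighboring options. The function name `merge_fuzzy_option`, the `min_share` threshold, and the nearest-neighbor tie-breaking rule are all assumptions made for illustration; they are not the selection criteria used by FOAD.

```python
from collections import Counter


def merge_fuzzy_option(responses, num_options, min_share=0.1):
    """Merge rarely chosen options of one ordinal attribute (illustrative only).

    responses   : list of ints in 0..num_options-1, ordered by degree.
    min_share   : hypothetical threshold; an option chosen by less than this
                  share of the samples is treated as a fuzzy option.
    Returns (merged_responses, mapping), where mapping sends every original
    option to the reserved option that replaces it.
    """
    if not responses:
        raise ValueError("responses must not be empty")

    counts = Counter(responses)
    total = len(responses)

    # Reserved options are those chosen by enough samples (assumed criterion).
    reserved = [o for o in range(num_options) if counts[o] / total >= min_share]
    if not reserved:
        # Degenerate case: keep only the most frequently chosen option.
        reserved = [max(range(num_options), key=lambda o: counts[o])]

    mapping = {}
    for option in range(num_options):
        if option in reserved:
            mapping[option] = option
        else:
            # Merge into the reserved option closest in degree;
            # ties go to the option chosen by more samples.
            mapping[option] = min(
                reserved, key=lambda r: (abs(r - option), -counts[r])
            )

    return [mapping[r] for r in responses], mapping


if __name__ == "__main__":
    # Four degree options; option 1 is chosen by only 1 of 20 samples.
    answers = [0] * 10 + [1] + [2] * 6 + [3] * 3
    merged, mapping = merge_fuzzy_option(answers, num_options=4)
    print(mapping)
```

In this sketch both considerations named in the abstract appear in simplified form: the sample count decides which options are treated as fuzzy, and the degree ordering decides where they are merged. The merged attribute could then be passed to an attribute discriminant algorithm such as LDA before classification.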
