With the development of the medical insurance industry in China, medical insurance data, which are complex, multidimensional and interdisciplinary, are growing rapidly. How to mine the potential value of such large volumes of data and how to improve the efficiency of data analysis are topical issues in data mining research. This paper presents GdiLOF, an improved LOF (Local Outlier Factor) outlier detection algorithm that reduces the dataset by removing normal data and introduces information entropy to improve the accuracy of the LOF algorithm. Its platform adaptability is analyzed by running it on the Hadoop platform. The experimental results show that GdiLOF is highly efficient and that its accuracy is 6 percentage points higher than that of the LOF algorithm. It also runs better on the Hadoop distributed platform and has clear advantages when processing very large amounts of data.
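
For orientation, the baseline that GdiLOF builds on can be sketched as follows. This is a minimal illustration of the standard LOF score only. It does not reproduce GdiLOF: the abstract does not specify how normal data are removed or how information entropy enters the computation, so both steps are left out. The function name `lof_scores`, the parameter `k`, and the sample points are assumptions made for illustration, and ties at the k-th neighbor distance are ignored for simplicity.

```python
import math


def lof_scores(points, k):
    """Return the standard LOF score of every point (brute force, O(n^2)).

    Simplifying assumptions: exactly k nearest neighbors are used (ties at
    the k-th distance are ignored), and the data contain no group of more
    than k identical points, which would make a reachability sum zero.
    """
    n = len(points)

    # For each point, the k nearest other points as (distance, index) pairs.
    neighbors = []
    for i in range(n):
        dists = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        neighbors.append(dists[:k])

    # k-distance: distance from a point to its k-th nearest neighbor.
    k_dist = [nb[-1][0] for nb in neighbors]

    # Local reachability density: inverse of the mean reachability distance,
    # where reach-dist(p, o) = max(k-distance(o), d(p, o)).
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], d) for d, j in neighbors[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: mean ratio of the neighbors' density to the point's own density.
    return [
        sum(lrd[j] for _, j in neighbors[i]) / (len(neighbors[i]) * lrd[i])
        for i in range(n)
    ]


if __name__ == "__main__":
    # Hypothetical data: four points on a unit square plus one distant point.
    sample = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
    for point, score in zip(sample, lof_scores(sample, k=2)):
        print(point, round(score, 2))
```

Scores close to 1 indicate points whose local density matches that of their neighbors, while scores well above 1 indicate likely outliers. The brute-force neighbor search is quadratic in the number of points, which is the kind of cost that dataset reduction and distributed execution are meant to address.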