Knowledge discovery and data mining are growing rapidly, and many methods have been developed to discretize data. However, most existing discretization methods apply only to real-valued (point-valued) continuous attributes, whereas in practical applications attribute values are often interval numbers. To address this problem, a new discretization algorithm for interval-valued attributes is proposed. The similarity degree of interval numbers is used to describe the similarity relation between two interval numbers, and a similarity threshold is defined to determine the discrete relationship between data items, which yields the discretization of the interval data. By analysing the role of the similarity degree in the algorithm, a new variable, the associated degree, is introduced and used to improve the algorithm. The performance of the algorithm is tested on several data sets and compared with that of other discretization algorithms. The experimental results show that the algorithm is effective.
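The abstract does not give the similarity formula or the grouping rule, so the following is only a minimal sketch of the general idea of threshold-based discretization of interval numbers. Everything in it is an assumption made for illustration: the overlap-over-span similarity measure, the greedy chaining of intervals in midpoint order, and the names `similarity` and `discretize`. The associated degree is not sketched because the abstract does not define it.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (lower bound, upper bound), with lower <= upper


def similarity(a: Interval, b: Interval) -> float:
    """Assumed similarity degree: overlap length divided by the length of
    the smallest span covering both intervals. Not the paper's formula."""
    overlap = min(a[1], b[1]) - max(a[0], b[0])
    span = max(a[1], b[1]) - min(a[0], b[0])
    if span == 0:
        return 1.0  # both intervals are the same single point
    return max(overlap, 0.0) / span


def discretize(values: List[Interval], threshold: float) -> List[int]:
    """Assign an integer class label to each interval (assumed grouping rule).

    Intervals are visited in order of their midpoints; a new class starts
    whenever the similarity to the previous interval falls below `threshold`.
    """
    order = sorted(range(len(values)),
                   key=lambda i: (values[i][0] + values[i][1]) / 2)
    labels = [0] * len(values)
    current = 0
    for prev, idx in zip(order, order[1:]):
        if similarity(values[prev], values[idx]) < threshold:
            current += 1  # similarity too low: open a new discrete class
        labels[idx] = current
    return labels


if __name__ == "__main__":
    data = [(1.0, 2.0), (1.2, 2.1), (5.0, 6.0), (5.1, 6.2)]
    print(discretize(data, threshold=0.5))
```

In this sketch a higher threshold produces more, narrower classes, and a lower threshold merges more intervals into the same class.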