Missing values are commonly encountered in software measurement data, and k-nearest neighbor imputation (kNNI) is one of the most popular imputation procedures used by researchers and practitioners in empirical software engineering. Imputation techniques replace missing values with one or more plausible alternatives. Traditionally, kNNI uses only complete cases as possible donors for imputation (called complete case kNNI, or CCkNNI); however, a variant called incomplete case k-nearest neighbor imputation (ICkNNI), which also admits incomplete cases as donors, is an attractive alternative that has received very little attention. We present a detailed comparative study of CCkNNI and ICkNNI on software measurement data with missing values, and demonstrate that using incomplete cases often increases the effectiveness of nearest neighbor imputation (especially at higher missingness levels), regardless of the type of missingness.
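The difference between the two donor pools can be illustrated with a minimal Python sketch. It is not the study's implementation: the function name `knn_impute`, the toy data, the Euclidean distance, the use of the donor mean, and the donor-eligibility rule for incomplete cases (a donor must have the attribute being imputed and every attribute observed in the recipient) are all assumptions made for illustration.

```python
import math

def knn_impute(data, row, col, k, allow_incomplete):
    """Impute data[row][col] as the mean of that attribute over the k nearest donors.

    Hypothetical helper: missing values are encoded as None.
    allow_incomplete=False restricts donors to complete cases (CCkNNI);
    allow_incomplete=True also admits incomplete cases (ICkNNI) under the
    assumed eligibility rule described above.
    """
    target = data[row]
    # Attributes observed in the recipient; distances are computed over these.
    observed = [j for j, v in enumerate(target) if v is not None]
    candidates = []
    for i, donor in enumerate(data):
        # A donor must be a different case and must have the attribute we need.
        if i == row or donor[col] is None:
            continue
        if allow_incomplete:
            usable = all(donor[j] is not None for j in observed)
        else:
            usable = all(v is not None for v in donor)
        if not usable:
            continue
        dist = math.sqrt(sum((target[j] - donor[j]) ** 2 for j in observed))
        candidates.append((dist, donor[col]))
    if not candidates:
        return None  # no eligible donor, so the value stays missing
    candidates.sort(key=lambda pair: pair[0])
    nearest = candidates[:k]
    return sum(value for _, value in nearest) / len(nearest)

# Toy data (made up): row 0 is the recipient, attribute 2 is to be imputed.
data = [
    [1.0, 2.0, None, None],  # recipient
    [1.0, 2.0, 10.0, 4.0],   # complete case, identical on observed attributes
    [9.0, 9.0, 90.0, 8.0],   # complete case, far away
    [1.0, 2.0, 20.0, None],  # incomplete case, identical on observed attributes
]

cc_value = knn_impute(data, row=0, col=2, k=2, allow_incomplete=False)
ic_value = knn_impute(data, row=0, col=2, k=2, allow_incomplete=True)
```

In this toy case the complete-case pool has only two donors, so the distant case is forced into the neighborhood, whereas admitting the incomplete case supplies a second close donor. The example is contrived to show the mechanism and says nothing about how often this happens on real data.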