Scope and method of study. The goal of this research is to adapt data masking techniques for estimation of confidential, continuous numeric data involving non-monotonic relationships in privacy-preserving data mining (PPDM). Estimation techniques are an important class of data mining (DM) techniques frequently used in real-world problems. In many cases, estimation involves confidential numeric data, raising issues of privacy and confidentiality. Masking techniques can be used to protect the confidentiality of numeric data. However, existing masking methods do not preserve non-monotonic relationships. They replicate correlation-based aggregate measures in masked data to reproduce corresponding monotonic relationships. One challenge is that there are no aggregate measures for capturing non-monotonic relationships. Hence, this study proposes a new approach called Relationship-Based Masking (RBM) that utilizes relationships (conditional expectations), rather than aggregate measures such as covariance matrices, as a basis for masking. Four new adapted RBM methods are proposed. Other challenges that arise in the case of non-monotonic relationships include the measurement of data utility and data security. This study adopts existing security measures based on Mean Squared Errors (MSE) and proposes new data utility and data security measures. The theories behind these new methods are also described, and their effectiveness is established empirically in terms of data utility and disclosure risks using simulated numerical datasets.; Findings and conclusions. The optimality of RBM in terms of data security (based on the characteristics of original datasets) is demonstrated theoretically and empirically. Additionally, RBM can maintain certain classes of relationships. This study shows theoretically and empirically that RBM can maintain original relationships in masked data when the relationships among confidential attributes are linear regardless of the types of relationships between non-confidential and confidential attributes. It also demonstrates empirically the effectiveness of RBM in maintaining other classes of relationships. However, RBM is not able to maintain all possible classes of relationships. Finally, interesting results relating the characteristics of masked data to the characteristics of original data, prior to masking, are shown theoretically and empirically.
展开▼