TECHNIQUES FOR AUTOMATED DATA CLEANSING FOR MACHINE LEARNING ALGORITHMS
Abstract
Machine learning models typically are built by processing large-volume datasets, and those datasets are preprocessed so that the resulting models can provide sound results. In building a model, certain example embodiments generate meta-features for each of a number of independent variables in an accessed portion of the dataset. The meta-features are provided as input to pre-trained classification models. For each independent variable, those models output indications of one or more appropriate missing-value imputation operations and one or more other appropriate data-cleansing preprocessing operations. The data in the dataset is transformed by selectively applying the missing-value imputation operation(s) and the other preprocessing operation(s) according to the independent variable each data item is associated with. The preprocessing is thereby performed in an automated, programmatic way that helps improve the quality of the built model. Ultimately, queries received over a computer-mediated interface can be processed using the built machine learning model.
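The per-variable flow described in the abstract (meta-features, then a recommended imputation and a recommended cleansing operation, then a selective transform) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented method. The abstract does not say which meta-features, classifiers, or operations are used, so the three meta-features, the hand-written `recommend` rules (a hypothetical stand-in for the pre-trained classification models), and the operations (median/mode imputation, min-max scaling, column dropping) are all illustrative choices.

```python
from statistics import median, mode
from typing import Dict, List, Optional

Column = List[Optional[object]]  # None marks a missing value


def meta_features(col: Column) -> Dict[str, float]:
    """Hypothetical per-variable meta-features (illustrative choice only)."""
    present = [v for v in col if v is not None]
    numeric = bool(present) and all(isinstance(v, (int, float)) for v in present)
    return {
        "missing_ratio": 1 - len(present) / len(col) if col else 0.0,
        "is_numeric": 1.0 if numeric else 0.0,
        "distinct_ratio": len(set(present)) / len(present) if present else 0.0,
    }


def recommend(meta: Dict[str, float]) -> Dict[str, str]:
    """Hypothetical stand-in for the pre-trained classifiers.

    Fixed thresholds replace learned models here; a real system would
    feed the meta-features to trained classification models instead.
    """
    if meta["missing_ratio"] > 0.5:
        return {"impute": "none", "other": "drop"}  # mostly empty variable
    if meta["is_numeric"]:
        return {"impute": "median", "other": "minmax_scale"}
    if meta["distinct_ratio"] == 1.0:
        return {"impute": "none", "other": "drop"}  # identifier-like text column
    return {"impute": "mode", "other": "none"}


def impute(col: Column, how: str) -> Column:
    """Fill missing values with the median or mode of the present values."""
    present = [v for v in col if v is not None]
    if how == "none" or len(present) == len(col):
        return list(col)
    fill = {"median": median, "mode": mode}[how](present)
    return [fill if v is None else v for v in col]


def minmax(col: Column) -> Column:
    """Rescale a fully numeric column to the range [0, 1]."""
    lo, hi = min(col), max(col)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in col]


def cleanse(dataset: Dict[str, Column]) -> Dict[str, Column]:
    """Apply the recommended operations variable by variable."""
    out: Dict[str, Column] = {}
    for name, col in dataset.items():
        ops = recommend(meta_features(col))
        if ops["other"] == "drop":
            continue
        col = impute(col, ops["impute"])
        if ops["other"] == "minmax_scale":
            col = minmax(col)
        out[name] = col
    return out


data = {
    "age": [20, None, 40, 30],
    "city": ["A", None, "A", "B"],
    "notes": [None, None, None, "x"],
}
print(cleanse(data))
```

With this sample data, `age` is median-imputed and then scaled, `city` is mode-imputed, and `notes` is dropped because most of its values are missing. Keeping the recommendation step as a separate function mirrors the abstract's separation between choosing operations and applying them, so the rule stub could be swapped for trained models without changing `cleanse`.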