TECHNIQUES FOR AUTOMATED DATA CLEANSING FOR MACHINE LEARNING ALGORITHMS

Abstract

Machine learning models are typically built by processing large-volume datasets, and those datasets are preprocessed so that the machine learning can provide sound results. In building a model, certain example embodiments generate meta-features for each of a number of independent variables in an accessed portion of the dataset. The meta-features are provided as input to pre-trained classification models. For each independent variable, those models output indications of one or more appropriate missing value imputation operations and of one or more other appropriate preprocessing operations related to data cleansing. The data in the dataset is transformed by selectively applying the missing value imputation operation(s) and the other preprocessing operation(s) according to the independent variables associated with the data, so that the preprocessing is performed in an automated and programmatic way that helps improve the quality of the built model. Ultimately, queries received over a computer-mediated interface can be processed using the built machine learning model.
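
The flow described in the abstract (per-variable meta-features, a model that selects an imputation operation, then a selective transformation of the data) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the three meta-features, the function names `meta_features`, `choose_imputation`, and `cleanse`, and the thresholds are hypothetical, and the rule-based `choose_imputation` merely stands in for the pre-trained classification models, whose actual features and training are not given in the abstract.

```python
import statistics
from typing import Dict, List, Optional

Column = List[Optional[float]]  # None marks a missing value


def meta_features(values: Column) -> Dict[str, float]:
    """Summarize one independent variable as a few illustrative meta-features."""
    present = [v for v in values if v is not None]
    n = len(values)
    missing_ratio = (n - len(present)) / n if n else 0.0
    if len(present) < 2:
        return {"missing_ratio": missing_ratio, "skewness": 0.0, "distinct_ratio": 1.0}
    mean = statistics.fmean(present)
    std = statistics.pstdev(present)
    # Population skewness: third central moment divided by std cubed.
    skew = 0.0 if std == 0 else (
        sum((v - mean) ** 3 for v in present) / (len(present) * std ** 3)
    )
    return {
        "missing_ratio": missing_ratio,
        "skewness": skew,
        "distinct_ratio": len(set(present)) / len(present),
    }


def choose_imputation(features: Dict[str, float]) -> str:
    """Hypothetical stand-in for a pre-trained classifier.

    A real system would feed the meta-features to a trained model; these
    hand-written thresholds are assumptions used only to keep the sketch runnable.
    """
    if features["distinct_ratio"] < 0.3:   # few distinct values: treat as categorical-like
        return "mode"
    if abs(features["skewness"]) > 1.0:    # strongly skewed: median is more robust
        return "median"
    return "mean"


IMPUTERS = {
    "mean": statistics.fmean,
    "median": statistics.median,
    "mode": statistics.mode,
}


def cleanse(dataset: Dict[str, Column]) -> Dict[str, Column]:
    """Apply the selected imputation operation to each column independently."""
    cleaned: Dict[str, Column] = {}
    for name, values in dataset.items():
        present = [v for v in values if v is not None]
        if not present:                    # nothing to impute from; leave unchanged
            cleaned[name] = list(values)
            continue
        operation = choose_imputation(meta_features(values))
        fill = IMPUTERS[operation](present)
        cleaned[name] = [fill if v is None else v for v in values]
    return cleaned


# Made-up example data: one skewed column, one low-cardinality column.
dataset = {
    "income": [30.0, 32.0, None, 31.0, 500.0],
    "rooms": [2.0, 2.0, 2.0, None, 3.0, 2.0, 2.0, 2.0, 2.0, 2.0, 3.0],
}
cleaned = cleanse(dataset)
```

In this sketch each column gets its own operation, which mirrors the abstract's point that operations are applied "in accordance with the independent variables". Other preprocessing operations (for example scaling or outlier handling) could presumably be selected the same way, with a second selector keyed on the same meta-features.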
