SIAMESE NEURAL NETWORKS FOR FLAGGING TRAINING DATA IN TEXT-BASED MACHINE LEARNING

Abstract

Techniques performed by a data processing system for analyzing training data for a machine learning model and identifying outliers in that training data include: obtaining training data for the model from a memory of the data processing system; analyzing the training data using a Siamese Neural Network to determine within-label similarities and cross-label similarities associated with a plurality of data elements within the training data, the within-label similarities representing similarities between a respective data element and a first set of data elements similarly labeled in the training data, and the cross-label similarities representing similarities between the respective data element and a second set of data elements dissimilarly labeled in the training data; identifying outlier data elements in the plurality of data elements based on the within-label and cross-label similarities; and processing the training data comprising the outlier data elements. Processing may include deleting the outlier data elements or generating a report.
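The steps in the abstract can be illustrated with a minimal sketch. It is not the patented method: the abstract does not state how similarities are computed or which decision rule flags an outlier. The sketch assumes each data element has already been mapped to an embedding vector by the shared encoder of a trained Siamese network, uses cosine similarity as a stand-in for the network's similarity output, and flags an element when its mean within-label similarity does not exceed its mean cross-label similarity by a hypothetical `margin`. The function names, the example labels, and the toy embeddings are all illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def _mean(values: List[float]) -> float:
    """Arithmetic mean, defined as 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


def label_similarities(
    embeddings: Sequence[Sequence[float]], labels: Sequence[str]
) -> List[Tuple[float, float]]:
    """Return (within_label, cross_label) mean similarity for every element."""
    scores = []
    for i, (vec, lab) in enumerate(zip(embeddings, labels)):
        # First set: other elements carrying the same label.
        within = [cosine(vec, embeddings[j])
                  for j in range(len(labels)) if j != i and labels[j] == lab]
        # Second set: elements carrying a different label.
        cross = [cosine(vec, embeddings[j])
                 for j in range(len(labels)) if labels[j] != lab]
        scores.append((_mean(within), _mean(cross)))
    return scores


def flag_outliers(
    embeddings: Sequence[Sequence[float]],
    labels: Sequence[str],
    margin: float = 0.0,  # assumed threshold, not taken from the source
) -> List[int]:
    """Indices whose within-label similarity exceeds cross-label by at most margin."""
    return [i for i, (within, cross)
            in enumerate(label_similarities(embeddings, labels))
            if within - cross <= margin]


# Toy embeddings standing in for Siamese encoder outputs (assumed data).
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.95, 0.05]]
labels = ["greeting", "greeting", "billing", "billing", "billing"]

outliers = flag_outliers(embeddings, labels)  # -> [4]

# "Processing" per the abstract: delete the flagged elements or report them.
cleaned = [(e, l) for i, (e, l) in enumerate(zip(embeddings, labels))
           if i not in outliers]
report = [f"element {i} labeled '{labels[i]}' looks mislabeled" for i in outliers]
```

Here the last element is labeled `billing` but sits next to the `greeting` embeddings, so its cross-label similarity is far higher than its within-label similarity and it is the only element flagged. Averaging over each set is one of several plausible aggregations; a maximum or a nearest-neighbor vote would fit the abstract's wording equally well.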
