
SYSTEMS AND METHODS FOR AUTOMATICALLY CONFIGURING TRAINING DATA FOR TRAINING MACHINE LEARNING MODELS OF A MACHINE LEARNING-BASED DIALOGUE SYSTEM


Abstract

A system and method for improving a machine learning-based dialogue system includes: sourcing a corpus of raw machine learning training data from sources of training data based on a plurality of seed training samples, wherein the corpus of raw machine learning training data comprises a plurality of distinct instances of training data; generating a vector representation for each distinct instance of training data; identifying statistical characteristics of the corpus of raw machine learning training data based on a mapping of the vector representation for each distinct instance of training data; identifying anomalous instances of the plurality of distinct instances of training data of the corpus of raw machine learning training data based on the identified statistical characteristics of the corpus; and curating the corpus of raw machine learning training data based on each of the instances of training data identified as anomalous instances.
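The steps named in the abstract (vectorize each instance, derive corpus statistics, flag anomalous instances, curate) can be illustrated with a minimal sketch. This is not the patented method: the abstract does not say how vectors are generated or which statistics are used. The sketch assumes a hashed bag-of-words vector as a stand-in for the vector representation, and it assumes that an instance is anomalous when its distance from the corpus centroid has a z-score above a threshold. The names `embed`, `curate`, `dim`, and `z_threshold` are hypothetical.

```python
import math
import re
import statistics
import zlib


def embed(text, dim=64):
    """Assumed stand-in for a vector representation: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * dim
    for token in re.findall(r"[a-z']+", text.lower()):
        # crc32 is deterministic across runs, unlike the built-in hash() for strings
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def centroid(vectors):
    """Component-wise mean of the vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def curate(corpus, z_threshold=2.0):
    """Split a corpus into (kept, anomalous) by z-score of distance to the centroid.

    The z-score rule is an assumption made for illustration only.
    """
    vectors = [embed(text) for text in corpus]
    center = centroid(vectors)
    dists = [distance(v, center) for v in vectors]
    mean = statistics.mean(dists)
    stdev = statistics.pstdev(dists)
    kept, anomalous = [], []
    for text, d in zip(corpus, dists):
        z = (d - mean) / stdev if stdev else 0.0
        (anomalous if z > z_threshold else kept).append(text)
    return kept, anomalous


if __name__ == "__main__":
    samples = ["what is my balance"] * 9 + ["12345 67890"]
    kept, anomalous = curate(samples)
    print(len(kept), "kept;", "flagged:", anomalous)
```

In this sketch, curation simply drops the flagged instances. A real pipeline might instead route them to review or re-label them, and would likely use a learned sentence embedding rather than token hashing.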
