SYSTEMS AND METHODS FOR AUTOMATICALLY CONFIGURING TRAINING DATA FOR TRAINING MACHINE LEARNING MODELS OF A MACHINE LEARNING-BASED DIALOGUE SYSTEM

Abstract

A system and method for improving a machine learning-based dialogue system includes: sourcing a corpus of raw machine learning training data from sources of training data based on a plurality of seed training samples, wherein the corpus of raw machine learning training data comprises a plurality of distinct instances of training data; generating a vector representation for each distinct instance of training data; identifying statistical characteristics of the corpus of raw machine learning training data based on a mapping of the vector representation for each distinct instance of training data; identifying anomalous instances of the plurality of distinct instances of training data of the corpus of raw machine learning training data based on the identified statistical characteristics of the corpus; and curating the corpus of raw machine learning training data based on each of the instances of training data identified as anomalous instances.
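The vectorize, characterize, flag, and curate steps named in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the `embed` function is a hypothetical bag-of-words stand-in for whatever vector representation the system uses, the "statistical characteristics" are reduced to a centroid plus the mean and standard deviation of distances to it, and the `z_threshold` cutoff is an arbitrary choice. The sourcing step (collecting raw data from seed samples) is not modeled.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def embed(text, vocab):
    """Hypothetical stand-in embedding: L2-normalized bag-of-words counts."""
    counts = Counter(text.lower().split())
    vec = [float(counts[word]) for word in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def curate(corpus, z_threshold=2.0):
    """Split a corpus into (kept, anomalous) instances.

    An instance is treated as anomalous when its distance from the corpus
    centroid lies more than z_threshold standard deviations above the
    mean distance. Both the statistic and the threshold are assumptions.
    """
    if not corpus:
        return [], []

    # Vector representation for each distinct instance of training data.
    vocab = sorted({word for text in corpus for word in text.lower().split()})
    vectors = [embed(text, vocab) for text in corpus]

    # Statistical characteristics of the corpus: centroid and distance spread.
    centroid = [mean(column) for column in zip(*vectors)]
    distances = [math.dist(vec, centroid) for vec in vectors]
    mu, sigma = mean(distances), pstdev(distances)

    # Flag anomalous instances, then curate by separating them out.
    kept, anomalous = [], []
    for text, distance in zip(corpus, distances):
        z = (distance - mu) / sigma if sigma > 0 else 0.0
        (anomalous if z > z_threshold else kept).append(text)
    return kept, anomalous


raw_corpus = [
    "what is my account balance",
    "show my account balance",
    "tell me my account balance",
    "what is my balance",
    "check my account balance please",
    "show me my balance",
    "order a large pepperoni pizza",
]
kept, anomalous = curate(raw_corpus)
print(anomalous)  # ['order a large pepperoni pizza']
```

In this sketch curation simply drops the flagged instances; a real pipeline might instead route them to human review or relabeling, which the abstract leaves open.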

Bibliographic data

Publication number: US2020258007A1

Patent type:
Publication date: 2020-08-13

Original format: PDF
Applicant/assignee: CLINC INC.

Application number: US202016864140
Inventors: STEFAN LARSON; ANISH MAHENDRAN; ANDREW LEE; JONATHAN K. KUMMERFELD; PARKER HILL; MICHAEL A. LAURENZANO; JOHANN HAUSWALD; LINGJIA TANG; JASON MARS

Filing date: 2020-04-30
Classification: G06N20/10; G06F40/279; G06K9/62; G06F17/18
Country: US
Date added to database: 2022-08-21 11:26:14
