
Techniques for correcting linguistic training bias in training data


Abstract

In automated assistant systems, a deep-learning model in the form of a long short-term memory (LSTM) classifier is used for mapping questions to classes, with each class having a manually curated answer. A team of experts manually creates the training data used to train this classifier. Relying on human curation often results in linguistic training biases creeping into the training data, since every individual has a specific style of writing natural language and uses some words only in specific contexts. Deep models end up learning these biases instead of the core concept words of the target classes. In order to correct these biases, meaningful sentences are automatically generated using a generative model and then used for training a classification model. For example, a variational autoencoder (VAE) is used as the generative model for generating novel sentences, and a language model (LM) is utilized for selecting sentences based on likelihood.

FIG. 5 (sheet 5/6) shows a flowchart 500 of the method: receive a query from a user; generate a set of queries associated with the received query using a long short-term memory variational autoencoder (LSTM-VAE) at an inference time; discard one or more queries comprising consecutively repeating words from the set of generated queries to create a subset of the generated queries; select one or more queries from the subset of the generated queries based on likelihood via a language model; classify the one or more selected queries as queries that exist in the first set of training data and new queries using a first classifier model; augment the first set of training data with the new queries to obtain a second set of training data; and train a second classifier model using the second set of training data, thus correcting linguistic training bias in the training data.
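The flowchart can be read as a data-augmentation loop around three learned components: the LSTM-VAE sampler, the language-model scorer, and the first classifier model. Below is a minimal Python sketch of that loop under stated assumptions: sample_queries, log_likelihood, and is_new_query are hypothetical stand-ins for those three components (the patent does not specify their interfaces at code level), while the consecutive-repeat filter, likelihood-based selection, and training-set augmentation are spelled out concretely.

    from typing import Callable, Iterable, List, Set


    def has_consecutive_repeats(query: str) -> bool:
        """True if any word is immediately repeated, e.g. "book book a flight"."""
        words = query.split()
        return any(a == b for a, b in zip(words, words[1:]))


    def augment_training_data(
        user_queries: Iterable[str],
        first_training_set: Set[str],
        sample_queries: Callable[[str], List[str]],  # stand-in for the LSTM-VAE sampler
        log_likelihood: Callable[[str], float],      # stand-in for the language-model scorer
        is_new_query: Callable[[str], bool],         # stand-in for the first classifier model
        keep_top: int = 5,
    ) -> Set[str]:
        """Generate queries, drop those with consecutively repeating words,
        keep the most likely ones per the LM, add the novel ones to the
        training data, and return the augmented (second) training set."""
        second_training_set = set(first_training_set)
        for query in user_queries:
            generated = sample_queries(query)                       # LSTM-VAE at inference time
            subset = [q for q in generated if not has_consecutive_repeats(q)]
            selected = sorted(subset, key=log_likelihood, reverse=True)[:keep_top]
            new_queries = [q for q in selected if is_new_query(q)]
            second_training_set.update(new_queries)                 # augment the first set
        return second_training_set


    # Toy run with dummy stand-ins, purely to show the data flow:
    augmented = augment_training_data(
        user_queries=["reset my password"],
        first_training_set={"reset my password"},
        sample_queries=lambda q: [q, "please reset reset my password", "how do i reset the password"],
        log_likelihood=lambda q: -float(len(q.split())),  # dummy score; a real LM would score fluency
        is_new_query=lambda q: q != "reset my password",
        keep_top=2,
    )
    print(sorted(augmented))

The returned set corresponds to the second set of training data on which the second classifier model would then be trained; the toy run only illustrates the data flow, not the actual models.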
