Techniques for correcting linguistic training bias in training data
Abstract
Source document: AU2018232914B220200702.pdf

In automated assistant systems, a deep-learning model in the form of a long short-term memory (LSTM) classifier is used for mapping questions to classes, with each class having a manually curated answer. A team of experts manually creates the training data used to train this classifier. Relying on human curation often results in linguistic training biases creeping into the training data, since every individual has a specific style of writing natural language and uses some words in specific contexts only. Deep models end up learning these biases instead of the core concept words of the target classes. In order to correct these biases, meaningful sentences are automatically generated using a generative model and then used for training a classification model. For example, a variational autoencoder (VAE) is used as the generative model for generating novel sentences, and a language model (LM) is utilized for selecting sentences based on likelihood.

FIG. 5 (sheet 5/6), flowchart of method 500:
- Receive a query from a user (502).
- Generate a set of queries associated with the received query using a long short-term memory variational autoencoder (LSTM-VAE) at inference time (504).
- Discard one or more queries comprising consecutively repeating words from the set of generated queries to create a subset of the generated queries.
- Select one or more queries from the subset of the generated queries based on likelihood via a language model.
- Classify the one or more selected queries as queries that exist in the first set of training data and new queries using a first classifier model (510).
- Augment the first set of training data with the new queries to obtain a second set of training data (512).
- Train a second classifier model using the second set of training data, thus correcting linguistic training bias in the training data (514).
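The two filtering steps of FIG. 5 (discarding queries with consecutively repeating words, then selecting by language-model likelihood) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not specify the language model, so an add-one-smoothed bigram model stands in for it, and the names `has_consecutive_repeat`, `BigramLM`, and `select_queries`, the `top_k` cutoff, and the sample queries are all hypothetical. The candidate list stands in for LSTM-VAE output; the generator and both classifiers are not modeled.

```python
import math
from collections import Counter


def has_consecutive_repeat(query):
    """Return True if any word is immediately followed by the same word."""
    words = query.lower().split()
    return any(a == b for a, b in zip(words, words[1:]))


class BigramLM:
    """Add-one-smoothed bigram model (a stand-in for the LM in the abstract)."""

    def __init__(self, corpus):
        self.unigrams = Counter()
        self.bigrams = Counter()
        for sentence in corpus:
            tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
            self.unigrams.update(tokens)
            self.bigrams.update(zip(tokens, tokens[1:]))
        self.vocab_size = len(self.unigrams)

    def avg_log_likelihood(self, query):
        """Mean log-probability per bigram, so longer queries are not penalized."""
        tokens = ["<s>"] + query.lower().split() + ["</s>"]
        total = 0.0
        for prev, cur in zip(tokens, tokens[1:]):
            numerator = self.bigrams[(prev, cur)] + 1
            denominator = self.unigrams[prev] + self.vocab_size
            total += math.log(numerator / denominator)
        return total / (len(tokens) - 1)


def select_queries(candidates, lm, top_k):
    """Drop candidates with repeated adjacent words, then keep the top_k by likelihood."""
    kept = [q for q in candidates if not has_consecutive_repeat(q)]
    ranked = sorted(kept, key=lm.avg_log_likelihood, reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    # Hypothetical first set of training queries, used here to fit the LM.
    training_queries = [
        "how do i apply for leave",
        "how many leaves can i take",
        "what is the leave policy",
    ]
    # Hypothetical generator output (stand-in for LSTM-VAE samples).
    generated = [
        "how do i apply for leave",
        "how do do i apply for leave",
        "leave policy the is what",
        "what is the leave policy",
    ]
    lm = BigramLM(training_queries)
    for query in select_queries(generated, lm, top_k=2):
        print(query)
```

Averaging the log-likelihood per bigram is a design choice of this sketch, not something the abstract states; a summed log-likelihood or perplexity threshold would serve the same selection role.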