Training corpus generation methods, devices, equipment and storage media

Abstract

PROBLEM TO BE SOLVED: To effectively improve speech recognition performance, significantly shorten the iteration cycle of a speech recognition model, and save a large amount of resources. A method of generating a training corpus mines a plurality of corpus data items to be labeled from user behavior logs associated with a target application program. Each corpus data item includes a first behavior log, which contains a user's voice and the corresponding speech recognition result, and a second behavior log, which is temporally associated with the first behavior log and belongs to the same user. Based on the association between the first behavior log and the second behavior log in each corpus data item, the user's voice and the corresponding speech recognition result are judged to be a positive feedback training corpus or a negative feedback training corpus. Positive and negative feedback training corpora for speech recognition are thus mined automatically and in a targeted manner from user behavior, providing training data for subsequent speech recognition models. [Selected drawing] Fig. 1
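The pairing and labeling steps described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the concrete decision rule, so everything specific here is an assumption made for illustration only: the `BehaviorLog` fields, the action names `"voice_query"`, `"confirm"`, and `"correct"`, the 30-second association window, and the rule that a confirming follow-up yields a positive sample while a correcting follow-up yields a negative one.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class BehaviorLog:
    """One user behavior log entry (hypothetical schema, not from the patent)."""
    user_id: str
    timestamp: float                       # seconds since some epoch
    action: str                            # assumed: "voice_query", "confirm", "correct"
    audio_id: Optional[str] = None         # set only on voice queries
    recognized_text: Optional[str] = None  # speech recognition result


# Assumed time window within which two logs count as temporally associated.
MAX_GAP_SECONDS = 30.0


def mine_pairs(logs: List[BehaviorLog],
               max_gap: float = MAX_GAP_SECONDS) -> List[Tuple[BehaviorLog, BehaviorLog]]:
    """Pair each voice-query log with the same user's next log inside the window."""
    by_user = {}
    for log in sorted(logs, key=lambda entry: entry.timestamp):
        by_user.setdefault(log.user_id, []).append(log)

    pairs = []
    for user_logs in by_user.values():
        for first, second in zip(user_logs, user_logs[1:]):
            close_enough = second.timestamp - first.timestamp <= max_gap
            if first.action == "voice_query" and close_enough:
                pairs.append((first, second))
    return pairs


def label_pair(second: BehaviorLog) -> Optional[str]:
    """Assumed rule: the follow-up action decides the feedback polarity."""
    if second.action == "confirm":
        return "positive"
    if second.action == "correct":
        return "negative"
    return None  # no usable feedback signal


def generate_corpus(logs: List[BehaviorLog]) -> List[Tuple[str, str, str]]:
    """Return (audio_id, recognized_text, label) triples for labeled pairs."""
    corpus = []
    for first, second in mine_pairs(logs):
        label = label_pair(second)
        if label is not None:
            corpus.append((first.audio_id, first.recognized_text, label))
    return corpus
```

In this sketch a pair is dropped when the second log falls outside the window or carries no recognizable feedback action; a real system would need application-specific rules for what counts as confirmation or correction.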

Bibliographic details

Publication number: JP2020149053A

Patent type:
Publication date: 2020-09-17

Original format: PDF
Applicant/patentee: ベイジンバイドゥネットコムサイエンスアンドテクノロジーカンパニーリミテッド

Application number: JP20200041151
Inventors: ディン;シーチァン;ファン;ジーヂョウ;ジャン;ヂョンウェイ;マ;ウェンタオ;

Filing date: 2020-03-10
Classification: G10L15/06; G10L15; G10L15/22; G10L15/10
Country: JP
Database entry time: 2022-08-21 11:37:26