APPARATUS FOR GENERATING HIGH FREQUENCY VOCABULARY STRING RECOGNITION UNIT IN A DIALOGUE AND READ LARGE VOCABULARY SPEECH RECOGNITION SYSTEM AND METHOD THEREFOR

Abstract

The present invention relates to an apparatus and method for generating high-frequency lexical sequence recognition units in a large-vocabulary continuous speech recognition system for conversational and read speech. A high-frequency pseudo-morpheme sequence is treated as a single recognition unit, which yields recognition units of an intermediate form between a pseudo-morpheme and a word. The apparatus comprises: a frequency information extraction unit 301, which extracts frequency information for consecutive lexical pairs from a pseudo-morpheme-tagged text corpus; a combined vocabulary selection unit 302, which selects the vocabulary to be combined based on the frequency information extracted by the frequency information extraction unit 301 and on the length of each lexical pair; a pseudo-morpheme combining information corrector 303, which modifies the text corpus based on the vocabulary set selected by the combined vocabulary selection unit 302, merging each high-frequency consecutive lexical pair into a single unit to generate a modified text corpus; and a recognition unit generating unit 304, which generates the high-frequency lexical sequence recognition units from the text corpus generated by the pseudo-morpheme combining information corrector 303.

Keywords: conversational and read speech, large vocabulary, text corpus, vocabulary dictionary, language model, pronunciation dictionary
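The four units described in the abstract can be pictured as a small text-processing pipeline. The Python below is only an illustrative sketch, not the patented implementation: the function names, the `min_count` and `max_len` thresholds, the `+` joiner, the choice to measure pair length in characters, and the assumption that shorter pairs are the ones merged are all hypothetical, since the abstract does not specify them.

```python
from collections import Counter


def count_pairs(corpus):
    """Role of unit 301: count adjacent token pairs within each sentence."""
    counts = Counter()
    for sentence in corpus:
        counts.update(zip(sentence, sentence[1:]))
    return counts


def select_pairs(counts, min_count=2, max_len=4):
    """Role of unit 302: pick pairs by frequency and length.

    Assumption: a pair qualifies when it occurs at least `min_count`
    times and its two tokens total at most `max_len` characters.
    """
    return {
        pair
        for pair, n in counts.items()
        if n >= min_count and len(pair[0]) + len(pair[1]) <= max_len
    }


def merge_pairs(corpus, selected, joiner="+"):
    """Role of unit 303: rewrite the corpus, merging selected pairs.

    Pairs are merged greedily from left to right, without overlap.
    """
    merged = []
    for sentence in corpus:
        out, i = [], 0
        while i < len(sentence):
            if i + 1 < len(sentence) and (sentence[i], sentence[i + 1]) in selected:
                out.append(sentence[i] + joiner + sentence[i + 1])
                i += 2
            else:
                out.append(sentence[i])
                i += 1
        merged.append(out)
    return merged


def recognition_units(corpus):
    """Role of unit 304: collect the unit inventory of the modified corpus."""
    return sorted({token for sentence in corpus for token in sentence})
```

Each sentence is a list of pseudo-morpheme tokens. Calling `count_pairs`, `select_pairs`, `merge_pairs`, and `recognition_units` in that order mirrors the flow from unit 301 to unit 304; the resulting inventory would then feed the vocabulary dictionary, language model, and pronunciation dictionary named in the keywords.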
