Conference: International Speech Communication Association

Vocabulary Expansion through Automatic Abbreviation Generation for Chinese Voice Search




Long named entities are often abbreviated in spoken Chinese, and this usually leads to out-of-vocabulary (OOV) problems in speech recognition applications. Generating Chinese abbreviations is much more complex than generating English abbreviations, most of which are acronyms or truncations. In this paper, we propose a new method for automatically generating abbreviations for Chinese named entities, and we use the output of the abbreviation model to expand the vocabulary for voice search. In our abbreviation modeling, we convert abbreviation generation into a tagging problem and use a conditional random field (CRF) as the tagger. For vocabulary expansion, because one entity may have multiple abbreviations and the top-1 candidate alone has limited coverage, we add the top-10 candidates to the vocabulary. In our experiments, the proposed abbreviation model achieved a top-10 coverage of 88.3%, and incorporating the top-10 abbreviation candidates into the vocabulary improved voice search accuracy from 16.9% to 79.2%.
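The abstract names three technical steps: recasting abbreviation generation as character tagging, taking the top-k candidates from a CRF tagger, and adding them to the recognizer vocabulary (with top-k coverage as the evaluation metric). The following is a minimal sketch of those steps under stated assumptions, not the authors' implementation. Assumed here: a binary keep/skip tag set, a greedy character alignment to derive tags from training pairs, hand-picked toy weights in place of a trained CRF, exhaustive search in place of n-best Viterbi, and the hypothetical helper names `to_tags`, `top_k`, `expand_vocab`, and `topk_coverage`.

```python
from itertools import product

# Assumed binary tag set: each character of the full name is kept (K) or skipped (S).
KEEP, SKIP = "K", "S"


def to_tags(full: str, abbr: str) -> list[str]:
    """Turn a (full name, abbreviation) pair into one tag per character of `full`.

    Greedy left-to-right alignment: assumes the abbreviation is an ordered
    subsequence of the full name. With repeated characters the greedy choice
    may differ from a hand-labelled alignment.
    """
    tags, j = [], 0
    for ch in full:
        if j < len(abbr) and ch == abbr[j]:
            tags.append(KEEP)
            j += 1
        else:
            tags.append(SKIP)
    if j != len(abbr):
        raise ValueError(f"{abbr!r} is not a subsequence of {full!r}")
    return tags


def from_tags(full: str, tags) -> str:
    """Read the abbreviation back off a tag sequence."""
    return "".join(ch for ch, t in zip(full, tags) if t == KEEP)


def score(full: str, tags, emit: dict, trans: dict) -> float:
    """Linear-chain score: per-character emission weights plus tag-bigram transition weights."""
    s = sum(emit.get((ch, t), 0.0) for ch, t in zip(full, tags))
    s += sum(trans.get((a, b), 0.0) for a, b in zip(tags, tags[1:]))
    return s


def top_k(full: str, emit: dict, trans: dict, k: int = 10) -> list[str]:
    """Exact k-best abbreviations by exhaustive search over all 2**n tag sequences.

    Only practical for short names; n-best Viterbi is the usual choice.
    For a fixed input the CRF partition function is constant, so ranking by
    raw score gives the same order as ranking by conditional probability.
    """
    best: dict[str, float] = {}
    for tags in product((KEEP, SKIP), repeat=len(full)):
        abbr = from_tags(full, tags)
        if not abbr or abbr == full:  # skip the empty string and the unabbreviated name
            continue
        s = score(full, tags, emit, trans)
        # Different tag paths can spell the same string; keep the best one.
        best[abbr] = max(s, best.get(abbr, float("-inf")))
    return sorted(best, key=best.get, reverse=True)[:k]


def expand_vocab(vocab: set[str], entities, decode, k: int = 10) -> set[str]:
    """Add the top-k abbreviation candidates of every entity to the vocabulary."""
    expanded = set(vocab)
    for name in entities:
        expanded.update(decode(name, k))
    return expanded


def topk_coverage(pairs, decode, k: int = 10) -> float:
    """Fraction of (full name, gold abbreviation) pairs whose gold form is in the top-k list."""
    hits = sum(gold in decode(full, k) for full, gold in pairs)
    return hits / len(pairs)


# Toy, hand-picked weights (hypothetical, not trained): favour keeping 北 and 大,
# skipping 京 and 学, and mildly penalise two kept characters in a row.
EMIT = {("北", KEEP): 2.0, ("大", KEEP): 2.0, ("京", SKIP): 1.0, ("学", SKIP): 1.0}
TRANS = {(KEEP, KEEP): -0.5}


def decode(name: str, k: int = 10) -> list[str]:
    return top_k(name, EMIT, TRANS, k)


if __name__ == "__main__":
    print(to_tags("北京大学", "北大"))  # ['K', 'S', 'K', 'S']
    print(decode("北京大学")[0])  # 北大
```

Keeping the k best strings rather than only the single best tag path mirrors the abstract's motivation: a name such as 北京大学 can surface in speech as several different shortenings, so adding several candidates to the vocabulary raises the chance that the one the user actually says is recognizable.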


