
Vocabulary expansion through automatic abbreviation generation for Chinese voice search



Abstract

Long organization names are often abbreviated in spoken Chinese, and abbreviated utterances cannot be recognized correctly unless the abbreviations are included in the recognition vocabulary. It is therefore important to generate abbreviations for organization names automatically and add them to the vocabulary. Generating Chinese abbreviations is much more complex than generating English abbreviations, which are mostly acronyms and truncations. In this paper, we propose a new hybrid method for automatically generating Chinese abbreviations, and we use the output of the abbreviation model to expand the vocabulary for voice search. In our abbreviation modeling, we treat abbreviation generation as a tagging problem and use conditional random fields (CRF) as the tagging tool; the CRF output is then re-ranked by a length model and web information. In the vocabulary expansion, because a name can have multiple abbreviations and the top-1 candidate has limited coverage, we add the top-10 candidates to the vocabulary. In our experiments on abbreviation modeling, the proposed method achieved a top-10 coverage of 88.3%. For voice search with abbreviated utterances, incorporating the top-10 abbreviation candidates into the vocabulary improved the full-name search accuracy from 16.9% to 79.2%.
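
The tagging formulation and the top-k coverage figure mentioned in the abstract can be illustrated with a minimal sketch. The keep/skip tag set, the function names, and the toy candidate lists below are assumptions made for illustration, not the authors' implementation; in a real system the tag sequences would come from a trained CRF and the candidates would be re-ranked before the top 10 are taken.

```python
from typing import List

# Assumed two-label scheme: "K" keeps a character, "S" skips it.
KEEP, SKIP = "K", "S"


def apply_tags(full_name: str, tags: str) -> str:
    """Build an abbreviation from the characters tagged KEEP."""
    if len(full_name) != len(tags):
        raise ValueError("need exactly one tag per character")
    return "".join(ch for ch, tag in zip(full_name, tags) if tag == KEEP)


def top_k_coverage(ranked: List[List[str]], references: List[str], k: int) -> float:
    """Fraction of names whose reference abbreviation is in the top-k candidates."""
    hits = sum(ref in cands[:k] for cands, ref in zip(ranked, references))
    return hits / len(references)


def expand_vocabulary(vocab: set, ranked: List[List[str]], k: int) -> set:
    """Return a new vocabulary that also contains the top-k candidates per name."""
    expanded = set(vocab)
    for cands in ranked:
        expanded.update(cands[:k])
    return expanded


# Toy data (hypothetical candidate rankings for two university names).
names = ["北京大学", "清华大学"]
ranked = [["北大", "京大"], ["清大", "华大"]]
references = ["北大", "清华"]

print(apply_tags("北京大学", "KSKS"))
print(top_k_coverage(ranked, references, k=2))
print(sorted(expand_vocabulary(set(names), ranked, k=1)))
```

Adding several candidates per name trades a larger vocabulary for a better chance that the spoken abbreviation is covered, which is the motivation the abstract gives for using the top 10 rather than the top 1.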

Bibliographic details

  • Source
    Computer Speech and Language | 2012, Issue 5 | pp. 321-335 | 15 pages
  • Author affiliations

    Department of Computer Science, Tokyo Institute of Technology, 2-12-1-W8-E601, Ookayama, Meguro-ku, Tokyo 152-8552, Japan;

    Department of Computer Science, Tokyo Institute of Technology, 2-12-1-W8-E601, Ookayama, Meguro-ku, Tokyo 152-8552, Japan;

    Department of Computer Science, Tokyo Institute of Technology, 2-12-1-W8-E601, Ookayama, Meguro-ku, Tokyo 152-8552, Japan;

  • Indexed in: Science Citation Index (SCI), USA; Engineering Index (EI), USA;
  • Original format: PDF
  • Language of text: English
  • Keywords

    automatic abbreviation generation; vocabulary expansion; voice search;

