
A comparison of grapheme and phoneme-based units for Spanish spoken term detection

Abstract

The ever-increasing volume of audio data available online through the World Wide Web means that automatic methods for indexing and search are becoming essential. Hidden Markov model (HMM) keyword spotting and lattice search are the two most common approaches used by such systems. In keyword spotting, models or templates are defined for each search term before the speech is accessed, and are then used to find matches. Lattice search (referred to as spoken term detection) pre-indexes the speech data in terms of word or sub-word units, so that the index can then be searched quickly for arbitrary terms without returning to the original audio. In both cases, the search term can be modelled in terms of sub-word units, typically phonemes. For in-vocabulary words (i.e. words that appear in the pronunciation dictionary), letter-to-sound conversion systems are generally accepted to work well. However, for out-of-vocabulary (OOV) search terms, letter-to-sound conversion must be used to generate a pronunciation for the search term. This is usually a hard decision (i.e. not probabilistic and with no possibility of backtracking), and errors introduced at this step are difficult to recover from. We therefore propose the direct use of graphemes (i.e. letter-based sub-word units) for acoustic modelling. This is expected to work particularly well in languages such as Spanish, where the letter-to-sound mapping is very regular but not one-to-one, so that avoiding hard decisions at early stages of processing should be beneficial. In this article, we compare three approaches for Spanish keyword spotting or spoken term detection, and within each of these we compare acoustic modelling based on phone and grapheme units. Experiments were performed using the Spanish geographical-domain Albayzin corpus.
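To make the "hard decision" concrete, the following Python sketch contrasts a deterministic letter-to-sound converter, which commits to exactly one phoneme string per word, with grapheme units, which are simply the letters themselves. The rules are a hypothetical, heavily simplified subset of Castilian Spanish orthography written with SAMPA-like symbols (for example T for /θ/, x for /x/, tS for /tʃ/), and the function names letter_to_sound and graphemes are illustrative only. This is not the letter-to-sound system used in the paper. Note how the letter "c" becomes /k/ in "casa" but /θ/ in "cine", and how "h" produces no phoneme at all: the mapping is regular but not one-to-one.

```python
"""Toy contrast between a hard letter-to-sound decision and grapheme units.

The Spanish rules below are a hypothetical, heavily simplified subset chosen
for illustration; they are not the letter-to-sound system used in the paper.
"""

# Two-letter sequences, checked before single letters (hypothetical subset).
DIGRAPHS = {"ch": "tS", "ll": "L", "rr": "rr", "qu": "k"}
# Single letters whose sound does not depend on context (hypothetical subset).
SIMPLE = {"j": "x", "v": "b", "z": "T", "ñ": "J",
          "á": "a", "é": "e", "í": "i", "ó": "o", "ú": "u"}
FRONT_VOWELS = {"e", "i", "é", "í"}


def letter_to_sound(word: str) -> list[str]:
    """Return exactly one phoneme sequence: a hard, non-probabilistic decision."""
    w = word.lower()
    phones: list[str] = []
    i = 0
    while i < len(w):
        pair = w[i:i + 2]
        if pair in DIGRAPHS:
            phones.append(DIGRAPHS[pair])
            i += 2
            continue
        ch = w[i]
        nxt = w[i + 1] if i + 1 < len(w) else ""
        if ch == "h":
            pass  # silent letter: produces no phoneme
        elif ch == "c":
            # "c" is /θ/ before a front vowel, /k/ otherwise
            phones.append("T" if nxt in FRONT_VOWELS else "k")
        elif ch == "g":
            # "g" is /x/ before a front vowel, /g/ otherwise
            phones.append("x" if nxt in FRONT_VOWELS else "g")
        else:
            phones.append(SIMPLE.get(ch, ch))
        i += 1
    return phones


def graphemes(word: str) -> list[str]:
    """Grapheme units: every lowercase letter is its own unit; nothing is decided."""
    return [ch for ch in word.lower() if ch.isalpha()]


for term in ["casa", "cine", "chica", "hola"]:
    print(term, letter_to_sound(term), graphemes(term))
```

The grapheme route leaves the ambiguity of "c", "h" and similar letters to the acoustic models instead of resolving it once, up front, before any audio is seen.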
Results for the two approaches proposed for spoken term detection show that trigrapheme units for acoustic modelling match or exceed the performance of phone-based acoustic models. In the method proposed for keyword spotting, the results achieved with each acoustic model are very similar.
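The abstract does not spell out how trigrapheme units are built. A common construction, assumed here by analogy with triphones, gives each grapheme its left and right neighbour as context. The sketch below (hypothetical function trigraphemes, HTK-style "left-centre+right" unit names, word-internal context only) expands a search term into such units. The naming scheme and the treatment of word edges are assumptions, not details taken from the paper.

```python
def trigraphemes(word: str) -> list[str]:
    """Expand a word into word-internal context-dependent grapheme units.

    Assumption: HTK-style 'left-centre+right' names; units at the word edges
    keep only the context that exists inside the word (no cross-word context).
    """
    g = [ch for ch in word.lower() if ch.isalpha()]
    units: list[str] = []
    for i, centre in enumerate(g):
        name = centre
        if i > 0:
            name = f"{g[i - 1]}-{name}"  # attach left context
        if i < len(g) - 1:
            name = f"{name}+{g[i + 1]}"  # attach right context
        units.append(name)
    return units


print(trigraphemes("casa"))
```

Because each unit is derived directly from the spelling, an out-of-vocabulary search term can be mapped to trigrapheme models without any pronunciation being generated first.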