Journal: Audio, Speech, and Language Processing, IEEE Transactions on

Improved Modeling of Cross-Decoder Phone Co-Occurrences in SVM-Based Phonotactic Language Recognition

Abstract

Most common approaches to phonotactic language recognition rely on several independent phone decodings. These decodings are processed and scored in a fully uncoupled way, so their time alignment (and the information that may be extracted from it) is completely lost. Recently, we presented two new approaches to phonotactic language recognition that take time alignment information into account by considering time-synchronous cross-decoder phone co-occurrences. Experiments on the 2007 NIST LRE database demonstrated that using phone co-occurrence statistics could improve the performance of baseline phonotactic recognizers. In this paper, approaches based on time-synchronous cross-decoder phone co-occurrences are further developed and evaluated against a baseline SVM-based phonotactic system, by using: 1) counts of $n$-grams (up to 4-grams) of phone co-occurrences; and 2) the degree of co-occurrence of phone $n$-grams (up to 4-grams). To evaluate these approaches, a selection of open software (Brno University of Technology phone decoders, LIBLINEAR and FoCal) was used, and experiments were carried out on the 2007 NIST LRE database. The two approaches presented in this paper outperformed the baseline phonotactic system, yielding around 7% relative improvement in terms of $C_{\mathrm{LLR}}$. The fusion of the baseline system with the two proposed approaches yielded 1.83% EER and $C_{\mathrm{LLR}}=0.270$ (an 18% relative improvement), the same performance (on the same task) as state-of-the-art phonotactic systems that apply more complex models and techniques, thus supporting the use of cross-decoder dependencies for language recognition.
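
As a rough illustration of the first feature type (counts of $n$-grams of phone co-occurrences), the following Python sketch pairs the frame-level outputs of two phone decoders and counts $n$-grams over the resulting sequence of cross-decoder units. It is a minimal sketch under assumptions not stated in the abstract: each decoding is taken to be already expanded to one phone label per frame, consecutive frames with the same label pair are merged into a single co-occurrence unit, and the function names and toy labels are hypothetical. It is not the authors' exact feature definition.

```python
from collections import Counter
from itertools import groupby


def cooccurrence_sequence(frames_a, frames_b):
    """Pair frame-level labels from two decoders and merge runs of identical pairs.

    Assumption: both decodings are time-synchronous, one label per frame.
    """
    if len(frames_a) != len(frames_b):
        raise ValueError("decodings must cover the same number of frames")
    paired = zip(frames_a, frames_b)
    # groupby collapses consecutive identical (phone_a, phone_b) pairs into one unit
    return [pair for pair, _ in groupby(paired)]


def ngram_counts(units, max_order=4):
    """Count all n-grams of co-occurrence units for n = 1..max_order."""
    counts = Counter()
    for n in range(1, max_order + 1):
        for i in range(len(units) - n + 1):
            counts[tuple(units[i:i + n])] += 1
    return counts


# Toy frame-level decodings with hypothetical phone labels
dec_a = ["a", "a", "a", "b", "b", "b"]
dec_b = ["x", "x", "y", "y", "y", "z"]

units = cooccurrence_sequence(dec_a, dec_b)
counts = ngram_counts(units, max_order=2)
```

In a pipeline like the one the abstract describes, such counts would typically be normalized and assembled into a sparse feature vector before being passed to a linear SVM; that step is omitted here because the abstract does not specify the normalization or kernel used.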