Journal: IEEE Transactions on Acoustics, Speech, and Signal Processing

Unsupervised speaker adaptation based on hierarchical spectral clustering

Abstract

The author proposes an automatic speaker adaptation algorithm for speech recognition that can use a small amount of training material with unspecified text. The algorithm is easily applied to vector-quantization (VQ) based speech recognition systems consisting of a VQ codebook and a word dictionary in which each word is represented as a sequence of codebook entries. In the adaptation algorithm, the VQ codebook is modified for each new speaker, whereas the word dictionary is shared by all speakers. The key feature of the algorithm is that the set of spectra in the training frames and the codebook entries are both clustered hierarchically. Adaptation is driven by the vectors representing the deviation between the centroids of the training-frame clusters and those of the corresponding codebook clusters, and it proceeds hierarchically from small to large numbers of clusters, so the spectral resolution of the adaptation improves at each stage. Recognition experiments using utterances of 100 Japanese city names show that adaptation reduces the mean word recognition error rate from 4.9% to 2.9%. Since the error rate for speaker-dependent recognition is 2.2%, adaptation recovers about three quarters of the gap between the unadapted and speaker-dependent systems (2.0 of 2.7 percentage points).
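
The coarse-to-fine idea in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the clustering procedure, the rule for matching training-frame clusters to codebook clusters, or the number of levels, so everything here is an assumption: clusters are doubled at each level by a binary split refined with k-means, training frames are matched to codebook clusters by nearest centroid, and every codebook entry in a cluster is shifted by that cluster's full deviation vector. The names `adapt_codebook`, `levels`, `eps` and `kmeans_iters` are hypothetical and not taken from the paper.

```python
def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(col) for col in zip(*vectors)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(vec, centers):
    """Index of the center closest to vec (ties go to the lowest index)."""
    return min(range(len(centers)), key=lambda i: sq_dist(vec, centers[i]))


def adapt_codebook(codebook, frames, levels=3, eps=1e-3, kmeans_iters=10):
    """Shift codebook entries toward a new speaker's frames, coarse to fine.

    Assumed scheme (not confirmed by the abstract): level 0 uses one global
    cluster, and each later level doubles the number of clusters.
    """
    adapted = [[float(x) for x in v] for v in codebook]  # leave the input intact
    centers = [mean_vector(adapted)]  # level 0: a single global cluster
    for level in range(levels):
        if level > 0:
            # Binary split: perturb every center both ways, then refine the
            # centers with a few k-means passes over the codebook entries.
            centers = [[c + s * eps for c in ctr]
                       for ctr in centers for s in (-1, 1)]
            for _ in range(kmeans_iters):
                labels = [nearest(v, centers) for v in adapted]
                for k in range(len(centers)):
                    group = [v for v, l in zip(adapted, labels) if l == k]
                    if group:
                        centers[k] = mean_vector(group)
        # Assign codebook entries and training frames to the same centers.
        code_labels = [nearest(v, centers) for v in adapted]
        frame_labels = [nearest(f, centers) for f in frames]
        for k in range(len(centers)):
            entries = [v for v, l in zip(adapted, code_labels) if l == k]
            group = [f for f, l in zip(frames, frame_labels) if l == k]
            if not entries or not group:
                continue  # no evidence for this cluster: leave it unchanged
            target = mean_vector(group)
            # Deviation vector: training-frame centroid minus codebook centroid.
            deviation = [t - c for t, c in zip(target, mean_vector(entries))]
            for v in entries:  # entries alias rows of `adapted`
                for d in range(len(v)):
                    v[d] += deviation[d]
            centers[k] = target  # centroid of the shifted entries
    return adapted


if __name__ == "__main__":
    # Toy 2-D "spectra": the left half shifts right, the right half shifts left.
    codebook = [[0, 0], [0, 1], [4, 0], [4, 1]]
    frames = [[1, 0], [1, 1], [3, 0], [3, 1]]
    print(adapt_codebook(codebook, frames, levels=1))
    print(adapt_codebook(codebook, frames, levels=3))
```

In this toy case the two halves of the codebook move in opposite directions, so the global centroids coincide and the single-cluster level leaves the codebook unchanged. Only the finer levels move each half toward its own frames, which is the sense in which adding clusters improves spectral resolution.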