Conference: Annual Conference of the International Speech Communication Association

Pair-wise Distance Metric Learning of Neural Network Model for Spoken Language Identification


Abstract

The i-vector representation and modeling technique has been successfully applied to spoken language identification (SLI). Because the i-vector representation encodes most acoustic variations (e.g., speaker variation and transmission channel variation), modeling must apply a discriminative transform or classifier to emphasize the variations correlated with language identity. Owing to the strong nonlinear discriminative power of neural network (NN) modeling (including its deep form, the DNN), NNs have been used directly to learn the mapping function between the i-vector representation and language identity labels. In most studies, only point-wise feature-label information is fed to the NN for parameter learning, which may result in model overfitting, particularly when training data are limited. In this study, we propose to integrate pair-wise distance metric learning into NN parameter optimization. In the representation space of the nonlinear transforms of the hidden layers, distance metric learning is explicitly designed to minimize the pair-wise intra-class variation and maximize the inter-class variation. With the distance metric as a constraint on the point-wise learning, the i-vectors are transformed to a new feature space that is much more discriminative for samples belonging to different languages and in which samples belonging to the same language are much more similar. We tested the algorithm on an SLI task and obtained encouraging results, with more than 20% relative improvement in identification error rate.
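The abstract does not give the exact form of the objective, so the following is only a minimal sketch of how a pair-wise distance term might be combined with a point-wise classification loss. It assumes a contrastive-style pair-wise loss (squared distance for same-language pairs, a squared hinge with a margin for different-language pairs) added to softmax cross-entropy with a weighting factor. The function names, the `margin` and `weight` parameters, and the choice of Euclidean distance are all illustrative assumptions, not the authors' formulation.

```python
import numpy as np


def softmax_cross_entropy(logits, labels):
    """Point-wise loss: mean cross-entropy between logits and integer labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


def pairwise_metric_loss(hidden, labels, margin=1.0):
    """Pair-wise loss on hidden-layer outputs (assumed contrastive form).

    Same-language pairs are pulled together (squared distance);
    different-language pairs are pushed at least `margin` apart.
    """
    diffs = hidden[:, None, :] - hidden[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=2) + 1e-12)  # Euclidean distances
    same = labels[:, None] == labels[None, :]
    # Count each unordered pair once and skip self-pairs.
    upper = np.triu(np.ones(same.shape, dtype=bool), k=1)
    intra = dist[same & upper] ** 2
    inter = np.maximum(0.0, margin - dist[~same & upper]) ** 2
    intra_term = intra.mean() if intra.size else 0.0
    inter_term = inter.mean() if inter.size else 0.0
    return intra_term + inter_term


def combined_loss(hidden, logits, labels, weight=0.5, margin=1.0):
    """Point-wise loss plus the weighted pair-wise term (hypothetical weighting)."""
    point_wise = softmax_cross_entropy(logits, labels)
    pair_wise = pairwise_metric_loss(hidden, labels, margin)
    return point_wise + weight * pair_wise
```

In this sketch, `hidden` stands for the hidden-layer outputs computed from a batch of i-vectors and `logits` for the output-layer activations. In an actual training setup both terms would be differentiated with respect to the network parameters, which this NumPy version does not do.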
