Computer Speech and Language

Directly data-derived articulatory gesture-like representations retain discriminatory information about phone categories

Abstract

How the speech production and perception systems evolved in humans remains a mystery. Previous research suggests that the human auditory system is able, and has possibly evolved, to preserve maximal information about the speaker's articulatory gestures. This paper takes an initial step toward answering the complementary question of whether speakers' articulatory mechanisms have also evolved to produce sounds that can be optimally discriminated by the listener's auditory system. To this end, we explicitly model, using computational methods, the extent to which derived representations of "primitive movements" of speech articulation can be used to discriminate between broad phone categories. We extract interpretable spatio-temporal primitive movements as recurring patterns in a data matrix of human speech articulation, i.e., a matrix representing the trajectories of vocal tract articulators over time. To do so, we propose a weakly-supervised learning method that attempts to find a part-based representation of the data in terms of recurring basis trajectory units (or primitives) and their corresponding activations over time. For each phone interval, we then derive a feature representation that captures the co-occurrences between the activations of the various bases at different time-lags. We show that this feature, derived entirely from the activations of these primitive movements, achieves greater discrimination than conventional features on an interval-based phone classification task. We discuss the implications of these findings for furthering our understanding of speech signal representations and of the links between speech production and perception systems.
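Below is a minimal, illustrative sketch of the pipeline the abstract describes, with a plain NMF over stacked time-windows standing in for the paper's weakly-supervised convolutive decomposition. The channel count, window length, number of primitives, and lag set are placeholder assumptions, not values reported in the paper.

```python
# Illustrative sketch only -- NOT the authors' implementation. A plain NMF on
# stacked time-windows stands in for the weakly-supervised convolutive
# decomposition; all sizes below are placeholder assumptions.
import numpy as np
from sklearn.decomposition import NMF


def stack_windows(X, win):
    """Stack `win` consecutive frames so each column is a short
    spatio-temporal patch of the articulatory trajectories.
    X: non-negative array of shape (n_channels, n_frames)."""
    n_ch, n_fr = X.shape
    n_cols = n_fr - win + 1
    S = np.empty((n_ch * win, n_cols))
    for t in range(n_cols):
        S[:, t] = X[:, t:t + win].ravel()
    return S


def extract_primitives(X, n_primitives=8, win=10):
    """Factor the stacked data into `n_primitives` basis patches (primitives)
    and their frame-wise activations H."""
    S = stack_windows(X, win)
    model = NMF(n_components=n_primitives, init="nndsvda", max_iter=500)
    A = model.fit_transform(S.T)        # (n_frames - win + 1, n_primitives)
    primitives = model.components_      # (n_primitives, n_channels * win)
    return primitives, A.T              # H: (n_primitives, n_frames - win + 1)


def cooccurrence_feature(H, start, end, lags=(1, 2, 4, 8)):
    """Per-interval feature: co-occurrences between primitive activations
    at each time-lag, one K x K block per lag, flattened into one vector."""
    Hseg = H[:, start:end]
    K = Hseg.shape[0]
    blocks = []
    for lag in lags:
        if Hseg.shape[1] > lag:
            C = Hseg[:, :-lag] @ Hseg[:, lag:].T   # lagged co-occurrence, K x K
        else:
            C = np.zeros((K, K))
        blocks.append(C.ravel())
    return np.concatenate(blocks)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for articulatory trajectories (e.g. EMA sensor
    # positions); real data would need shifting/scaling to be non-negative.
    X = np.abs(rng.standard_normal((14, 2000)))
    primitives, H = extract_primitives(X)
    feat = cooccurrence_feature(H, start=100, end=160)   # one phone interval
    print(primitives.shape, H.shape, feat.shape)         # (8, 140) (8, 1991) (256,)
```

For the interval-based phone classification step the abstract mentions, each interval's co-occurrence vector could be fed to any standard classifier (e.g., a linear SVM); the specific classifier and baseline features used in the paper are not detailed here.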
