International Conference on Statistical Language and Speech Processing

Articulatory Gesture Rich Representation Learning of Phonological Units in Low Resource Settings



Abstract

Recent literature presents evidence that both linguistic (phonemic) and non-linguistic (speaker identity, emotional content) information resides on a lower-dimensional manifold embedded within higher-dimensional spectral features such as MFCC and PLP. Linguistic or phonetic units of speech can be broken down into a legal inventory of articulatory gestures shared across several phonemes according to their manner of articulation. We intend to discover a subspace that is rich in the gestural information of speech and captures the invariance of similar gestures. In this paper, we investigate unsupervised techniques best suited for learning such a subspace. The main contribution of the paper is an approach to learning a gesture-rich representation of speech automatically from data in a completely unsupervised manner. This study compares the representations obtained through a convolutional autoencoder (ConvAE) with those from standard unsupervised dimensionality-reduction techniques such as manifold learning and Principal Component Analysis (PCA), using the task of phoneme classification. The manifold learning techniques evaluated in this study are Locally Linear Embedding (LLE), Isomap, and Laplacian Eigenmaps. The representations that best separate different gestures are suitable for discovering subword units under low- or zero-resource speech conditions. Further, we evaluate the representations using the Zero Resource Speech Challenge's ABX discriminability measure. Results indicate that the representations obtained through ConvAE and Isomap outperform baseline MFCC features on both phoneme classification and the ABX measure, and induce separation between sounds composed of different sets of gestures. We further cluster the representations using a Dirichlet Process Gaussian Mixture Model (DPGMM) to automatically learn the cluster distribution of the data, and show that these clusters correspond to groups with a similar manner of articulation. The DPGMM distribution is used as a prior to obtain correspondence terms for robust ConvAE training.
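
As a concrete illustration of the comparison the abstract describes, the following is a minimal sketch of evaluating PCA and manifold-learning embeddings through phoneme classification with scikit-learn, whose SpectralEmbedding implements Laplacian Eigenmaps. The stand-in random data, neighborhood sizes, and target dimensionality are illustrative assumptions, not the paper's actual features or hyperparameters.

```python
# Minimal sketch: compare unsupervised embeddings via phoneme classification.
# X stands in for MFCC frames (n_frames, n_dims); y for frame-level labels.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import Isomap, LocallyLinearEmbedding, SpectralEmbedding
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 39))   # stand-in for 39-dim MFCC (+deltas) frames
y = rng.integers(0, 10, size=2000)    # stand-in for phoneme labels

embedders = {
    "PCA": PCA(n_components=10),
    "Isomap": Isomap(n_neighbors=12, n_components=10),
    "LLE": LocallyLinearEmbedding(n_neighbors=12, n_components=10),
    # SpectralEmbedding is scikit-learn's Laplacian Eigenmaps implementation
    "Laplacian Eigenmaps": SpectralEmbedding(n_neighbors=12, n_components=10),
}

for name, embedder in embedders.items():
    Z = embedder.fit_transform(X)     # project frames onto the low-dim subspace
    # Phoneme classification accuracy as a proxy for how well the
    # embedding separates sounds built from different gestures
    acc = cross_val_score(KNeighborsClassifier(n_neighbors=5), Z, y, cv=3).mean()
    print(f"{name}: {acc:.3f}")
```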
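
The abstract does not give the ConvAE topology, so the PyTorch sketch below shows one plausible shape of such a model: an encoder that compresses patches of MFCC frames to a low-dimensional bottleneck code (the learned representation) and a mirrored decoder trained with a reconstruction loss. The 11-frame by 39-coefficient input patch, channel widths, and 10-dimensional code are assumptions for illustration.

```python
# Hedged sketch of a convolutional autoencoder over MFCC patches.
import torch
import torch.nn as nn

class ConvAE(nn.Module):
    def __init__(self, code_dim: int = 10):
        super().__init__()
        # Encoder: (batch, 1, 11, 39) MFCC patch -> code_dim bottleneck
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1),   # -> (16, 6, 20)
            nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1),  # -> (32, 3, 10)
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 3 * 10, code_dim),  # bottleneck = learned representation
        )
        # Decoder mirrors the encoder back to the input patch shape
        self.decoder = nn.Sequential(
            nn.Linear(code_dim, 32 * 3 * 10),
            nn.ReLU(),
            nn.Unflatten(1, (32, 3, 10)),
            nn.ConvTranspose2d(32, 16, 3, stride=2, padding=1, output_padding=1),  # -> (16, 6, 20)
            nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 3, stride=2, padding=1),  # -> (1, 11, 39)
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

model = ConvAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(64, 1, 11, 39)           # stand-in batch of MFCC patches
recon, z = model(x)
loss = nn.functional.mse_loss(recon, x)  # unsupervised reconstruction objective
opt.zero_grad()
loss.backward()
opt.step()
```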

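For the clustering step, a truncated Dirichlet-process mixture such as scikit-learn's BayesianGaussianMixture can play the role of the DPGMM: it infers an effective number of active clusters from the data rather than fixing it in advance. The truncation level and the weight threshold below are illustrative assumptions.

```python
# Minimal sketch of DPGMM-style clustering of the learned embeddings.
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
Z = rng.standard_normal((2000, 10))  # stand-in for ConvAE/Isomap embeddings

dpgmm = BayesianGaussianMixture(
    n_components=50,  # truncation level; the effective cluster count is inferred
    weight_concentration_prior_type="dirichlet_process",
    covariance_type="full",
    max_iter=500,
    random_state=0,
)
labels = dpgmm.fit_predict(Z)

# Clusters that keep non-negligible weight approximate discovered subword
# units; their posterior assignments could supply the correspondence terms
# the abstract mentions for robust ConvAE retraining.
active = np.flatnonzero(dpgmm.weights_ > 1e-2)
print(f"{active.size} active clusters out of {dpgmm.n_components} components")
```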