International Conference on Statistical Language and Speech Processing

Articulatory Gesture Rich Representation Learning of Phonological Units in Low Resource Settings



Abstract

Recent literature presents evidence that both linguistic (phonemic) and non-linguistic (speaker identity, emotional content) information resides on a lower-dimensional manifold embedded within higher-dimensional spectral features such as MFCC and PLP. Linguistic or phonetic units of speech can be broken down into a legal inventory of articulatory gestures shared across several phonemes according to their manner of articulation. We intend to discover a subspace that is rich in the gestural information of speech and captures the invariance of similar gestures. In this paper, we investigate unsupervised techniques best suited for learning such a subspace. The main contribution of the paper is an approach to learning a gesture-rich representation of speech automatically from data in a completely unsupervised manner. This study compares the representations obtained through a convolutional autoencoder (ConvAE) with those from standard unsupervised dimensionality-reduction techniques such as manifold learning and Principal Component Analysis (PCA), using the task of phoneme classification. The manifold learning techniques evaluated in this study are Locally Linear Embedding (LLE), Isomap, and Laplacian Eigenmaps. The representations that best separate different gestures are suitable for discovering subword units under low- or zero-resource speech conditions. Further, we evaluate the representations using the Zero Resource Speech Challenge's ABX discriminability measure. Results indicate that the representations obtained through ConvAE and Isomap outperform baseline MFCC features on both phoneme classification and the ABX measure, and induce separation between sounds composed of different sets of gestures. We further cluster the representations using a Dirichlet Process Gaussian Mixture Model (DPGMM) to automatically learn the cluster distribution of the data, and show that these clusters correspond to groups with a similar manner of articulation. The DPGMM distribution is used as a prior to obtain correspondence terms for robust ConvAE training.
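
As a concrete illustration of the comparison the abstract describes, the following is a minimal sketch of evaluating PCA and manifold-learning embeddings through phoneme classification with scikit-learn, whose SpectralEmbedding implements Laplacian Eigenmaps. The stand-in random data, neighborhood sizes, and target dimensionality are illustrative assumptions, not the paper's actual features or hyperparameters.

```python
# Minimal sketch: compare unsupervised embeddings via phoneme classification.
# X stands in for MFCC frames (n_frames, n_dims); y for frame-level labels.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import Isomap, LocallyLinearEmbedding, SpectralEmbedding
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 39))   # stand-in for 39-dim MFCC (+deltas) frames
y = rng.integers(0, 10, size=2000)    # stand-in for phoneme labels

embedders = {
    "PCA": PCA(n_components=10),
    "Isomap": Isomap(n_neighbors=12, n_components=10),
    "LLE": LocallyLinearEmbedding(n_neighbors=12, n_components=10),
    # SpectralEmbedding is scikit-learn's Laplacian Eigenmaps implementation
    "Laplacian Eigenmaps": SpectralEmbedding(n_neighbors=12, n_components=10),
}

for name, embedder in embedders.items():
    Z = embedder.fit_transform(X)     # project frames onto the low-dim subspace
    # Phoneme classification accuracy as a proxy for how well the
    # embedding separates sounds built from different gestures
    acc = cross_val_score(KNeighborsClassifier(n_neighbors=5), Z, y, cv=3).mean()
    print(f"{name}: {acc:.3f}")
```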
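
The abstract does not give the ConvAE topology, so the PyTorch sketch below shows one plausible shape of such a model: an encoder that compresses patches of MFCC frames to a low-dimensional bottleneck code (the learned representation) and a mirrored decoder trained with a reconstruction loss. The 11-frame by 39-coefficient input patch, channel widths, and 10-dimensional code are assumptions for illustration.

```python
# Hedged sketch of a convolutional autoencoder over MFCC patches.
import torch
import torch.nn as nn

class ConvAE(nn.Module):
    def __init__(self, code_dim: int = 10):
        super().__init__()
        # Encoder: (batch, 1, 11, 39) MFCC patch -> code_dim bottleneck
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1),   # -> (16, 6, 20)
            nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1),  # -> (32, 3, 10)
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 3 * 10, code_dim),  # bottleneck = learned representation
        )
        # Decoder mirrors the encoder back to the input patch shape
        self.decoder = nn.Sequential(
            nn.Linear(code_dim, 32 * 3 * 10),
            nn.ReLU(),
            nn.Unflatten(1, (32, 3, 10)),
            nn.ConvTranspose2d(32, 16, 3, stride=2, padding=1, output_padding=1),  # -> (16, 6, 20)
            nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 3, stride=2, padding=1),  # -> (1, 11, 39)
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

model = ConvAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(64, 1, 11, 39)           # stand-in batch of MFCC patches
recon, z = model(x)
loss = nn.functional.mse_loss(recon, x)  # unsupervised reconstruction objective
opt.zero_grad()
loss.backward()
opt.step()
```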

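For the clustering step, a truncated Dirichlet-process mixture such as scikit-learn's BayesianGaussianMixture can play the role of the DPGMM: it infers an effective number of active clusters from the data rather than fixing it in advance. The truncation level and the weight threshold below are illustrative assumptions.

```python
# Minimal sketch of DPGMM-style clustering of the learned embeddings.
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
Z = rng.standard_normal((2000, 10))  # stand-in for ConvAE/Isomap embeddings

dpgmm = BayesianGaussianMixture(
    n_components=50,  # truncation level; the effective cluster count is inferred
    weight_concentration_prior_type="dirichlet_process",
    covariance_type="full",
    max_iter=500,
    random_state=0,
)
labels = dpgmm.fit_predict(Z)

# Clusters that keep non-negligible weight approximate discovered subword
# units; their posterior assignments could supply the correspondence terms
# the abstract mentions for robust ConvAE retraining.
active = np.flatnonzero(dpgmm.weights_ > 1e-2)
print(f"{active.size} active clusters out of {dpgmm.n_components} components")
```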