Exploring the dimensionality of speech using manifold learning and dimensionality reduction methods

Abstract

Many previous investigations have indicated that speech data has an inherent low-dimensional structure and that it may be possible to represent speech efficiently using only a small number of parameters. This view is motivated by the fact that articulatory movement is limited by physiological constraints, so the speech production apparatus has only a limited number of degrees of freedom. In addition, the set of sounds used in human spoken communication is only a small subset of all producible sounds. A number of dimensionality reduction methods capable of discovering such underlying structure have previously been applied to speech. However, if speech lies on a manifold nonlinearly embedded in a high-dimensional space, as has been proposed in the past, classic linear dimensionality reduction methods would be unable to discover this embedding. In this dissertation, a number of manifold learning methods (also referred to as nonlinear dimensionality reduction methods) are applied to speech to explore the possibility of an underlying nonlinear manifold structure.

This dissertation describes a number of existing manifold learning methods and details their application to high-dimensional feature representations of speech data. Representations derived from both the conventional magnitude spectrum and the less widely used phase spectrum are investigated. The manifold learning methods used in this study are locally linear embedding, Isomap, and Laplacian eigenmaps. The classic linear method, principal component analysis (PCA), is also applied so that linear and nonlinear methods can be compared. The resulting low-dimensional representations are analysed through visualisation, phone recognition, and speaker recognition experiments. The recognition experiments serve to evaluate how much meaningful discriminatory information is contained in the low-dimensional representations produced by each method. These experiments also demonstrate the potential value of these methods in speech processing applications.

The manifold learning methods are shown to be capable of producing meaningful low-dimensional representations of speech data, suggesting that speech has a low-dimensional manifold structure. In general, these methods are found to outperform PCA in low dimensions, indicating that speech may lie on a manifold nonlinearly embedded in a high-dimensional space. Phone classification experiments show that Isomap can offer improvements over both standard features and PCA-transformed features. Investigation of magnitude and phase spectrum representations found that both have similar low-dimensional structure, and confirmed that the phase spectrum contains useful information for phone discrimination. Results indicate that combining magnitude and phase spectrum information improves performance on phone classification tasks. A method for combining magnitude and phase spectrum features to increase phone classification accuracy without a large increase in feature dimensionality is also described.
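To make the linear-versus-nonlinear contrast concrete, here is a minimal NumPy sketch of PCA and of Isomap (k-nearest-neighbour graph, graph shortest-path distances, classical MDS). It is applied to a synthetic 1-D spiral embedded in 2-D, not to speech. It illustrates the premise stated in the abstract: a linear projection cannot unfold a nonlinearly embedded curve, whereas the 1-D Isomap coordinate tracks position along it. The function names, `k = 8`, and the toy data are assumptions for illustration only. The dissertation's own implementation and settings are not described here, and locally linear embedding and Laplacian eigenmaps are not shown.

```python
import numpy as np


def pca(X, d):
    """Linear baseline: project the centred rows of X onto the top-d principal axes."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (len(X) - 1)
    vals, vecs = np.linalg.eigh(cov)          # eigenvalues in ascending order
    top = vecs[:, np.argsort(vals)[::-1][:d]]
    return Xc @ top


def isomap(X, d, k=8):
    """Minimal Isomap: k-NN graph -> graph geodesic distances -> classical MDS."""
    n = len(X)
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    # Keep only edges from each point to its k nearest neighbours (symmetrised).
    G = np.full((n, n), np.inf)
    nbrs = np.argsort(D, axis=1)[:, 1:k + 1]  # column 0 is the point itself
    rows = np.repeat(np.arange(n), k)
    G[rows, nbrs.ravel()] = D[rows, nbrs.ravel()]
    G = np.minimum(G, G.T)
    np.fill_diagonal(G, 0.0)
    # Floyd-Warshall: shortest paths through the graph approximate geodesics.
    for m in range(n):
        G = np.minimum(G, G[:, m:m + 1] + G[m:m + 1, :])
    if np.isinf(G).any():
        raise ValueError("neighbourhood graph is disconnected; increase k")
    # Classical MDS: double-centre the squared distances, keep the top-d eigenvectors.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (G ** 2) @ J
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:d]
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))


def rank_corr(a, b):
    """Spearman rank correlation (no tie handling; fine for continuous values)."""
    ra = np.argsort(np.argsort(a)).astype(float)
    rb = np.argsort(np.argsort(b)).astype(float)
    return np.corrcoef(ra, rb)[0, 1]


# Toy data: a 1-D spiral (intrinsic dimension 1) nonlinearly embedded in 2-D.
t = np.linspace(1.5 * np.pi, 4.5 * np.pi, 200)
X = np.column_stack([t * np.cos(t), t * np.sin(t)])

pca_score = abs(rank_corr(pca(X, 1)[:, 0], t))
iso_score = abs(rank_corr(isomap(X, 1)[:, 0], t))
print(f"|rank corr| with position on spiral: PCA {pca_score:.3f}, Isomap {iso_score:.3f}")
```

For real feature matrices, scikit-learn's `sklearn.manifold` module provides `Isomap`, `LocallyLinearEmbedding`, and `SpectralEmbedding` (a Laplacian eigenmaps implementation). The cubic-time Floyd-Warshall loop above is only practical for a few hundred points.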
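The abstract does not say how the magnitude and phase representations were derived or how they were combined. As a hedged illustration only, the sketch below uses NumPy to compute log-magnitude and unwrapped-phase spectra for windowed frames. It then keeps the combined feature vector small by reducing each stream to `d` dimensions with PCA before concatenating, giving 2·d features per frame rather than the 2 × 129 of the raw spectra. The frame length, hop, Hamming window, `d = 6`, the synthetic tone, and the function names `frame_spectra`, `pca_project`, and `combine_low_dim` are all hypothetical. Practical phase features are usually derived representations (for example, group delay) rather than raw unwrapped phase.

```python
import numpy as np


def frame_spectra(signal, frame_len=256, hop=128):
    """Return per-frame log-magnitude and unwrapped-phase spectra (hypothetical features)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hamming(frame_len)
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    spec = np.fft.rfft(frames, axis=1)            # frame_len // 2 + 1 bins per frame
    log_mag = np.log(np.abs(spec) + 1e-10)        # small floor avoids log(0)
    phase = np.unwrap(np.angle(spec), axis=1)     # remove 2*pi jumps across bins
    return log_mag, phase


def pca_project(X, d):
    """Project centred rows of X onto their top-d principal directions (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:d].T


def combine_low_dim(log_mag, phase, d=6):
    """Reduce each stream to d dimensions separately, then concatenate (2*d per frame)."""
    return np.hstack([pca_project(log_mag, d), pca_project(phase, d)])


# Toy input: one second of a 440 Hz tone plus noise at an assumed 8 kHz sample rate.
rng = np.random.default_rng(0)
sr = 8000
n = np.arange(sr)
signal = np.sin(2 * np.pi * 440 * n / sr) + 0.1 * rng.standard_normal(sr)

log_mag, phase = frame_spectra(signal)
combined = combine_low_dim(log_mag, phase, d=6)
print(log_mag.shape, phase.shape, combined.shape)
```

For simplicity, PCA is fitted and applied on the same frames here. In a classification experiment the projections would be fitted on training data only and then applied to test data.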

Bibliographic details

  • Author: Errity, Andrew
  • Year: 2010
  • Original format: PDF
  • Full-text language: English
