Language and Technology Conference

Intelligent Speech Features Mining for Robust Synthesis System Evaluation

Abstract

Speech synthesis evaluation involves the analytical description of features sufficient to assess the performance of a speech synthesis system. Its primary focus is to determine how closely a synthetic voice resembles a natural (human) voice. Evaluation is usually driven by two kinds of methods, subjective and objective, which have become the standard for evaluating voice quality but are challenged by high speech variability and by human discernment errors. Machine learning (ML) techniques have proven successful in determining and enhancing speech quality. Hence, this contribution uses both supervised and unsupervised ML tools to recognize and classify speech quality classes. Data were collected from a listening test (experiment), in which domain experts assessed speech quality for naturalness, intelligibility, and comprehensibility, as well as tone, vowel, and consonant correctness. During the pre-processing stage, Principal Component Analysis (PCA) identified 4 principal components (intelligibility, naturalness, comprehensibility and tone), accounting for 76.79% of the variability in the dataset. An unsupervised visualization using a self-organizing map (SOM) then discovered five distinct target clusters with high densities of instances and showed modest correlation between significant input factors. Pattern recognition using a deep neural network (DNN) produced a confusion matrix with an overall accuracy of 93.1%, signifying an excellent classification system.
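
The two summary figures quoted in the abstract (the share of variability retained by the leading principal components, and the overall accuracy read off a confusion matrix) follow from simple arithmetic. The sketch below illustrates that arithmetic only; it is not the authors' code, the function names are hypothetical, and the eigenvalues and confusion matrix are made-up illustrative numbers, not data from the paper.

```python
def explained_variance_ratios(eigenvalues):
    """Fraction of total variance captured by each principal component."""
    total = sum(eigenvalues)
    return [ev / total for ev in eigenvalues]


def components_needed(eigenvalues, threshold):
    """Smallest number of leading components whose cumulative
    explained-variance ratio reaches the given threshold."""
    ordered = sorted(eigenvalues, reverse=True)
    cumulative = 0.0
    for k, ratio in enumerate(explained_variance_ratios(ordered), start=1):
        cumulative += ratio
        if cumulative >= threshold:
            return k
    return len(ordered)


def overall_accuracy(confusion):
    """Overall accuracy = correctly classified instances (the diagonal)
    divided by all instances in the confusion matrix."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


# Hypothetical eigenvalues for six rated attributes (illustrative only).
example_eigenvalues = [2.4, 1.2, 0.9, 0.6, 0.5, 0.4]

# Hypothetical 5-class confusion matrix (rows: true class, columns: predicted).
example_confusion = [
    [18, 1, 0, 0, 0],
    [1, 19, 1, 0, 0],
    [0, 1, 17, 1, 0],
    [0, 0, 1, 20, 0],
    [0, 0, 0, 1, 19],
]

if __name__ == "__main__":
    k = components_needed(example_eigenvalues, threshold=0.8)
    print("components needed:", k)
    print("overall accuracy:", overall_accuracy(example_confusion))
```

In this toy setting, the number of retained components depends on the chosen variance threshold, and the accuracy is the diagonal sum of the matrix divided by the total instance count; the paper's own figures (76.79% and 93.1%) would come from its actual eigenvalues and confusion matrix, which are not given here.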
