Journal: Computer Speech and Language

Classification of aspirated and unaspirated sounds in speech using excitation and signal level information


Abstract

This work studies aspiration and unaspiration in consonants. The pronunciation of aspirated and unaspirated consonants is known to be characterized by the 'puff of air', also known as the burst, released at the place of constriction in the vocal tract. Here, properties of the vowel immediately after the burst are studied to characterize the burst. The excitation source signal, estimated from speech as the low-pass filtered linear prediction residual, is used for the task. Signal parameters such as the glottal pulse; the durations of the open, closed, and return phases; the slopes of the open and return phases; the burst duration; the ratio of the highest to the lowest frame-wise signal energy; and the voice onset point are explored as features to characterize aspiration and unaspiration. Three datasets, namely TIMIT, IIIT Hyderabad Marathi, and IIIT Hyderabad Hindi (IIIT-H Indic Speech Databases), are used to verify the proposed approach. Random forest, support vector machine, and deep feed-forward neural network (DFFNN) classifiers are used to test the effectiveness of the features. Optimal features for classification are selected using correlation-based feature selection (CFS). The results show that the proposed features are effective in classifying aspirated and unaspirated consonants. The performance of the proposed features in recognizing aspirated and unaspirated phonemes is also evaluated, using the IIIT Hyderabad Marathi dataset. Recognition of aspirated and unaspirated sounds with the proposed features improves on that of an MFCC-based phoneme recognition system.
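As a rough illustration of two ingredients named in the abstract, the sketch below estimates a linear prediction (LP) residual, smooths it with a low-pass filter, and computes the ratio of the highest to the lowest frame-wise energy. It is not the authors' implementation: the LP order, the moving-average filter standing in for the low-pass stage, the frame length and hop, and all function names are assumptions chosen for illustration. It uses only the Python standard library; NumPy or SciPy would be the usual choice in practice.

```python
import math


def lpc_coefficients(frame, order):
    """Autocorrelation-method LPC via Levinson-Durbin.

    Returns a[0..order] with a[0] = 1, so that the prediction error is
    e[n] = sum_j a[j] * x[n - j].
    """
    n = len(frame)
    r = [sum(frame[i] * frame[i + k] for i in range(n - k))
         for k in range(order + 1)]
    a = [1.0] + [0.0] * order
    err = r[0]
    if err == 0.0:
        return a  # silent frame: leave the signal unchanged
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
        if err <= 0.0:
            break
    return a


def lp_residual(segment, order=10):
    """Inverse-filter a short segment with its own LPC polynomial."""
    a = lpc_coefficients(segment, order)
    return [sum(a[j] * segment[n - j] for j in range(order + 1) if n - j >= 0)
            for n in range(len(segment))]


def moving_average(x, width=5):
    """Crude FIR low-pass (assumed stand-in for the paper's filter)."""
    out, acc = [], 0.0
    for n, v in enumerate(x):
        acc += v
        if n >= width:
            acc -= x[n - width]
        out.append(acc / width)
    return out


def frame_energy_ratio(signal, frame_len, hop):
    """Ratio of the highest to the lowest frame-wise energy."""
    energies = [sum(v * v for v in signal[s:s + frame_len])
                for s in range(0, len(signal) - frame_len + 1, hop)]
    lowest = min(energies)
    return max(energies) / lowest if lowest > 0 else math.inf


# Synthetic "voiced" segment: a one-pole resonator driven by an impulse train.
segment, prev = [], 0.0
for n in range(400):
    prev = 0.9 * prev + (1.0 if n % 40 == 0 else 0.0)
    segment.append(prev)

residual = lp_residual(segment, order=2)
excitation = moving_average(residual, width=5)
ratio = frame_energy_ratio(segment, frame_len=20, hop=20)
```

On the synthetic segment the residual concentrates around the impulse positions, which is the property that makes it a usable proxy for the excitation source. Real speech would normally be processed frame by frame with a window and pre-emphasis, which this sketch omits.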
