Dynamic Adaptation of Time-Frequency Resolution in Spectral Analysis of Speech Signals

Abstract

In speech parametrization, the speech signal spectrum is computed over frames with a typical length between 15 and 35 ms. This yields a uniform time-frequency resolution that does not match the properties of human hearing well. One such property is nonlinear frequency resolution, which can be approximated with multiresolution spectral analysis. In this thesis, the continuous wavelet transform was tried as a possible basis for extracting MFCCs, currently the most common type of parametrization. Like human hearing, the resulting spectrum has a frequency resolution that is finer at low frequencies and becomes coarser as frequency increases. The time resolution is correspondingly better at higher frequencies, so spectral changes can be detected more precisely when they are (also) present at higher frequencies, while the lower frequency band is analyzed with fine frequency resolution at the same time. A comparison of the success rates achieved with the same speech recognition systems did not show any advantage of wavelet-transform-based MFCCs over standard MFCCs. Because the continuous wavelet transform is computationally quite intensive, MFCCs based on the discrete wavelet transform were also tested. The discrete wavelet transform resulted in a significantly lower success rate. The wavelet transform does not solve the problems related to the nonstationarity of the speech signal, because the time-frequency resolution of its spectrum does not depend on time.

In this dissertation, three approaches to dynamically adapting the time-frequency resolution are presented. In every approach, one has to estimate how rapidly the spectrum is changing at a given time. This estimate can be based on known facts about the structure of speech or about speech production. In the first approach, adaptive time-frequency resolution was achieved by varying the frame length according to the phonetic structure of the speech.
For each phoneme, the basic properties of the spectrum are known. The spectrum of vowels and some other long phonemes is almost stationary, whereas the spectrum of other phonemes, such as stops, changes rapidly. If the phonetic structure of the speech is known, the time-frequency resolution can be adapted by using an appropriate frame length for each phoneme. In speech recognition, the phonetic structure is not known in advance, so recognition has to be done in two passes. In the first pass the phonetic structure is unknown and a fixed frame length is used for parametrization. In the second pass, the phonetic structure obtained in the first pass is available and the frame length is selected on its basis. In the second approach, the time-frequency resolution was adapted according to Moore's formula, which describes human perception of intensity changes in the speech signal. Most intensity changes occur in sections of speech where temporal resolution is more important than frequency resolution. Larger intensity changes are related to short events, such as the burst release in plosives, and to phoneme transitions. Therefore, when intensity changes are large the wideband spectrum is emphasized, and when they are small the narrowband spectrum is emphasized. Computing intensity changes is far less computationally intensive than determining the phonetic structure in an additional pass. The third approach is based on recognizing voiced and unvoiced speech segments. When voiced speech is produced, the vocal folds need to be closed to obstruct the airflow. Because voiced and unvoiced segments are determined by the opening or closing of the vocal folds, a voiced segment cannot be very short. Most voiced phonemes are long and have an almost stationary spectrum. In feature extraction, a longer frame was used for voiced segments and a shorter frame for unvoiced segments.
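
The third approach (a longer frame for voiced segments, a shorter one for unvoiced segments) can be illustrated with a minimal Python sketch. The abstract does not say how voicing was detected or which frame lengths were used, so everything specific here is an assumption made for illustration: the energy and zero-crossing-rate test in `is_voiced`, its thresholds, the 30 ms and 15 ms frame lengths (picked from the 15 to 35 ms range mentioned above), the 10 ms hop, and the function names themselves.

```python
import numpy as np


def is_voiced(segment, zcr_threshold=0.15, energy_threshold=1e-4):
    """Crude voicing test (an assumption, not the thesis' detector):
    voiced speech tends to have high energy and few zero crossings."""
    energy = np.mean(segment ** 2)
    signs = np.signbit(segment)
    zcr = np.mean(signs[1:] != signs[:-1])  # fraction of sign changes
    return bool(energy > energy_threshold and zcr < zcr_threshold)


def adaptive_frames(signal, sample_rate, hop_ms=10.0, long_ms=30.0, short_ms=15.0):
    """Return (center_sample, frame) pairs whose frame length depends on voicing."""
    hop = int(sample_rate * hop_ms / 1000)
    long_len = int(sample_rate * long_ms / 1000)
    short_len = int(sample_rate * short_ms / 1000)
    half_long = long_len // 2
    frames = []
    for center in range(half_long, len(signal) - half_long, hop):
        # Decide voicing on the long window around the frame center.
        probe = signal[center - half_long:center + half_long]
        length = long_len if is_voiced(probe) else short_len
        half = length // 2
        frames.append((center, signal[center - half:center + half]))
    return frames
```

Each returned frame would then be windowed and passed to the usual MFCC pipeline. Because the frames keep a fixed hop, the number of feature vectors per second stays constant and only the analysis window changes.
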
All three approaches to dynamic adaptation of the time-frequency resolution were tested with the same speech recognition system on two speech databases. Several additive distortions and two convolutive distortions were used to test robustness. In our experiments, adapting the frame length to the phonetic structure of the speech proved too complicated: it is computationally demanding and was therefore tested only on the smaller speech database. The success rate was almost unchanged and robustness decreased slightly compared with the original speech recognition system, which uses standard MFCCs. Adapting the time-frequency resolution to intensity changes increased both the success rate and robustness; the improvement was quite large and very consistent. Adapting the frame length to voiced and unvoiced speech segments improved robustness and, in some experiments, the success rate.
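
The intensity-driven adaptation described in the abstract emphasizes the wideband spectrum when the intensity changes quickly and the narrowband spectrum when it is steady. The following Python sketch shows one way such a blend could be organized. It is not the thesis' method: Moore's formula is not reproduced in the abstract, so a plain frame-to-frame level difference in dB stands in for it, and the linear weighting, the `full_scale_db` parameter, the 10 ms and 30 ms window lengths, and the function names are all assumptions.

```python
import numpy as np


def power_spectrum(signal, center, length, n_fft):
    """Hamming-windowed power spectrum of `length` samples around `center`,
    normalized by window energy so short and long windows are comparable."""
    half = length // 2
    window = np.hamming(2 * half)
    frame = signal[center - half:center + half] * window
    return np.abs(np.fft.rfft(frame, n_fft)) ** 2 / np.sum(window ** 2)


def blended_spectra(signal, sample_rate, hop_ms=10.0, short_ms=10.0,
                    long_ms=30.0, n_fft=512, full_scale_db=10.0):
    """Blend wideband (short-window) and narrowband (long-window) spectra.
    The weight of the wideband spectrum grows with the absolute level change
    between consecutive frames (a stand-in for Moore's formula)."""
    hop = int(sample_rate * hop_ms / 1000)
    short_len = int(sample_rate * short_ms / 1000)
    long_len = int(sample_rate * long_ms / 1000)
    half_short, half_long = short_len // 2, long_len // 2
    spectra, weights = [], []
    prev_db = None
    for c in range(half_long, len(signal) - half_long, hop):
        wide = power_spectrum(signal, c, short_len, n_fft)
        narrow = power_spectrum(signal, c, long_len, n_fft)
        local = signal[c - half_short:c + half_short]
        level_db = 10.0 * np.log10(np.mean(local ** 2) + 1e-12)
        change = 0.0 if prev_db is None else abs(level_db - prev_db)
        w = min(change / full_scale_db, 1.0)  # 0 = narrowband only, 1 = wideband only
        spectra.append(w * wide + (1.0 - w) * narrow)
        weights.append(w)
        prev_db = level_db
    return np.array(spectra), np.array(weights)
```

With this arrangement a sudden onset, such as a plosive burst, pushes the weight toward the short-window spectrum for that frame, while steady vowels are represented almost entirely by the long-window spectrum. The per-frame cost is two FFTs and one energy estimate, in line with the abstract's remark that computing intensity changes is far cheaper than an additional recognition pass.
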

Bibliographic record

Author: Štrancar Andrej
Year: 2006
Original format: PDF
Language of full text: Slovene

