COMPARISON OF MPEG-7 AUDIO SPECTRUM PROJECTION FEATURES AND MFCC APPLIED TO SPEAKER RECOGNITION, SOUND CLASSIFICATION AND AUDIO SEGMENTATION

Abstract

Our purpose is to evaluate the MPEG-7 Audio Spectrum Projection (ASP) features for general sound recognition against the well-established MFCC features. The recognition tasks of interest are speaker recognition, sound classification, and audio segmentation based on sound/speaker identification. For sound classification we use three approaches: the direct approach, the hierarchical approach without hints, and the hierarchical approach with hints. For audio segmentation, the MPEG-7 ASP features and MFCCs are used to train hidden Markov models (HMMs) for individual speakers and sounds. The trained sound/speaker models are then used to segment conversational speech involving a given subset of people in panel discussion television programs. Results show that the MFCC approach yields sound/speaker recognition rates superior to those of the MPEG-7 implementations.
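The segmentation step described in the abstract can be illustrated with a minimal sketch. It assumes that each fixed-length analysis window has already been scored against every trained sound/speaker model, so the input is a list of per-window log-likelihoods. The function name `segment_by_best_model`, the window length, and the example scores are hypothetical; the abstract does not specify how the authors score or merge windows, so this is one plausible reading, not their procedure.

```python
from itertools import groupby


def segment_by_best_model(window_scores, window_sec=1.0):
    """Label each window with its best-scoring model and merge equal runs.

    window_scores: list of dicts mapping a model name to the log-likelihood
    of that window under the model (hypothetical, precomputed inputs).
    Returns a list of (start_sec, end_sec, label) tuples.
    """
    # Pick the model with the highest log-likelihood for every window.
    labels = [max(scores, key=scores.get) for scores in window_scores]

    segments = []
    start = 0  # index of the first window in the current run
    for label, run in groupby(labels):
        length = len(list(run))
        segments.append((start * window_sec, (start + length) * window_sec, label))
        start += length
    return segments


# Illustrative scores for five 2-second windows and two speaker models.
scores = [
    {"speaker_A": -10.0, "speaker_B": -14.0},
    {"speaker_A": -11.0, "speaker_B": -15.5},
    {"speaker_A": -16.0, "speaker_B": -12.0},
    {"speaker_A": -17.0, "speaker_B": -11.0},
    {"speaker_A": -9.5, "speaker_B": -13.0},
]
segments = segment_by_best_model(scores, window_sec=2.0)
print(segments)
```

In this sketch the choice of features (ASP or MFCC) only affects how the per-window scores are produced; the labeling and merging logic stays the same.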

Bibliographic details

Source: IEEE International Conference on Acoustics, Speech, and Signal Processing, 2004, 4 pages
Authors: Hyoung-Gook Kim; Thomas Sikora
Original format: PDF
Chinese Library Classification: TN912-53

Similar documents

1. Fusing MFCC and LPC Features Using 1D Triplet CNN for Speaker Recognition in Severely Degraded Audio Signals [J]. Chowdhury Anurag, Ross Arun. IEEE Transactions on Information Forensics and Security, 2020.
2. Query by Example of Speaker Audio Signals using Power Spectrum and MFCCs [J]. Pafan Doungpaisan, Anirach Mingkhwan. International Journal of Electrical and Computer Engineering, 2017, no. 6.
3. SPEAKER RECOGNITION USING AUDIO SPECTRUM PROJECTION AND VECTOR QUANTIZATION [J]. Bikram Kar, Avishek Dey. International Journal of Simulation: Systems, Science and Technology, 2018, no. 4aaPagea1.
4. COMPARISON OF MPEG-7 AUDIO SPECTRUM PROJECTION FEATURES AND MFCC APPLIED TO SPEAKER RECOGNITION, SOUND CLASSIFICATION AND AUDIO SEGMENTATION [C]. Hyoung-Gook Kim, Thomas Sikora. IEEE International Conference on Acoustics, Speech, and Signal Processing, 2004.
5. Robust speech processing based on microphone array, audio-visual, and frame selection for in-vehicle speech recognition and in-set speaker recognition [D]. Zhang, Xianxian. 2005.
6. Development of Sound Field Audiometry System for Small Audiometric Booths and Comparison of Its Equivalence With Traditional System [O]. Eun Kyung Jung, Young Mi Choi, Eun Jung Kim. 2020.
7. AUDIO SPECTRUM PROJECTION BASED ON SEVERAL BASIS DECOMPOSITION ALGORITHMS APPLIED TO GENERAL SOUND RECOGNITION AND AUDIO SEGMENTATION [O]. Kim Hyoung-Gook, Sikora Thomas. 2004.
