Journal: Expert Systems with Applications

Classification of vocal and non-vocal segments in audio clips using genetic algorithm based feature selection (GAFS)

Abstract

Music information retrieval (MIR) is an emerging field that helps in tagging each portion of an audio clip. A majority of MIR subtasks need an application that segments vocal and non-vocal portions. In this paper, an effort has been made to segment the vocal and non-vocal regions using novel features based on formant structure, in addition to standard features. The features considered are Mel-frequency cepstral coefficients (MFCCs), linear prediction cepstral coefficients (LPCCs), frequency domain linear prediction (FDLP) values, statistical values of pitch, jitter, shimmer, formant attack slope (FAS), formant heights from base-to-peak (FH1) and peak-to-base (FH2), formant angle values at peak (FA1) and valley (FA2), and F5. Artificial neural networks (ANN), support vector machines (SVM), and random forests (RF) have been considered as classifiers for a comparative study, as they are powerful enough to discover complex non-linear patterns. Genetic algorithms, supported by neural networks, have been used to select the relevant features rather than considering all dimensions; this approach is named genetic algorithm based feature selection (GAFS). An accuracy of 89.23% before windowing and 95.16% after windowing is obtained with the optimal feature vector of length 32 using artificial neural networks. The system developed is capable of detecting singing voice segments with an accuracy of 98%. © 2018 Elsevier Ltd. All rights reserved.
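The abstract does not spell out how GAFS is implemented, so the following is only a minimal sketch of the general idea of genetic-algorithm feature selection, under stated assumptions. Each individual is a binary mask over the feature dimensions, and its fitness is the held-out accuracy of a classifier trained on the selected columns, minus a small penalty per selected feature. The paper evaluates candidates with a neural network; here a nearest-centroid classifier is swapped in to keep the example dependency-free, and the data are synthetic. All function names, parameters, and values (`make_data`, `centroid_accuracy`, `gafs`, `penalty`, and so on) are hypothetical and not taken from the paper.

```python
import random

def make_data(rng, n_samples=200, n_features=12, informative=(0, 3, 7)):
    """Synthetic two-class data; only the `informative` columns separate the classes."""
    X, y = [], []
    for i in range(n_samples):
        label = i % 2  # alternate labels so both halves of the data are balanced
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in informative:
            row[j] += 2.0 * label  # shift informative features for class 1
        X.append(row)
        y.append(label)
    return X, y

def centroid_accuracy(X_train, y_train, X_test, y_test, mask):
    """Nearest-centroid accuracy using only the features where mask[j] == 1.
    Assumption: stands in for the neural-network evaluator mentioned in the abstract."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty feature set cannot classify anything
    centroids = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X_train, y_train) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for x, t in zip(X_test, y_test):
        dist = {lab: sum((x[j] - c) ** 2 for j, c in zip(cols, cen))
                for lab, cen in centroids.items()}
        correct += (min(dist, key=dist.get) == t)
    return correct / len(y_test)

def gafs(X, y, rng, pop_size=20, generations=15, mutation_rate=0.05, penalty=0.01):
    """Evolve binary feature masks; returns (best_mask, best_fitness)."""
    n_features = len(X[0])
    split = len(X) // 2  # first half trains, second half scores each mask
    X_tr, y_tr, X_te, y_te = X[:split], y[:split], X[split:], y[split:]
    cache = {}

    def fitness(mask):
        key = tuple(mask)
        if key not in cache:
            acc = centroid_accuracy(X_tr, y_tr, X_te, y_te, mask)
            cache[key] = acc - penalty * sum(mask)  # prefer smaller feature sets
        return cache[key]

    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        next_gen = ranked[:2]  # elitism: carry the two best masks over unchanged
        while len(next_gen) < pop_size:
            # tournament selection of two parents (best of 3 random individuals)
            p1, p2 = (max(rng.sample(ranked, 3), key=fitness) for _ in range(2))
            cut = rng.randint(1, n_features - 1)  # single-point crossover
            child = p1[:cut] + p2[cut:]
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]  # bit-flip mutation
            next_gen.append(child)
        population = next_gen
    best = max(population, key=fitness)
    return best, fitness(best)

rng = random.Random(0)  # fixed seed so the run is repeatable
X, y = make_data(rng)
best_mask, best_fitness = gafs(X, y, rng)
selected = [j for j, bit in enumerate(best_mask) if bit]
print("selected feature indices:", selected, "fitness:", round(best_fitness, 3))
```

In a setting closer to the abstract, the columns of `X` would be the frame-level acoustic features (MFCCs, LPCCs, formant-based values, and so on) and the fitness function would train and score the chosen classifier. The per-feature penalty is one common way to push the search toward a compact vector such as the length-32 one reported above; whether the paper uses such a penalty is not stated.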
