Journal: IEEE Transactions on Audio, Speech, and Language Processing

Monaural Sound Source Separation by Nonnegative Matrix Factorization With Temporal Continuity and Sparseness Criteria


Abstract

An unsupervised learning algorithm for the separation of sound sources in one-channel music signals is presented. The algorithm is based on factorizing the magnitude spectrogram of an input signal into a sum of components, each of which has a fixed magnitude spectrum and a time-varying gain. Each sound source, in turn, is modeled as a sum of one or more components. The parameters of the components are estimated by minimizing the reconstruction error between the input spectrogram and the model, while restricting the component spectrograms to be nonnegative and favoring components whose gains are slowly varying and sparse. Temporal continuity is favored by a cost term that is the sum of squared differences between the gains in adjacent frames, and sparseness is favored by penalizing nonzero gains. The proposed iterative estimation algorithm is initialized with random values, and the gains and the spectra are then alternately updated using multiplicative update rules until the values converge. Simulation experiments were carried out using generated mixtures of pitched musical instrument samples and drum sounds. The performance of the proposed method was compared with independent subspace analysis and basic nonnegative matrix factorization, which are based on the same linear model. According to these simulations, the proposed method achieves better separation quality than the previous algorithms. In particular, the temporal continuity criterion improved the detection of pitched musical sounds. The sparseness criterion did not produce significant improvements.
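
The estimation procedure described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not give the exact reconstruction-error measure, penalty weights, or normalization, so the sketch assumes a squared Euclidean reconstruction error, a squared-difference penalty on adjacent-frame gains weighted by `alpha`, and a sum-of-gains sparseness penalty weighted by `beta`. The function name `nmf_tc_sparse`, its parameters, and the column normalization of the spectra are all hypothetical choices made for illustration.

```python
import numpy as np

def nmf_tc_sparse(X, n_components, alpha=0.1, beta=0.1,
                  n_iter=200, seed=0, eps=1e-12):
    """Factorize a nonnegative spectrogram X (freq x frames) as S @ G.

    Assumed cost (illustrative, not taken from the paper):
      0.5 * ||X - S G||_F^2
      + 0.5 * alpha * sum_k sum_t (G[k, t] - G[k, t-1])**2
      + beta * sum(G)
    Expects at least two frames.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = X.shape
    # Random nonnegative initialization, as the abstract describes.
    S = rng.random((n_freq, n_components)) + eps   # fixed spectra
    G = rng.random((n_components, n_frames)) + eps  # time-varying gains

    # Interior frames have two temporal neighbors, edge frames have one.
    n_neighbors = np.full(n_frames, 2.0)
    n_neighbors[0] = n_neighbors[-1] = 1.0

    for _ in range(n_iter):
        # Gains: multiply by (negative gradient part) / (positive part).
        G_prev = np.zeros_like(G)
        G_prev[:, 1:] = G[:, :-1]
        G_next = np.zeros_like(G)
        G_next[:, :-1] = G[:, 1:]
        numer = S.T @ X + alpha * (G_prev + G_next)
        denom = S.T @ (S @ G) + alpha * n_neighbors * G + beta
        G *= numer / (denom + eps)

        # Spectra: standard Euclidean NMF multiplicative update.
        S *= (X @ G.T) / (S @ (G @ G.T) + eps)

        # Assumed heuristic: fix the scale ambiguity by giving each
        # spectrum unit norm and moving the scale into the gains.
        norms = np.linalg.norm(S, axis=0) + eps
        S /= norms
        G *= norms[:, None]
    return S, G

# Synthetic example: three random spectra with slowly drifting gains.
rng = np.random.default_rng(1)
X = rng.random((20, 3)) @ np.abs(np.cumsum(rng.normal(size=(3, 50)), axis=1))
S, G = nmf_tc_sparse(X, n_components=3)
print("relative error:", np.linalg.norm(X - S @ G) / np.linalg.norm(X))
```

Because every factor in the updates is nonnegative, `S` and `G` stay nonnegative without any explicit projection. Setting `alpha=0` and `beta=0` reduces the sketch to basic Euclidean NMF, which gives a rough feel for the kind of baseline comparison the abstract mentions.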
