Journal of Guangdong University of Technology (广东工业大学学报)

An Incremental-Learning Method for Constructing a Speech Dictionary

     

Abstract

The explosive growth of speech data makes storage and transmission difficult, and existing methods struggle to process massive volumes of frequency-domain speech data in real time while keeping both a high compression ratio and good quality. This paper therefore proposes a speech dictionary construction method based on incremental learning. The method first applies the Short-Time Fourier Transform (STFT) to the time-domain speech signal to obtain the amplitude spectrum of each window, then projects these high-dimensional vectors onto a low-dimensional space (several 2D planes) and linearly fits the current window vector with a small number of basis vectors from the dictionary. A fitted window is stored as the identifiers of those basis vectors together with the fitting coefficients rather than as the spectrum itself. A window that cannot be fitted is processed and added to the dictionary, which is how the dictionary learns incrementally. Decompression reconstructs the requested windows by linearly combining the specified dictionary entries. Experimental results show that the method greatly compresses the speech spectral envelope and suits real-time, high-sampling-rate streaming speech under bandwidth constraints. Compared with similar algorithms, it substantially reduces storage space and transmission bandwidth while preserving reconstruction quality.
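The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not specify the projection step or how unfitted windows are processed, so this version simply picks the `k` dictionary entries most correlated with the current window and fits them by least squares. The names `stft_magnitude`, `encode`, and `decode`, and the parameters `win`, `hop`, `k`, and `tol`, are hypothetical choices made for illustration.

```python
import numpy as np

def stft_magnitude(signal, win=256, hop=128):
    """Amplitude spectrum of each Hann-windowed frame, one row per window."""
    window = np.hanning(win)
    starts = range(0, len(signal) - win + 1, hop)
    frames = np.stack([signal[s:s + win] * window for s in starts])
    return np.abs(np.fft.rfft(frames, axis=1))

def encode(frames, k=2, tol=0.1):
    """Incrementally build a dictionary and code each window against it.

    Assumption: atom selection by correlation plus least squares stands in
    for the paper's projection-based fitting, which the abstract does not detail.
    """
    dictionary = []  # unit-norm basis vectors learned so far
    codes = []       # per window: (basis vector indices, fitting coefficients)
    for x in frames:
        norm = np.linalg.norm(x)
        if norm == 0.0:
            # A silent window needs no basis vectors at all.
            codes.append((np.array([], dtype=int), np.array([])))
            continue
        if dictionary:
            D = np.stack(dictionary, axis=1)            # shape (dim, n_atoms)
            idx = np.argsort(-np.abs(D.T @ x))[:k]      # k best-matching atoms
            coef, *_ = np.linalg.lstsq(D[:, idx], x, rcond=None)
            rel_err = np.linalg.norm(D[:, idx] @ coef - x) / norm
            if rel_err <= tol:
                codes.append((idx, coef))               # store ids + coefficients
                continue
        # Fit failed (or dictionary empty): learn this window as a new entry.
        dictionary.append(x / norm)
        codes.append((np.array([len(dictionary) - 1]), np.array([norm])))
    dim = frames.shape[1]
    D = np.stack(dictionary, axis=1) if dictionary else np.zeros((dim, 0))
    return D, codes

def decode(D, codes):
    """Rebuild each window as a linear combination of its dictionary entries."""
    return np.stack([D[:, idx] @ coef for idx, coef in codes])

if __name__ == "__main__":
    t = np.arange(16000) / 16000.0
    speech_like = np.sin(2 * np.pi * 220 * t) + 0.5 * np.sin(2 * np.pi * 440 * t)
    spectra = stft_magnitude(speech_like)
    D, codes = encode(spectra)
    print("windows:", len(spectra), "dictionary entries:", D.shape[1])
```

In this sketch, the compression comes from storing at most `k` indices and coefficients per fitted window instead of a full spectrum, and `tol` trades dictionary size against reconstruction error.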
