
Cepstral Domain Modification of Audio Signals for Data Embedding -Preliminary Results


Abstract

A method of embedding data in an audio signal by cepstral-domain modification is described. Building on successful embedding at the spectral points of perceptually masked regions in each speech frame, the technique was first extended to embedding in the log-spectral domain. When all frames were used, including silence and transitions between voiced and unvoiced segments, this extension achieved approximately 62 bit/s of embedding with a bit error rate (BER) below 2 percent for a clean cover speech (from the TIMIT database) and about 2.5 percent for a noisy speech (from an air traffic controller database). The BER increased significantly when the log spectrum in the vicinity of a formant was modified. Next, embedding by altering the mean cepstral values of two ranges of cepstral indices was studied. Tests on both a noisy and a clean utterance indicated a barely noticeable perceptual change in speech quality when the lower range of cepstral indices, corresponding to the vocal tract region, was modified according to the data. With an embedding capacity of approximately 62 bit/s (one bit per frame, regardless of frame energy or speech type), initial results showed a BER below 1.5 percent for a payload of 208 embedded bits in the clean cover speech, and a BER below 1.3 percent for a payload of 316 bits in the noisy host. When the cepstrum was modified in the excitation region, the BER rose above 10 percent. Since quantization caused no significant problem, the technique warrants further study with different cepstral ranges and sizes. Pitch-synchronous cepstrum modification, for example, may be more robust to attacks. In addition, cepstrum modification in perceptually masked regions of speech, analogous to embedding in frequency-masked regions, may yield imperceptible stego audio with a low BER.
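The abstract does not specify the exact rule used to alter the mean cepstral values of the two index ranges. The following is a minimal sketch under stated assumptions, not the authors' implementation. For each non-overlapping frame, the real cepstrum is computed. One bit is embedded by shifting two hypothetical low-quefrency ranges (`A_IDX`, `B_IDX`) so that the mean of the first exceeds the mean of the second by a margin `DELTA` for a 1, or falls below it by `DELTA` for a 0. The frame is then resynthesized with its original phase. The frame length `FRAME = 256` is also an assumption: at TIMIT's 16 kHz sampling rate it yields 62.5 frames per second, which is consistent with the roughly 62 bit/s quoted above at one bit per frame.

```python
import numpy as np

# All parameters below are hypothetical choices for illustration, not values from the paper.
FRAME = 256                 # samples per frame; 16 kHz / 256 = 62.5 frames (bits) per second
A_IDX = np.arange(2, 10)    # first low-quefrency cepstral range (assumed "vocal tract" region)
B_IDX = np.arange(10, 18)   # second low-quefrency cepstral range
DELTA = 0.05                # required gap between the two range means
EPS = 1e-12                 # avoids log(0) for empty spectral bins


def real_cepstrum(frame):
    """Return the real cepstrum of a frame and the phase of its spectrum."""
    spec = np.fft.rfft(frame)
    cep = np.fft.irfft(np.log(np.abs(spec) + EPS), n=len(frame))
    return cep, np.angle(spec)


def embed_frame(frame, bit):
    """Embed one bit by pushing the two range means apart in the required direction."""
    n = len(frame)
    cep, phase = real_cepstrum(frame)
    gap = cep[A_IDX].mean() - cep[B_IDX].mean()
    target = DELTA if bit else -DELTA
    if (bit and gap >= DELTA) or (not bit and gap <= -DELTA):
        return frame.copy()  # the frame already encodes the bit with enough margin
    shift = (target - gap) / 2  # after shifting, the gap equals target exactly
    # Modify index k and its mirror n - k so the cepstrum stays even,
    # which keeps the resulting log-magnitude spectrum real.
    cep[A_IDX] += shift
    cep[n - A_IDX] += shift
    cep[B_IDX] -= shift
    cep[n - B_IDX] -= shift
    log_mag = np.fft.rfft(cep).real
    # Resynthesize with the modified magnitude and the original phase.
    return np.fft.irfft(np.exp(log_mag) * np.exp(1j * phase), n=n)


def extract_frame(frame):
    """Recover one bit by comparing the two range means."""
    cep, _ = real_cepstrum(frame)
    return int(cep[A_IDX].mean() > cep[B_IDX].mean())


def embed(signal, bits):
    """Embed one bit per non-overlapping frame; samples after the last used frame are untouched."""
    if len(bits) * FRAME > len(signal):
        raise ValueError("signal too short for this many bits at one bit per frame")
    out = np.asarray(signal, dtype=float).copy()
    for i, bit in enumerate(bits):
        seg = slice(i * FRAME, (i + 1) * FRAME)
        out[seg] = embed_frame(out[seg], bit)
    return out


def extract(signal, n_bits):
    return [extract_frame(signal[i * FRAME:(i + 1) * FRAME]) for i in range(n_bits)]


def bit_error_rate(sent, received):
    return float(np.mean(np.asarray(sent) != np.asarray(received)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    host = rng.standard_normal(16000)  # 1 s of white noise standing in for a 16 kHz speech host
    bits = rng.integers(0, 2, size=len(host) // FRAME).tolist()
    stego = embed(host, bits)
    print(len(bits), bit_error_rate(bits, extract(stego, len(bits))))
```

The demo applies no noise, quantization, or other attack, so it should report a BER of 0.0. It demonstrates only that the embed/extract pair is consistent. To approach the conditions described in the abstract, one would, for example, round `stego` to 16-bit integers or add noise before extraction. Using non-overlapping frames is a simplification so that each frame can be resynthesized independently. An overlap-add scheme would likely sound better but complicates exact extraction.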
