Conference on Multimedia Information Processing and Retrieval

Securing Voice-Driven Interfaces Against Fake (Cloned) Audio Attacks

Abstract

Voice cloning technologies have found applications in a variety of areas, ranging from personalized speech interfaces to advertising and robotics. Existing voice cloning systems can learn speaker characteristics and use trained models to synthesize a person's voice from only a few audio samples. Recent advances in cloned speech generation can produce speech that is perceptually indistinguishable from bona-fide speech. These advances pose new security and privacy threats to voice-driven interfaces and speech-based access control systems. State-of-the-art speech synthesis technologies use trained or tuned generative models to generate cloned speech. Trained generative models rely on linear operations, learned weights, and an excitation source to synthesize cloned speech, and these systems leave characteristic artifacts in the synthesized output. Higher-order spectral analysis is used to capture the attributes that differentiate bona-fide from cloned audio. Specifically, quadrature phase coupling (QPC) in the estimated bicoherence, Gaussianity test statistics, and linearity test statistics are used to capture generative-model artifacts. The performance of the proposed method is evaluated on cloned audio generated using speaker adaptation-based and speaker encoding-based approaches. Experimental results on a dataset of 126 cloned and 8 bona-fide speech samples indicate that the proposed method detects bona-fide and cloned audio with a near-perfect detection rate.
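The abstract does not spell out its estimator. As a hedged illustration of what phase coupling "in the estimated bicoherence" means, the following is a minimal pure-Python sketch of a segment-averaged, DFT-based bicoherence estimate at a single bifrequency. It is not the authors' implementation: the normalization (one common choice that bounds the value in [0, 1]), the non-overlapping rectangular (unwindowed) segments, and the helper names `dft_bin`, `bicoherence`, and `three_tone_signal` are assumptions made for illustration, and the Gaussianity and linearity test statistics are not shown.

```python
import cmath
import math
import random


def dft_bin(segment, k):
    """Return the k-th DFT coefficient X[k] = sum_n x[n] * exp(-2j*pi*k*n/N)."""
    n_len = len(segment)
    return sum(x * cmath.exp(-2j * math.pi * k * n / n_len)
               for n, x in enumerate(segment))


def bicoherence(signal, seg_len, k1, k2):
    """Squared bicoherence at the bifrequency (k1, k2), in DFT-bin units.

    Assumption: the signal is split into non-overlapping segments of seg_len
    samples, each mean-removed, with no window applied. The estimate is
    |sum X1*X2*conj(X3)|^2 / (sum |X1*X2|^2 * sum |X3|^2), where X3 = X[k1+k2];
    by the Cauchy-Schwarz inequality it lies in [0, 1].
    """
    if k1 + k2 >= seg_len // 2:
        raise ValueError("k1 + k2 must stay below the Nyquist bin")
    cross = 0j          # accumulated bispectrum term X1 * X2 * conj(X3)
    pair_power = 0.0    # accumulated |X1 * X2|^2
    sum_power = 0.0     # accumulated |X3|^2
    for start in range(0, len(signal) - seg_len + 1, seg_len):
        seg = signal[start:start + seg_len]
        mean = sum(seg) / seg_len
        seg = [x - mean for x in seg]
        x1, x2, x3 = (dft_bin(seg, k) for k in (k1, k2, k1 + k2))
        cross += x1 * x2 * x3.conjugate()
        pair_power += abs(x1 * x2) ** 2
        sum_power += abs(x3) ** 2
    denom = pair_power * sum_power
    return abs(cross) ** 2 / denom if denom > 0 else 0.0


def three_tone_signal(k1, k2, seg_len, n_segs, coupled, seed=0, noise=0.1):
    """Hypothetical test signal: tones at bins k1, k2 and k1 + k2 with fresh
    random phases in every segment, plus Gaussian noise. If coupled, the third
    phase is phi1 + phi2 (phase-coupled); otherwise it is drawn independently."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_segs):
        p1 = rng.uniform(0, 2 * math.pi)
        p2 = rng.uniform(0, 2 * math.pi)
        p3 = p1 + p2 if coupled else rng.uniform(0, 2 * math.pi)
        for n in range(seg_len):
            w = 2 * math.pi * n / seg_len
            out.append(math.cos(k1 * w + p1) + math.cos(k2 * w + p2)
                       + math.cos((k1 + k2) * w + p3) + rng.gauss(0, noise))
    return out


if __name__ == "__main__":
    N, K = 64, 64  # segment length and number of segments (illustrative values)
    coupled = three_tone_signal(5, 9, N, K, coupled=True)
    uncoupled = three_tone_signal(5, 9, N, K, coupled=False)
    print(bicoherence(coupled, N, 5, 9))    # near 1: phases are coupled
    print(bicoherence(uncoupled, N, 5, 9))  # near 0: phases are independent
```

The demo only shows why bicoherence separates phase-coupled from independent spectral components: when the phase at bin k1 + k2 equals the sum of the phases at k1 and k2, the per-segment products add coherently, whereas independent phases average toward zero (roughly 1/K for K segments). How the paper turns such estimates into detection features for cloned speech is not described in this abstract.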