METHOD FOR DETECTING AN AUDIO ADVERSARIAL ATTACK WITH RESPECT TO A VOICE INPUT PROCESSED BY AN AUTOMATIC SPEECH RECOGNITION SYSTEM, CORRESPONDING DEVICE, COMPUTER PROGRAM PRODUCT AND COMPUTER-READABLE CARRIER MEDIUM

Abstract

The disclosure relates to a method and device for detecting an audio adversarial attack with respect to a voice input (VI) processed by an automatic speech recognition system (ASR). The method includes: obtaining (11) an input audio signal (IAS) associated with the voice input; obtaining (12) a transcript (T) resulting from the processing, by the automatic speech recognition system, of the input audio signal; converting (13) the transcript (T) into a synthesized audio signal (SAS); extracting (15, 15'), at a sampling time interval, at least one acoustic feature of a same type, respectively from the input audio signal and from the synthesized audio signal, delivering a first sequence of features vectors (sFV1) associated with the input audio signal and a second sequence of features vectors (sFV2) associated with the synthesized audio signal; converting (16, 16') the acoustic features of the first sequence of features vectors and the acoustic features of the second sequence of features vectors to corresponding acoustic features associated with a target reference voice (RV), respectively delivering a first sequence of converted features vectors (sCFV1) and a second sequence of converted features vectors (sCFV2); computing (17) a dynamic time warping distance (D) between the first sequence of converted features vectors and the second sequence of converted features vectors; and delivering (18) a piece of data representative of a detection of an audio adversarial attack, as a function of a result of a comparison between the dynamic time warping distance and a predetermined threshold.
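
The last two steps of the abstract, computing (17) the dynamic time warping distance D and comparing it (18) with a predetermined threshold, can be illustrated with a minimal sketch. The abstract does not specify the local cost or the direction of the comparison, so the Euclidean cost, the "distance above threshold means attack" rule, and the names `dtw_distance` and `is_adversarial` are assumptions made here for illustration only. Feature extraction (15, 15'), speech synthesis (13), and conversion to the reference voice (16, 16') are not shown; the inputs stand in for sCFV1 and sCFV2.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def euclidean(u: Vector, v: Vector) -> float:
    """Local cost between two feature vectors (assumed Euclidean)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def dtw_distance(seq_a: Sequence[Vector], seq_b: Sequence[Vector]) -> float:
    """Classic dynamic time warping distance between two vector sequences."""
    n, m = len(seq_a), len(seq_b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")
    # cost[i][j] is the cheapest alignment of seq_a[:i] with seq_b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = euclidean(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance in seq_a only
                cost[i][j - 1],      # advance in seq_b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


def is_adversarial(distance: float, threshold: float) -> bool:
    # Assumption: a large distance means the input audio does not sound
    # like its own transcript, which is treated here as an attack.
    return distance > threshold


# Toy one-dimensional stand-ins for sCFV1 and sCFV2.
s_cfv1 = [[0.0], [1.0], [2.0]]
s_cfv2 = [[0.0], [1.0], [1.0], [2.0]]
d = dtw_distance(s_cfv1, s_cfv2)
print(d, is_adversarial(d, threshold=5.0))
```

Because dynamic time warping aligns sequences of different lengths, the input and synthesized signals do not need the same duration or speaking rate; in the toy case the repeated frame in the second sequence adds no cost.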

Bibliographic data

Publication number: EP3989217A1

Patent type:
Publication date: 2022-04-27

Original format: PDF
Applicant/Patentee: THOMSON LICENSING
Application number: EP20200203448
Inventors: NADEAU PASCAL; GILBERTON PHILIPPE; GAUTIER ERIC; DELAUNAY CHRISTOPHE
Filing date: 2020-10-22
Classification: G10L15/08; G10L15/20; G10L13; G10L25/24; G10L25/51
Country: EP
Date indexed: 2022-08-25 00:42:50
