Journal: Computer Speech and Language

Reversible speaker de-identification using pre-trained transformation functions


Abstract

Speaker de-identification approaches must accomplish three main goals: universality, naturalness and reversibility. The main drawback of the traditional approach, which relies on voice conversion techniques, is its lack of universality: a parallel corpus between the input and target speakers is needed to train the conversion parameters. A synthetic target can be used to overcome this issue, but doing so harms the naturalness of the resulting de-identified speech. This paper therefore proposes a technique that uses a pool of pre-trained transformations between a set of speakers as follows: given a new user to de-identify, the speaker in the set most similar to that user is chosen as the source speaker, and the speaker most dissimilar to the source speaker is chosen as the target speaker. Speaker similarity is measured using the i-vector paradigm, which is usually employed as an objective measure of speaker de-identification performance, leading to a system with high de-identification accuracy. The transformation method is based on frequency warping and amplitude scaling, in order to obtain natural-sounding speech while masking the identity of the speaker. In addition, compared to other voice conversion approaches, the proposed method is easily reversible. Experiments were conducted on the Albayzin database, and performance was evaluated in terms of objective and subjective measures. The results showed high success in de-identifying speech, as well as highly natural transformed voices. Moreover, when the transformation parameters are made available to a trusted holder, the de-identification procedure can be inverted, recovering the original speaker identity. The computational cost of the proposed approach is small, making it possible to produce de-identified speech in real time with a high level of naturalness.
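The speaker-selection step described in the abstract can be illustrated with a minimal sketch. It assumes each speaker is represented by a fixed-length vector standing in for an i-vector and that similarity is scored with cosine similarity; the abstract only states that the i-vector paradigm is used, so the scoring function, the function names, and the toy vectors below are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (assumed scoring)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def select_source_and_target(new_user_vec, pool):
    """Pick the source and target speakers from a pool of known speakers.

    pool maps a speaker id to its vector (a stand-in for an i-vector).
    Source: the pool speaker most similar to the new user.
    Target: the pool speaker most dissimilar to the chosen source.
    """
    source = max(pool, key=lambda s: cosine_similarity(new_user_vec, pool[s]))
    candidates = [s for s in pool if s != source]
    target = min(candidates,
                 key=lambda s: cosine_similarity(pool[source], pool[s]))
    return source, target


# Toy 3-dimensional vectors; real i-vectors have hundreds of dimensions.
pool = {
    "spk_a": [1.0, 0.0, 0.0],
    "spk_b": [0.9, 0.1, 0.0],
    "spk_c": [-1.0, 0.2, 0.0],
}
new_user = [1.0, 0.02, 0.0]

source, target = select_source_and_target(new_user, pool)
print(source, target)
```

Under these assumptions, the pre-trained transformation from the selected source to the selected target would then be applied to the new user's speech, so no parallel corpus involving the new user is required.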
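To see why a transformation built from frequency warping and amplitude scaling can be inverted by a holder of its parameters, consider the following sketch. It is not the paper's implementation: it assumes a piecewise-linear, strictly increasing warping function on normalized frequency and an additive gain in the log-amplitude domain, and it operates on a handful of (frequency, log-amplitude) points rather than on real speech. All parameter values and function names are hypothetical.

```python
from bisect import bisect_right


def interp(x, xs, ys):
    """Piecewise-linear interpolation; xs must be strictly increasing."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x) - 1
    t = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + t * (ys[i + 1] - ys[i])


# Hypothetical transformation parameters (the secret kept by a trusted holder).
WARP_IN = [0.0, 0.3, 0.7, 1.0]   # normalized input frequencies
WARP_OUT = [0.0, 0.4, 0.6, 1.0]  # where those frequencies are mapped
GAIN_FREQ = [0.0, 0.5, 1.0]      # frequencies at which the gain is defined
GAIN_DB = [2.0, -3.0, 1.0]       # additive log-amplitude offsets


def deidentify(points):
    """Warp each frequency, then add a frequency-dependent gain."""
    out = []
    for freq, amp in points:
        warped = interp(freq, WARP_IN, WARP_OUT)
        out.append((warped, amp + interp(warped, GAIN_FREQ, GAIN_DB)))
    return out


def reidentify(points):
    """Undo the gain, then apply the inverse warp (swap the breakpoints)."""
    out = []
    for warped, amp in points:
        original_amp = amp - interp(warped, GAIN_FREQ, GAIN_DB)
        out.append((interp(warped, WARP_OUT, WARP_IN), original_amp))
    return out


# Toy spectral envelope: (normalized frequency, log-amplitude) pairs.
envelope = [(0.1, -10.0), (0.3, -4.0), (0.5, -12.0), (0.9, -30.0)]
masked = deidentify(envelope)
restored = reidentify(masked)
print(masked)
print(restored)
```

Because the warping function is strictly increasing, swapping its input and output breakpoints yields its inverse, and the additive gain is undone by subtraction. On a sampled spectrum, resampling would make the recovery approximate rather than exact.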
