Conference: International Conference on Social Robotics

RoboASR: A Dynamic Speech Recognition System for Service Robots

Abstract

This paper proposes a new method for building dynamic speech decoding graphs for state-based spoken human-robot interaction (HRI). Current robotic speech recognition systems are based on a finite state grammar (FSG), a statistical N-gram model, or a dual FSG and N-gram setup that uses multi-pass decoding. The proposed method merges the FSG and the N-gram model into a single decoding graph by converting the FSG rules into a weighted finite state acceptor (WFSA) and then composing it with a large N-gram-based weighted finite state transducer (WFST). The result is a tiny decoding graph that can be used in single-pass decoding. The method is applied in our speech recognition system (RoboASR) for controlling service robots with limited resources. The proposed approach has three advantages. First, it takes advantage of both FSG and N-gram decoders by composing them into a single tiny decoding graph. Second, it is robust: the resulting tiny decoding graph is highly accurate because it fits the HRI state. Third, it has a fast response time compared with current state-of-the-art speech recognition systems. The proposed system has a large vocabulary of 64K words with more than 69K entries. Experimental results show that the average response time is 0.05% of the utterance length and the average ratio between true and false positives is 89% when tested on 15 interaction scenarios using live speech.
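
The composition step described in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: it assumes epsilon-free machines, tropical-semiring weights (negative log probabilities that add along a path), and a toy single-state unigram model standing in for the large N-gram WFST. The names `make_fsa`, `compose`, `path_cost`, and `n_arcs`, as well as the vocabulary and probabilities, are hypothetical and chosen only for illustration.

```python
import math
from collections import deque

def make_fsa(start, finals, arcs):
    """Build a machine from arcs given as (src, ilabel, olabel, weight, dst)."""
    table = {}
    for src, il, ol, w, dst in arcs:
        table.setdefault(src, []).append((il, ol, w, dst))
    return {"start": start, "finals": dict(finals), "arcs": table}

def compose(a, t):
    """Epsilon-free composition: match a's output labels to t's input labels."""
    start = (a["start"], t["start"])
    arcs, finals = [], {}
    queue, seen = deque([start]), {start}
    while queue:
        state = queue.popleft()
        qa, qt = state
        if qa in a["finals"] and qt in t["finals"]:
            finals[state] = a["finals"][qa] + t["finals"][qt]
        for il, mid, w1, na in a["arcs"].get(qa, []):
            for t_in, ol, w2, nt in t["arcs"].get(qt, []):
                if mid != t_in:
                    continue
                nxt = (na, nt)
                arcs.append((state, il, ol, w1 + w2, nxt))  # weights add
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return make_fsa(start, finals, arcs)

def path_cost(g, words):
    """Cheapest cost of a path accepting `words`, or None if rejected."""
    frontier = {g["start"]: 0.0}
    for word in words:
        nxt = {}
        for state, cost in frontier.items():
            for il, _ol, w, dst in g["arcs"].get(state, []):
                if il == word and cost + w < nxt.get(dst, math.inf):
                    nxt[dst] = cost + w
        frontier = nxt
    costs = [c + g["finals"][s] for s, c in frontier.items() if s in g["finals"]]
    return min(costs) if costs else None

def n_arcs(g):
    return sum(len(v) for v in g["arcs"].values())

# Toy unigram "language model" (assumed probabilities): one looping state.
unigram = {"go": 0.2, "to": 0.2, "the": 0.3, "kitchen": 0.1, "stop": 0.1, "dance": 0.1}
lm = make_fsa(0, {0: 0.0}, [(0, w, w, -math.log(p), 0) for w, p in unigram.items()])

# Toy grammar for one HRI state: "go to the kitchen" | "stop", unweighted.
grammar = make_fsa(0, {4: 0.0}, [
    (0, "go", "go", 0.0, 1), (1, "to", "to", 0.0, 2),
    (2, "the", "the", 0.0, 3), (3, "kitchen", "kitchen", 0.0, 4),
    (0, "stop", "stop", 0.0, 4),
])

graph = compose(grammar, lm)
```

In this toy setup the composed `graph` accepts only the sentences the grammar allows, while each surviving path carries the language-model cost. `path_cost(graph, ["dance"])` returns `None` even though the unigram model alone accepts that word, which mirrors the idea of a small, state-specific decoding graph usable in a single pass.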
