RoboASR: A Dynamic Speech Recognition System for Service Robots

Abstract

This paper proposes a new method for building dynamic speech decoding graphs for state-based spoken human-robot interaction (HRI). Current robotic speech recognition systems are based on finite state grammars (FSG), on statistical N-gram models, or on a dual FSG and N-gram setup that uses multi-pass decoding. The proposed method merges the FSG and the N-gram model into a single decoding graph by converting the FSG rules into a weighted finite state acceptor (WFSA) and then composing it with a large N-gram based weighted finite state transducer (WFST). This results in a tiny decoding graph that can be used in single-pass decoding. The proposed method is applied in our speech recognition system (RoboASR) for controlling service robots with limited resources. The proposed approach has three advantages. First, it takes advantage of both FSG and N-gram decoders by composing them into a single tiny decoding graph. Second, it is robust: the resulting tiny decoding graph is highly accurate because it is fitted to the HRI state. Third, it has a fast response time in comparison to current state-of-the-art speech recognition systems. The proposed system has a large vocabulary containing 64K words with more than 69K entries. Experimental results show that the average response time is 0.05% of the utterance length and the average ratio between the true and false positives is 89% when tested on 15 interaction scenarios using live speech.
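
The idea of restricting a large N-gram model to the sentences allowed by a state-specific grammar can be illustrated with a toy sketch. This is not the RoboASR implementation: it assumes a word-level bigram model with made-up costs, represents both models as plain weighted acceptors, and uses acceptor intersection (composition restricted to acceptors) in place of full WFST composition. All function names, the cost table, and the rule that every non-start bigram state is final are hypothetical simplifications.

```python
from collections import deque

# An acceptor is a dict: {"start": state, "finals": set, "arcs": {state: [(word, cost, next_state)]}}

def bigram_acceptor(bigram_costs):
    """Weighted acceptor for a bigram model; each state is the previous word."""
    arcs = {}
    for (prev, word), cost in bigram_costs.items():
        arcs.setdefault(prev, []).append((word, cost, word))
    words = {word for (_, word) in bigram_costs}
    # Simplification: a sentence may end after any word.
    return {"start": "<s>", "finals": words, "arcs": arcs}

def grammar_acceptor(sentences):
    """Zero-cost, trie-shaped acceptor for the sentences allowed in one HRI state."""
    arcs, finals, next_id = {0: []}, set(), 1
    for sentence in sentences:
        state = 0
        for word in sentence.split():
            for label, _, dst in arcs[state]:
                if label == word:          # reuse an existing prefix arc
                    state = dst
                    break
            else:                          # no arc for this word yet: add one
                arcs[state].append((word, 0.0, next_id))
                arcs[next_id] = []
                state, next_id = next_id, next_id + 1
        finals.add(state)
    return {"start": 0, "finals": finals, "arcs": arcs}

def intersect(a, b):
    """Product construction: keep only paths accepted by both a and b, adding costs."""
    start = (a["start"], b["start"])
    arcs, finals = {}, set()
    queue, seen = deque([start]), {start}
    while queue:
        state = queue.popleft()
        sa, sb = state
        arcs[state] = []
        if sa in a["finals"] and sb in b["finals"]:
            finals.add(state)
        for word_a, cost_a, next_a in a["arcs"].get(sa, []):
            for word_b, cost_b, next_b in b["arcs"].get(sb, []):
                if word_a == word_b:
                    nxt = (next_a, next_b)
                    arcs[state].append((word_a, cost_a + cost_b, nxt))
                    if nxt not in seen:
                        seen.add(nxt)
                        queue.append(nxt)
    return {"start": start, "finals": finals, "arcs": arcs}

def path_cost(graph, sentence):
    """Total cost of a sentence in a deterministic acceptor, or None if rejected."""
    state, total = graph["start"], 0.0
    for word in sentence.split():
        for label, cost, nxt in graph["arcs"].get(state, []):
            if label == word:
                state, total = nxt, total + cost
                break
        else:
            return None
    return total if state in graph["finals"] else None

# Hypothetical bigram costs (negative log probabilities) for a tiny vocabulary.
costs = {("<s>", "go"): 1.0, ("<s>", "stop"): 2.0, ("go", "left"): 0.5,
         ("go", "right"): 0.7, ("go", "home"): 1.5, ("stop", "now"): 0.3}
lm = bigram_acceptor(costs)
state_grammar = grammar_acceptor(["go left", "go right"])
graph = intersect(state_grammar, lm)
```

In this sketch the intersected graph keeps the N-gram costs but contains only the paths the current grammar permits, so `path_cost(graph, "go left")` is defined while `path_cost(graph, "go home")` is `None`. Rebuilding `graph` whenever the interaction state changes is one way to read the "dynamic" decoding graph described in the abstract.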

Bibliographic details

Source
International Conference on Social Robotics | 2012 | 11 pages
Authors
Abdelaziz A. Abdelhamid; Waleed H. Abdulla; Bruce A. MacDonald
Original format
PDF
Chinese Library Classification
TP18-532
Keywords
Human-robot interaction; automatic speech recognition; weighted finite state transducers

Similar literature

1. Feature vector classification based speech emotion recognition for service robots [J]. Jeong-Sik Park, Ji-Hwan Kim, Yung-Hwan Oh. IEEE Transactions on Consumer Electronics. 2009, No. 3
2. Using Linguistic Anticipation to Improve the Quality of Speech Recognition in Robotic Systems [J]. S. A. Bobkov, D. S. Kurushin, A. M. Perevalov. Russian Electrical Engineering. 2020, No. 11
3. Smartphone-Based Online and Offline Speech Recognition System for ROS-Based Robots [J]. Zaman S., Slany W. Engineering Economics. 2014, No. 4
4. RoboASR: A Dynamic Speech Recognition System for Service Robots [C]. Abdelaziz A. Abdelhamid, Waleed H. Abdulla, Bruce A. MacDonald. International Conference on Social Robotics. 2012
5. A multimodal fusion approach for automatic postal address recognition system using Optical Character Recognition (OCR) and Automatic Speech Recognition (ASR) techniques [D]. Singh, Amriteshwar. 2011
6. From Birdsong to Human Speech Recognition: Bayesian Inference on a Hierarchy of Nonlinear Dynamical Systems [O]. Izzet B. Yildiz, Katharina von Kriegstein, Stefan J. Kiebel. 2003
7. A Robust Speech Recognition System for Service-Robotics Applications [O]. Masrur Doostdar, Stefan Schiffer, Gerhard Lakemeyer. 2014
