IEEE Robotics and Automation Letters

COSM2IC: Optimizing Real-Time Multi-Modal Instruction Comprehension

Abstract

Supporting real-time, on-device execution of multi-modal referring instruction comprehension models is an important challenge to be tackled in embodied Human-Robot Interaction. However, state-of-the-art deep learning models are resource-intensive and unsuitable for real-time execution on embedded devices. While model compression can achieve a reduction in computational resources up to a certain point, further optimizations result in a severe drop in accuracy. To minimize this loss in accuracy, we propose the COSM2IC framework, with a lightweight Task Complexity Predictor, that uses multiple sensor inputs to assess the instructional complexity and thereby dynamically switch between a set of models of varying computational intensity such that computationally less demanding models are invoked whenever possible. To demonstrate the benefits of COSM2IC, we utilize a representative human-robot collaborative “table-top target acquisition” task, to curate a new multi-modal instruction dataset where a human issues instructions in a natural manner using a combination of visual, verbal, and gestural (pointing) cues. We show that COSM2IC achieves a 3-fold reduction in comprehension latency when compared to a baseline DNN model while suffering an accuracy loss of only ~5%. When compared to state-of-the-art model compression methods, COSM2IC is able to achieve a further 30% reduction in latency and energy consumption for a comparable performance.
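
The switching idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the complexity heuristic, the feature names (`num_objects`, `utterance`, `has_pointing`), the thresholds, and the three model tiers are all hypothetical, chosen only to show how a predicted complexity score might route an instruction to the cheapest model assumed adequate for it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ComprehensionModel:
    name: str
    relative_cost: float    # hypothetical cost unit (e.g. latency relative to the lightest model)
    max_complexity: float   # highest complexity score this tier is assumed to handle
    run: Callable[[Dict[str, object]], str]


def predict_complexity(inputs: Dict[str, object]) -> float:
    """Toy stand-in for a task complexity predictor.

    Assumption: more objects and longer utterances make an instruction
    harder, while a pointing gesture makes it easier. The weights are
    illustrative, not taken from the paper.
    """
    n_objects = int(inputs.get("num_objects", 0))
    n_words = len(str(inputs.get("utterance", "")).split())
    has_pointing = bool(inputs.get("has_pointing", False))
    score = 0.05 * n_objects + 0.03 * n_words - (0.2 if has_pointing else 0.0)
    return min(1.0, max(0.0, score))  # clamp to [0, 1]


def select_model(models: List[ComprehensionModel], complexity: float) -> ComprehensionModel:
    """Return the cheapest model whose tier covers the predicted complexity."""
    for model in sorted(models, key=lambda m: m.relative_cost):
        if complexity <= model.max_complexity:
            return model
    # No tier covers the score: fall back to the most capable (most expensive) model.
    return max(models, key=lambda m: m.relative_cost)


# Hypothetical model tiers; the run functions are placeholders for real inference.
MODELS = [
    ComprehensionModel("light", 1.0, 0.3, lambda inputs: "light-result"),
    ComprehensionModel("medium", 3.0, 0.6, lambda inputs: "medium-result"),
    ComprehensionModel("heavy", 10.0, 1.0, lambda inputs: "heavy-result"),
]

easy = {"num_objects": 2, "utterance": "that one", "has_pointing": True}
hard = {
    "num_objects": 12,
    "utterance": "the small red cup behind the tall blue bottle on the left",
    "has_pointing": False,
}

easy_model = select_model(MODELS, predict_complexity(easy))
hard_model = select_model(MODELS, predict_complexity(hard))
```

In this sketch the pointed-at, two-object instruction is routed to the lightest tier and the long verbal-only instruction to the heaviest one. A real predictor would presumably be a learned model over the sensor inputs rather than a hand-written rule.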
