IEEE India Council International Conference

Visual World to an Audible Experience: Visual Assistance for the Blind And Visually Impaired



Abstract

This paper aims to assist visually impaired people through Deep Learning (DL) by providing a system that can both describe the user's surroundings and answer questions about them. The system consists mainly of two models: an Image Captioning (IC) model and a Visual Question Answering (VQA) model. The IC model is a Convolutional Neural Network and Recurrent Neural Network based architecture that incorporates a form of attention while captioning. For the VQA task, this paper proposes two models, one based on a Multi-Layer Perceptron and one based on Long Short-Term Memory (LSTM), that answer questions related to the input image. The IC model achieved an average BLEU-1 score of 0.46, and the LSTM-based VQA model achieved an overall accuracy of 47 percent. These two models are integrated with Speech-to-Text and Text-to-Speech components to form a single system that works in real time.
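The BLEU-1 metric used to evaluate the IC model is clipped unigram precision multiplied by a brevity penalty. The sketch below is a simplified single-reference illustration of that computation, not the paper's actual evaluation code (which likely averages over multiple references per image).

```python
from collections import Counter
import math

def bleu1(candidate, reference):
    """Simplified single-reference BLEU-1: clipped unigram precision
    times a brevity penalty for candidates shorter than the reference."""
    cand, ref = candidate.split(), reference.split()
    ref_counts = Counter(ref)
    # Clip each candidate word's count by its count in the reference.
    clipped = sum(min(n, ref_counts[w]) for w, n in Counter(cand).items())
    precision = clipped / len(cand)
    # Brevity penalty discourages overly short captions.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * precision

score = bleu1("a man riding a bike",
              "a man is riding a bicycle on the road")
```

Here "a", "man", and "riding" match the reference ("bike" does not), giving a precision of 4/5 that is then scaled down by the brevity penalty.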
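A common shape for the LSTM-based VQA model described above is to encode the question word-by-word with an LSTM, fuse the final hidden state with a CNN image feature, and classify over a fixed answer vocabulary. The NumPy sketch below illustrates one plausible forward pass under assumed dimensions and elementwise-product fusion; the paper does not specify these details, so all names and sizes here are hypothetical.

```python
import numpy as np

# Hypothetical dimensions -- not specified in the paper.
IMG_DIM, EMB_DIM, HID_DIM, N_ANSWERS = 4096, 300, 512, 1000

rng = np.random.default_rng(0)

def lstm_step(x, h, c, W, U, b):
    """One LSTM step; gate pre-activations are stacked as
    [input, forget, cell, output]."""
    z = W @ x + U @ h + b
    i, f, g, o = np.split(z, 4)
    i, f, o = 1 / (1 + np.exp(-i)), 1 / (1 + np.exp(-f)), 1 / (1 + np.exp(-o))
    c = f * c + i * np.tanh(g)
    h = o * np.tanh(c)
    return h, c

def vqa_forward(img_feat, question_embs, params):
    """Encode the question with an LSTM, fuse with the projected image
    feature by elementwise product, then softmax over a fixed answer set."""
    h, c = np.zeros(HID_DIM), np.zeros(HID_DIM)
    for x in question_embs:                    # one word embedding per step
        h, c = lstm_step(x, h, c, *params["lstm"])
    img = np.tanh(params["Wi"] @ img_feat)     # project image into HID_DIM
    fused = h * img                            # elementwise fusion
    logits = params["Wo"] @ fused
    e = np.exp(logits - logits.max())
    return e / e.sum()                         # answer probabilities

params = {
    "lstm": (rng.normal(0, 0.1, (4 * HID_DIM, EMB_DIM)),
             rng.normal(0, 0.1, (4 * HID_DIM, HID_DIM)),
             np.zeros(4 * HID_DIM)),
    "Wi": rng.normal(0, 0.1, (HID_DIM, IMG_DIM)),
    "Wo": rng.normal(0, 0.1, (N_ANSWERS, HID_DIM)),
}

img_feat = rng.normal(size=IMG_DIM)            # stand-in for a CNN feature
question = [rng.normal(size=EMB_DIM) for _ in range(5)]  # 5 "word" vectors
probs = vqa_forward(img_feat, question, params)
```

In a deployed pipeline like the one the paper describes, `img_feat` would come from the CNN, the question embeddings from the Speech-to-Text output, and `np.argmax(probs)` would index the answer spoken by the Text-to-Speech component.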


