Multimodal Speech Recognition: Lip-reading Automatons

Marion De Boo

Abstract

Just imagine that you are standing in the concourse of Rotterdam Central Station: you speak into a machine to ask the time of the next train to Amsterdam, and an electronic voice instantly tells you the answer, including the platform number. The TU Delft Mediamatics department has been collaborating for some years with OVR (Openbaar Vervoer Reisinformatie), a company that provides public transport information, to create systems for automatic speech recognition. So far the results have been nothing to write home about, certainly not when the information is requested from noisy places such as station platforms. If the passenger's voice is drowned out by ambient noise, with its mix of announcements about delayed trains, the computer gets confused.

It is an established fact that other people are much easier to understand if you can see as well as hear them talk. It is not just deaf people who lip-read: people with normal hearing also start watching the speaker's mouth as the level of ambient noise increases. This has led to the idea of supporting automatic speech recognition systems with software for automatic lip-reading. Such a system could also be useful for hands-free phone calls in cars. A small camera could be pointed at the speaker's mouth, and a processor could analyse the video images in real time. Polish IT engineer Jacek Wojdel has developed a working prototype.

Automatic speech recognition has been the focus of worldwide interest for more than two decades, and international companies have large research departments working on it. At Philips in Aachen, Germany, alone, some 150 researchers are active in the field. IBM has developed the ViaVoice speech recognition system, and the Belgian company Lernout & Hauspie, which recently went bankrupt, was also a major player.
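The article does not describe how Wojdel's prototype combines sound and image. One common way to express the idea of leaning more on the lips as noise increases is stream-weighted late fusion: an audio recognizer and a lip-reading recognizer each score the candidate words, and the two log-likelihoods are mixed with a weight that depends on the estimated signal-to-noise ratio (SNR). The Python sketch below is a hypothetical illustration of that generic technique, not the prototype's method; the logistic SNR-to-weight mapping, its `midpoint_db` and `slope` parameters, the function names, and the word scores are all made-up assumptions.

```python
import math

# Hypothetical per-word log-likelihoods (made-up numbers, for illustration only).
AUDIO_LOGP = {"Amsterdam": -2.0, "Rotterdam": -1.5}   # noisy audio slightly prefers "Rotterdam"
VISUAL_LOGP = {"Amsterdam": -1.0, "Rotterdam": -4.0}  # lip shapes clearly favour "Amsterdam"


def audio_weight(snr_db: float, midpoint_db: float = 10.0, slope: float = 0.3) -> float:
    """Map an estimated SNR in dB to an audio stream weight in (0, 1).

    Assumed logistic mapping: high SNR gives a weight near 1 (trust the audio),
    low SNR gives a weight near 0 (trust the lip-reading stream instead).
    """
    return 1.0 / (1.0 + math.exp(-slope * (snr_db - midpoint_db)))


def fuse_scores(audio_logp: dict[str, float], visual_logp: dict[str, float],
                snr_db: float) -> dict[str, float]:
    """Weighted sum of the two streams' log-likelihoods for every candidate word."""
    lam = audio_weight(snr_db)
    return {word: lam * audio_logp[word] + (1.0 - lam) * visual_logp[word]
            for word in audio_logp}


def recognize(audio_logp: dict[str, float], visual_logp: dict[str, float],
              snr_db: float) -> str:
    """Return the candidate word with the highest fused score."""
    fused = fuse_scores(audio_logp, visual_logp, snr_db)
    return max(fused, key=fused.get)


if __name__ == "__main__":
    for snr in (30.0, 0.0):
        print(f"SNR {snr:4.1f} dB: audio weight {audio_weight(snr):.3f}, "
              f"recognized {recognize(AUDIO_LOGP, VISUAL_LOGP, snr)}")
```

With these made-up scores, the decision depends only on how much the audio is trusted: at 30 dB the audio weight is close to 1 and "Rotterdam" wins, while at 0 dB it drops below 0.05 and the lip-reading stream makes "Amsterdam" win. A real system would also have to estimate the SNR and synchronise the audio and video streams, which this sketch leaves out.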

Bibliographic details

Source: Delft Outlook, 2002, no. 4, pp. 14–18 (5 pages)
Author: Marion De Boo
Original format: PDF
Language: English
CLC classification: General natural sciences

Similar articles

1. Madoka Miki, Norihide Kitaoka, Chiyomi Miyajima. Improvement of multimodal gesture and speech recognition performance using time intervals between gestures and accompanying speech. EURASIP Journal on Audio, Speech, and Music Processing, 2014, no. 1.
2. Radha N., Shahina A., Prabha P. An analysis of the effect of combining standard and alternate sensor signals on recognition of syllabic units for multimodal speech recognition. Pattern Recognition Letters, 2018, issue NOVa1.
3. Orken Zh. Mamyrbayev, Keylan Alimhan, Beibut Amirgaliyev. Multimodal systems for speech recognition. International Journal of Mobile Communications, 2020, no. 3.
4. Yuanyao Lu, Shenyao Yang, Zheng Xu. Speech Training System for Hearing Impaired Individuals Based on Automatic Lip-Reading Recognition. International Conference on Human Factors and Systems Interaction, 2020.
5. Amriteshwar Singh. A multimodal fusion approach for automatic postal address recognition system using Optical Character Recognition (OCR) and Automatic Speech Recognition (ASR) techniques. Dissertation, 2011.
6. Mathieu Bourguignon, Martijn Baart, Efthymia C. Kapnoula. Lip-Reading Enables the Brain to Synthesize Auditory Features of Unknown Silent Speech. 2020.
7. Adriana Fernandez-Lopez, Oriol Martinez, Federico M. Sukno. Towards estimating the upper bound of visual-speech recognition: the visual lip-reading feasibility database. 2017.
