IEEE Workshop on Spoken Language Technology

Audio-visual speech activity detection in a two-speaker scenario incorporating depth information from a profile or frontal view



Abstract

Motivated by the increasing popularity of depth visual sensors, such as the Kinect device, we investigate the utility of depth information in audio-visual speech activity detection. A two-subject scenario is assumed, allowing speech overlap to also be considered. Two sensory setups are employed, in which depth video captures either a frontal or a profile view of the subjects and is subsequently combined with the corresponding planar video and audio streams. Further, multi-view fusion is also considered, using audio and planar video from a sensor at the complementary view setup. Support vector machines provide temporal speech activity classification for each visually detected subject, fusing the available modality streams. The classification results are further combined to yield speaker diarization. Experiments are reported on a suitable audio-visual corpus recorded by two Kinects. Results demonstrate the benefits of depth information, particularly in the frontal depth view setup, reducing speech activity detection and speaker diarization errors over systems that ignore it.
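The pipeline the abstract describes, per-subject speech activity classification over fused audio, planar-video, and depth features, with the two subjects' decisions combined into diarization labels, can be sketched roughly as follows. This is a minimal illustration under stated assumptions: the feature layout, the linear scorer standing in for the paper's trained SVMs, and all function names and weights are hypothetical, not the authors' implementation.

```python
# Illustrative sketch only: a linear scorer stands in for the trained SVMs,
# and all feature values and weights are placeholders.

def fuse_features(audio, video, depth):
    """Feature-level fusion: concatenate per-frame vectors from the
    audio, planar-video, and depth streams."""
    return audio + video + depth

def linear_score(features, weights, bias):
    """Stand-in for an SVM decision function: w . x + b."""
    return sum(w * x for w, x in zip(weights, features)) + bias

def speech_activity(frames, weights, bias):
    """Per-frame binary speech/non-speech decisions for one subject.
    Each frame is an (audio, video, depth) tuple of feature lists."""
    return [linear_score(fuse_features(*f), weights, bias) > 0
            for f in frames]

def diarize(active_a, active_b):
    """Combine the two subjects' activity decisions into per-frame
    diarization labels, including the overlapped-speech case."""
    labels = []
    for a, b in zip(active_a, active_b):
        if a and b:
            labels.append("overlap")
        elif a:
            labels.append("speaker_A")
        elif b:
            labels.append("speaker_B")
        else:
            labels.append("silence")
    return labels
```

Running both subjects' frame sequences through `speech_activity` and then `diarize` yields one label per frame (silence, a single speaker, or overlap), mirroring how the paper combines per-subject classification into speaker diarization.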
