Conference: International Conference on Advanced Robotics and Mechatronics

An intelligent meeting recording system for BoBi secretary robot



Abstract

This paper presents an intelligent meeting recording system for BoBi secretary, an intelligent personal robot. Participants register their voices with the system before the meeting. During the meeting, a microphone records the audio while a parallel process recognizes the speech and converts it into text, producing an online meeting record. Because the recorded audio contains the voices of multiple speakers, it must be divided into per-speaker parts before speech recognition is applied. We try two methods for this: a segmentation-based method and a segmentation-free method. The segmentation-based method applies a Voice Activity Detection (VAD) algorithm to separate speech frames from non-speech frames and so segment the recording into speaker parts. It then applies Gaussian Mixture Models (GMMs) to Mel-Frequency Cepstral Coefficient (MFCC) features extracted from each part to identify the speaker. The segmentation-free method scores the voice frames using MFCC features and GMMs, and then searches the lattice of voice frames for the optimal path, which serves as the segmentation result. Finally, each speaker's voice is sent to a speech recognition server to obtain the text. We compare the meeting recording system under the two methods and show that the segmentation-based method gives better results.
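The two methods in the abstract can be illustrated with a small sketch. It is not the authors' implementation. The abstract does not specify the VAD algorithm, the GMM configuration, or the path search, so everything below is an assumption: a plain energy threshold stands in for VAD, the GMM parameters are hand-written rather than trained from registered voices, one-dimensional toy values replace real MFCC vectors, and the path search is a Viterbi pass with a hypothetical `switch_penalty` for changing speaker between frames.

```python
import math

def diag_gauss_loglik(x, mean, var):
    """Log-density of feature vector x under a diagonal-covariance Gaussian."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_loglik(x, gmm):
    """Log-likelihood of x under a GMM given as [(weight, mean, var), ...]."""
    terms = [math.log(w) + diag_gauss_loglik(x, m, v) for w, m, v in gmm]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))  # log-sum-exp

def vad_segments(energies, threshold):
    """Assumed energy-threshold VAD: return (start, end) runs of speech frames."""
    segments, start = [], None
    for i, e in enumerate(energies):
        if e >= threshold and start is None:
            start = i
        elif e < threshold and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(energies)))
    return segments

def identify_segment(frames, speaker_gmms):
    """Segmentation-based: speaker whose GMM gives the highest total log-likelihood."""
    return max(
        speaker_gmms,
        key=lambda name: sum(gmm_loglik(x, speaker_gmms[name]) for x in frames),
    )

def viterbi_speakers(frames, speaker_gmms, switch_penalty=5.0):
    """Segmentation-free: one speaker label per frame via a Viterbi path search."""
    names = list(speaker_gmms)

    def cost(prev, cur):
        # Hypothetical penalty for changing speaker between adjacent frames.
        return switch_penalty if prev != cur else 0.0

    score = {n: gmm_loglik(frames[0], speaker_gmms[n]) for n in names}
    back = []
    for x in frames[1:]:
        new_score, pointers = {}, {}
        for n in names:
            prev = max(names, key=lambda p: score[p] - cost(p, n))
            new_score[n] = score[prev] - cost(prev, n) + gmm_loglik(x, speaker_gmms[n])
            pointers[n] = prev
        score = new_score
        back.append(pointers)
    # Backtrack from the best final state to recover the full label path.
    path = [max(names, key=lambda n: score[n])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

# Toy data: 1-D stand-ins for MFCC vectors, with one silent frame at index 3.
gmms = {"alice": [(1.0, [0.0], [1.0])], "bob": [(1.0, [5.0], [1.0])]}
features = [[0.1], [-0.2], [0.3], [2.5], [5.1], [4.8], [5.2]]
energies = [0.9, 0.8, 0.9, 0.05, 0.9, 0.7, 0.8]

segments = vad_segments(energies, threshold=0.5)
by_segment = [identify_segment(features[s:e], gmms) for s, e in segments]
speech = [f for (s, e) in segments for f in features[s:e]]
by_frame = viterbi_speakers(speech, gmms)
```

In this sketch, the segmentation-based path makes one decision per VAD segment, while the segmentation-free path labels every frame and reads the speaker boundaries off the label changes. A real system would replace the toy values with multi-dimensional MFCC frames and GMMs fitted to each participant's registered voice.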
