Conference: International Workshop Mobile Social Signal Processing

Speaker Diarization of Multi-party Conversations Using Participants Role Information: Political Debates and Professional Meetings

Abstract

Speaker diarization aims at inferring who spoke when in an audio stream and involves two simultaneous unsupervised tasks: (1) estimating the number of speakers, and (2) associating speech segments with each speaker. Most recent efforts in the domain have addressed the problem using machine learning techniques or statistical methods (for a review see [11]), ignoring the fact that the data consist of instances of human conversations. When humans want to use language to communicate orally with each other, they are faced with a coordination problem. "Avoidance of collision is one obvious ground for this coordination of actions between the participants. In order to coordinate efficiently and successfully, they will therefore have to agree to follow certain rules of interaction" [8]. One such rule is that no one monopolizes the floor; instead, the participants take turns to speak. This concept is called turn-taking. The computational linguistics literature is rich in analyses of human conversations; the seminal work of [9] shows that conversations obey predictable interaction patterns between participants, and that a speaker turn is related in predictable ways to the previous and next turns and follows a structure similar to a grammar. Among the social phenomena that regulate the turns in a conversation, a lot of attention has been devoted to roles. In fact, people interact in different ways depending on the context, but "Their interactions involve behaviors associated with defined statuses and particular roles. These statuses and roles help to pattern our social interactions and provide predictability" [10].
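To make the idea of predictable turn-taking concrete, the following is a minimal sketch, not the method of the paper. It assumes a hypothetical diarization output given as (start, end, label) segments, merges them into a sequence of turns, and estimates how often one participant's turn is followed by another's. The segment values, the labels, and the function names `to_turns` and `transition_probs` are all invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical diarization output: (start_sec, end_sec, speaker_label).
# The values and labels are made up for illustration only.
segments = [
    (0.0, 4.2, "moderator"),
    (4.2, 9.0, "guest_a"),
    (9.0, 10.5, "guest_a"),
    (10.5, 12.0, "moderator"),
    (12.0, 20.0, "guest_b"),
    (20.0, 21.0, "moderator"),
    (21.0, 27.5, "guest_a"),
]


def to_turns(segments):
    """Merge consecutive segments by the same speaker into a single turn."""
    turns = []
    for _, _, speaker in sorted(segments):
        if not turns or turns[-1] != speaker:
            turns.append(speaker)
    return turns


def transition_probs(turns):
    """Estimate P(next speaker | previous speaker) from turn bigram counts."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(turns, turns[1:]):
        counts[prev][nxt] += 1
    return {
        prev: {nxt: c / sum(ctr.values()) for nxt, c in ctr.items()}
        for prev, ctr in counts.items()
    }


turns = to_turns(segments)
probs = transition_probs(turns)
print(turns)
print(probs)
```

In this toy example every guest turn is followed by a moderator turn, which is the kind of role-dependent regularity the abstract refers to. A first-order bigram model is only one simple way to summarize such patterns and is an assumption of this sketch.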
