Conference: International Conference on Language Resources and Evaluation

Extending the MPC corpus to Chinese and Urdu - A Multiparty Multi-Lingual Chat Corpus for Modeling Social Phenomena in Language

Abstract

In this paper, we report our efforts to build a multi-lingual multi-party online chat corpus (MMPC), both to develop a firm understanding of a set of social constructs such as agenda control, influence, and leadership, and to computationally model such constructs in online interactions. These automated models will help capture the dialogue dynamics that are essential for developing, among other things, realistic human-machine dialogue systems, including autonomous virtual chat agents. We first introduce our experiment design and data collection method in Chinese and Urdu, and then report on the current stage of our data collection. We annotated the collected corpus on four levels: communication links, dialogue acts, local topics, and meso-topics. Analyses of the annotated data across the different languages reveal some interesting phenomena, which are reported in this paper.