Conference: International Conference on Computational Linguistics

A Dataset for Building Code-Mixed Goal Oriented Conversation Systems


Abstract

There is an increasing demand for goal-oriented conversation systems which can assist users in various day-to-day activities such as booking tickets, restaurant reservations, shopping, etc. Most of the existing datasets for building such conversation systems focus on monolingual conversations, and there is hardly any work on multilingual and/or code-mixed conversations. Such datasets and systems thus do not cater to the multilingual regions of the world, such as India, where it is very common for people to speak more than one language and seamlessly switch between them, resulting in code-mixed conversations. For example, a Hindi-speaking user looking to book a restaurant would typically ask, "Kya tum is restaurant mein ek table book karne mein meri help karoge?" ("Can you help me in booking a table at this restaurant?"). To facilitate the development of such code-mixed conversation models, we build a goal-oriented dialog dataset containing code-mixed conversations. Specifically, we take the text from the DSTC2 restaurant reservation dataset and create code-mixed versions of it in Hindi-English, Bengali-English, Gujarati-English and Tamil-English. We also establish initial baselines on this dataset using existing state-of-the-art models. This dataset, along with our baseline implementations, is made publicly available for research purposes.
