International Conference on Speech and Computer

Pragmatic Markers Distribution in Russian Everyday Speech: Frequency Lists and Other Statistics for Discourse Modeling


Abstract

Pragmatic markers (PMs) are discourse units (words and multiword expressions) with a weakened referential meaning that perform a variety of pragmatic tasks. For example, common PMs in English include 'well', 'you know', 'I think', and many others. PMs are integral elements of spoken discourse in every language. According to results obtained from the ORD corpus of everyday Russian, their share can reach up to 6% of the total number of words in the speech of individual speakers. Moreover, in some speech fragments, the share of PMs may even exceed that of significant units (i.e., standard words). However, despite being frequent and commonplace, PMs are still poorly understood. Current NLP and discourse modeling systems lack information on the distribution and usage of PMs, which leads to noticeable shortcomings when these systems handle the spontaneous speech of everyday spoken discourse. In this paper we present top frequency lists of PMs for Russian dialogue and monologue speech in general, and also for separate sociological groups of informants (by gender and by age). Our current list of PMs for Russian contains 450 units, which are variants of 50 main structural types. In addition, we consider the most frequent functions of PMs in spoken Russian. The presented quantitative data may be used to improve NLP and discourse modeling systems.
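
To make the two statistics mentioned in the abstract concrete (a frequency list of PMs and their share of the total word count), here is a minimal Python sketch. It is not the authors' method: the marker inventory is just the three English examples named above, the utterance is an invented toy string rather than ORD corpus data, and the share is assumed to mean "tokens covered by a marker divided by all tokens". Matching on surface form alone will also overcount, since a word such as 'well' is a PM only in some of its uses; real annotation is function-based.

```python
from collections import Counter

# Hypothetical inventory: only the English examples named in the abstract.
MARKERS = ("you know", "i think", "well")


def pm_statistics(tokens, markers=MARKERS):
    """Return (frequency Counter of markers, share of tokens they cover)."""
    # Represent each marker as a tuple of tokens; try longer markers first
    # so that multiword expressions are not shadowed by shorter ones.
    patterns = sorted((tuple(m.split()) for m in markers), key=len, reverse=True)
    counts = Counter()
    covered = 0  # number of tokens that belong to some matched marker
    i = 0
    while i < len(tokens):
        for pat in patterns:
            if tuple(tokens[i:i + len(pat)]) == pat:
                counts[" ".join(pat)] += 1
                covered += len(pat)
                i += len(pat)
                break
        else:
            i += 1  # no marker starts at this position
    share = covered / len(tokens) if tokens else 0.0
    return counts, share


# Invented toy utterance for illustration only (not corpus data).
tokens = "well i think it is late you know and well we should go".split()
counts, share = pm_statistics(tokens)
print(counts.most_common())  # frequency list, most frequent marker first
print(f"{share:.2%}")        # share of tokens covered by markers
```

Running the same function separately on the transcripts of each speaker group (for example by gender or age) would give per-group lists of the kind the abstract describes, under the same assumptions.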
