Journal: Speech Communication

Extraction of pragmatic and semantic salience from spontaneous spoken English


Abstract

This paper computationalizes two linguistic concepts, contrast and focus, to extract pragmatic and semantic salience from spontaneous speech. Contrast and focus have been widely investigated in modern linguistics as categories that link intonation to information and discourse structure. This paper demonstrates automatic tagging of contrast and focus to support robust understanding of spontaneous speech in a tutorial dialogue system. In particular, we propose two new transcription tasks and show that human labels in both tasks can be replicated automatically. First, we define the focus kernel as the set of words that carry novel information, that is, information neither presupposed by the interlocutor nor contained in the preceding words of the utterance. We propose detecting the focus kernel from a word dissimilarity measure, part-of-speech tags, and prosodic measurements including duration, pitch, energy, and our proposed spectral balance cepstral coefficients. To measure word dissimilarity, we test a linear combination of ontological and statistical dissimilarity measures previously published in the computational linguistics literature. Second, we propose identifying symmetric contrast, a set of words that are parallel or symmetric in linguistic structure but distinct or contrastive in meaning. Symmetric contrast is identified in a way similar to focus kernel detection. The proposed extraction of symmetric contrast and focus kernel was tested on a Wizard-of-Oz corpus collected in a tutoring dialogue scenario. The corpus consists of 630 utterances that are not single words or phrases, containing approximately 5700 words and 48 minutes of speech. The tests used speech waveforms together with manual orthographic transcriptions and yielded an accuracy of 83.8% for focus kernel detection and 92.8% for symmetric contrast detection.
Our tests also showed that the spectral balance cepstral coefficients, the semantic dissimilarity measure, and part-of-speech tags played important roles in both symmetric contrast and focus kernel detection.
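The abstract does not give the exact form of the combined dissimilarity measure. The following is a minimal sketch under stated assumptions: a convex weighting `weight * d_ont + (1 - weight) * d_stat`, and a per-word "novelty" score taken as the minimum dissimilarity to any preceding word in the utterance, which reflects only the "not contained in the preceding words" part of the focus-kernel definition. The names `combine`, `novelty_scores`, `toy_ontological`, `toy_statistical`, `TOY_CLASSES`, and the weight 0.7 are hypothetical; the toy measures merely stand in for the published ontological and statistical measures, and this is not the authors' implementation.

```python
from typing import Callable, Dict, List, Sequence

# A word-level dissimilarity: 0.0 means "same meaning", 1.0 means "unrelated".
Dissimilarity = Callable[[str, str], float]


def combine(ontological: Dissimilarity, statistical: Dissimilarity,
            weight: float) -> Dissimilarity:
    """Return d(w1, w2) = weight * d_ont + (1 - weight) * d_stat.

    The convex weighting is an assumption made for this sketch.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")

    def d(w1: str, w2: str) -> float:
        return weight * ontological(w1, w2) + (1.0 - weight) * statistical(w1, w2)

    return d


def novelty_scores(words: Sequence[str], d: Dissimilarity) -> List[float]:
    """Score each word by its minimum dissimilarity to any preceding word.

    A high score suggests the word is not already contained in the preceding
    words of the utterance. The first word has no context and scores 1.0.
    """
    return [min((d(w, p) for p in words[:i]), default=1.0)
            for i, w in enumerate(words)]


# --- Hypothetical toy measures, for illustration only ---
TOY_CLASSES: Dict[str, str] = {"battery": "power_source", "cell": "power_source"}


def toy_ontological(w1: str, w2: str) -> float:
    # Identical words: 0.0; words sharing a toy class: 0.3; otherwise 1.0.
    if w1 == w2:
        return 0.0
    c1, c2 = TOY_CLASSES.get(w1), TOY_CLASSES.get(w2)
    return 0.3 if c1 is not None and c1 == c2 else 1.0


def toy_statistical(w1: str, w2: str) -> float:
    # Jaccard distance between letter sets: a crude stand-in for a corpus statistic.
    a, b = set(w1), set(w2)
    return 1.0 - len(a & b) / len(a | b)


if __name__ == "__main__":
    d = combine(toy_ontological, toy_statistical, weight=0.7)
    utterance = ["the", "battery", "powers", "the", "cell"]
    print(novelty_scores(utterance, d))
```

Returning a closure from `combine` keeps the weighting separate from the scoring loop, so different weights or component measures can be compared without touching `novelty_scores`. In a fuller system, such scores would be only one feature alongside part-of-speech and prosodic measurements.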