Information Intelligence and Systems, 1999. Proceedings. 1999 International Conference on

Towards building a better language model for SWITCHBOARD: the POS tagging task


Abstract

Language models are used extensively in state-of-the-art speech recognition systems to help determine the probability of a hypothesized word sequence. These probabilities, along with the acoustic model scores, allow the system to constrain the search space during recognition to only those word sequences that have a reasonable chance of being correct. In order to determine these probabilities, knowledge of the entire problem space is necessary. However, in speech recognition, this is an unreasonable if not impossible task, especially when one is using the SWITCHBOARD corpus (a large corpus consisting of over 240 hours of recorded telephone conversations totaling almost 3 million words of text). Many statistical and rule-based approaches have been applied to this problem in order to arrive at a language model that produces the minimal word error rate (WER) of the recognizer. One technique includes part-of-speech (POS) information in the language model. This paper discusses the task of tagging the SWITCHBOARD corpus with POS information in the usual manner, and the problems encountered when trying to conform conversational speech to these tags.
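
The abstract does not say how POS information enters the language model. As an illustration only, one common way to do this is a class-based bigram model, which approximates P(w_i | w_{i-1}) by P(w_i | t_i) · P(t_i | t_{i-1}), where t_i is the POS tag of word w_i. The sketch below is not the paper's method: the toy corpus, the tag names, and the function names `train` and `sequence_log_prob` are all assumptions made for the example, and it uses unsmoothed maximum-likelihood counts.

```python
from collections import Counter
import math

# Hypothetical toy corpus of (word, tag) pairs; NOT SWITCHBOARD data.
TAGGED = [
    [("i", "PRP"), ("like", "VBP"), ("dogs", "NNS")],
    [("you", "PRP"), ("like", "VBP"), ("cats", "NNS")],
    [("dogs", "NNS"), ("like", "VBP"), ("cats", "NNS")],
]

def train(sentences):
    """Collect the counts needed for P(tag | prev_tag) and P(word | tag)."""
    tag_bigrams = Counter()   # count(prev_tag, tag)
    tag_context = Counter()   # count(prev_tag) when used as a bigram context
    emissions = Counter()     # count(tag, word)
    tag_totals = Counter()    # count(tag)
    for sent in sentences:
        prev = "<s>"          # sentence-start marker (assumed convention)
        for word, tag in sent:
            tag_bigrams[(prev, tag)] += 1
            tag_context[prev] += 1
            emissions[(tag, word)] += 1
            tag_totals[tag] += 1
            prev = tag
    return tag_bigrams, tag_context, emissions, tag_totals

def sequence_log_prob(tagged_sentence, model):
    """Log-probability of a tagged word sequence under the class-based bigram.

    Unsmoothed: any unseen tag transition or (tag, word) pair gives -inf.
    """
    tag_bigrams, tag_context, emissions, tag_totals = model
    log_p = 0.0
    prev = "<s>"
    for word, tag in tagged_sentence:
        ctx, tot = tag_context[prev], tag_totals[tag]
        p_tag = tag_bigrams[(prev, tag)] / ctx if ctx else 0.0
        p_word = emissions[(tag, word)] / tot if tot else 0.0
        if p_tag == 0.0 or p_word == 0.0:
            return float("-inf")
        log_p += math.log(p_tag) + math.log(p_word)
        prev = tag
    return log_p

model = train(TAGGED)
seen = sequence_log_prob([("i", "PRP"), ("like", "VBP"), ("cats", "NNS")], model)
unseen = sequence_log_prob([("cats", "NNS"), ("i", "PRP")], model)
```

Here `seen` is finite because every tag transition and tag-word pair occurs in the toy corpus, while `unseen` is negative infinity because the transition NNS to PRP was never observed. A recognizer would normally smooth these estimates so that unseen events are penalized rather than ruled out.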
