Conference: Association for Computational Linguistics Annual Meeting

Using Conditional Random Fields For Sentence Boundary Detection In Speech

Abstract

Sentence boundary detection in speech is important for enriching speech recognition output, making it easier for humans to read and downstream modules to process. In previous work, we have developed hidden Markov model (HMM) and maximum entropy (Maxent) classifiers that integrate textual and prosodic knowledge sources for detecting sentence boundaries. In this paper, we evaluate the use of a conditional random field (CRF) for this task and relate results with this model to our prior work. We evaluate across two corpora (conversational telephone speech and broadcast news speech) on both human transcriptions and speech recognition output. In general, our CRF model yields a lower error rate than the HMM and Maxent models on the NIST sentence boundary detection task in speech, although it is interesting to note that the best results are achieved by three-way voting among the classifiers. This probably occurs because each model has different strengths and weaknesses for modeling the knowledge sources.
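The abstract does not say how the three-way vote is carried out. As a minimal sketch of one plausible reading, the snippet below takes a simple majority over hard per-boundary decisions from three classifiers. The function name `majority_vote`, the labels `"B"` (sentence boundary) and `"N"` (no boundary), and the example decisions are all hypothetical and chosen only for illustration; the actual systems may combine posterior probabilities instead.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Combine per-boundary labels from several classifiers by majority vote."""
    if not predictions:
        raise ValueError("need at least one classifier output")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all classifiers must label the same word boundaries")

    combined = []
    for labels in zip(*predictions):
        # With three voters and two labels, one label always has a strict majority.
        combined.append(Counter(labels).most_common(1)[0][0])
    return combined


# Hypothetical hard decisions at five inter-word boundaries
# ("B" = sentence boundary, "N" = no boundary); not data from the paper.
hmm_out = ["N", "B", "N", "N", "B"]
maxent_out = ["N", "N", "N", "B", "B"]
crf_out = ["N", "B", "N", "N", "B"]

voted = majority_vote([hmm_out, maxent_out, crf_out])
print(voted)
```

In this toy input, the second and fourth boundaries show a single classifier being outvoted by the other two, which is one way a combined system could offset the individual weaknesses that the abstract mentions.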