Conference: WSEAS International Conference on Robotics, Control and Manufacturing Technology
Automatic Detection of Edited Parts in Inexact Transcribed Corpora Based on Alignment between Edited Transcription and Corresponding Utterance

Abstract

The availability of large-scale spontaneous speech corpora is crucial for many areas of spoken language processing. However, such corpora are scarce because they are costly to prepare. In contrast, inexactly transcribed corpora are widely produced in the form of shorthand notes, meeting records, and closed captions. Although these corpora are more readily available than faithful (exact) ones, their transcriptions have been edited and do not faithfully reflect what was spoken. Against this background, we aim to build an efficient semi-automatic framework for converting inexact transcripts into faithful ones. The framework consists of two steps: first, the positions of edited parts are detected automatically; second, the detected parts are transcribed manually. This paper proposes a method for the first step, the automatic detection of edited parts in inexactly transcribed corpora. The proposed method automatically aligns each edited transcription with its corresponding utterance and then applies a support vector machine (SVM) based detector that uses features obtained from the alignment. In an evaluation on the Japanese National Diet Record, reasonable results were obtained under the speaker-closed condition.
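The abstract does not specify the aligner or the features passed to the SVM, so the following is only a rough illustration of the detection idea, not the paper's method. It assumes a word sequence for the utterance is already available (for example from a speech recognizer, which is an assumption here) and swaps the acoustic alignment and SVM for a plain token-level diff using Python's `difflib.SequenceMatcher`. The function name `candidate_edited_spans` and the example sentences are hypothetical.

```python
from difflib import SequenceMatcher


def candidate_edited_spans(edited_tokens, spoken_tokens):
    """Flag positions where the edited transcript and the spoken words disagree.

    Both arguments are lists of word tokens. `spoken_tokens` is assumed to
    come from some external source such as a recognizer (hypothetical here).
    Returns a list of (tag, position, edited_segment, spoken_segment), where
    `position` is the index in `edited_tokens` at which the mismatch starts
    and `tag` is one of 'insert', 'delete', or 'replace'.
    """
    matcher = SequenceMatcher(None, edited_tokens, spoken_tokens, autojunk=False)
    spans = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # aligned region: transcript and speech agree
        spans.append((tag, i1, edited_tokens[i1:i2], spoken_tokens[j1:j2]))
    return spans


if __name__ == "__main__":
    # Hypothetical example: the editor dropped a filler and a repetition.
    edited = ["i", "think", "it", "is", "fine"]
    spoken = ["well", "i", "think", "it", "it", "is", "fine"]
    for span in candidate_edited_spans(edited, spoken):
        print(span)
```

In this toy input the filler "well" and the repeated "it" are reported as insertions relative to the edited transcript, which mirrors the kind of region a human would then re-transcribe in the framework's second step. A detector closer to the one described would replace the diff with alignment-derived features and a trained classifier.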