International Conference on Language Resources and Evaluation

A Multimodal Educational Corpus of Oral Courses: Annotation, Analysis and Case Study


Abstract

This paper presents a French multimodal educational dataset of oral courses. This corpus is part of the PASTEL (Performing Automated Speech Transcription for Enhancing Learning) project, which aims to explore the potential of synchronous speech transcription and its application in specific teaching situations (Bettenfeld et al., 2018; Bettenfeld et al., 2019). It includes 10 hours of different lectures, manually transcribed and segmented. The main interest of this corpus lies in its multimodal aspect: in addition to speech, the courses were filmed and the written presentation materials (slides) are made available. The dataset may therefore serve research in multiple fields, from speech and language to image and video processing. The dataset will be freely available to the research community. In this paper, we first describe the annotation protocol in detail, including an analysis of the manually labeled data. Then, we propose some possible use cases of the corpus with baseline results. The use cases concern both speech and text processing: language model adaptation, thematic segmentation, and transcription-to-slide alignment.
